Both PySpark and Spark provide the logical operators AND, OR, and NOT, which express conditional relations between operands. In a Spark/PySpark SQL expression, use the following operators for AND and OR:
- AND – evaluates to TRUE if all of the conditions separated by the && operator are TRUE.
- OR – evaluates to TRUE if any of the conditions separated by the || operator is TRUE.
1. Logical Operations
Both PySpark and Spark support the standard logical operators AND, OR, and NOT. These operators take Boolean expressions as arguments and return a Boolean value.
The table below shows the behavior of the Spark AND and OR operators for each combination of Boolean operands.
| LEFT OPERAND | RIGHT OPERAND | AND | OR |
| --- | --- | --- | --- |
| TRUE | TRUE | TRUE | TRUE |
| TRUE | FALSE | FALSE | TRUE |
| FALSE | TRUE | FALSE | TRUE |
| FALSE | FALSE | FALSE | FALSE |
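When no NULLs are involved, Spark's AND and OR follow ordinary two-valued Boolean logic, so the table above can be checked with plain Python booleans (an illustrative sketch, not Spark itself):

```python
# Print each row of the AND/OR truth table using plain Python booleans.
# This mirrors Spark's logical operators only when neither operand is NULL.
for left in (True, False):
    for right in (True, False):
        print(f"{left} | {right} | AND={left and right} | OR={left or right}")
```

Running this reproduces the four rows of the table: AND is TRUE only when both operands are TRUE, and OR is FALSE only when both are FALSE.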
2. Null handling in Logical AND & OR Operations
The below table illustrates the behavior of Spark logical AND & OR operators when a NULL value is encountered.
| LEFT OPERAND | RIGHT OPERAND | OR | AND |
| --- | --- | --- | --- |
| TRUE | NULL | TRUE | NULL |
| NULL | TRUE | TRUE | NULL |
| FALSE | NULL | NULL | FALSE |
| NULL | FALSE | NULL | FALSE |
| NULL | NULL | NULL | NULL |
For OR, when one of the Boolean expressions is NULL:
- If one expression is TRUE and the other is NULL, the result is TRUE.
- If one expression is FALSE and the other is NULL, the result is NULL.
For AND, when one of the Boolean expressions is NULL:
- If one expression is TRUE and the other is NULL, the result is NULL.
- If one expression is FALSE and the other is NULL, the result is FALSE.
When both expressions of an AND or OR are NULL, the result is NULL.
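These rules are SQL's standard three-valued logic: TRUE dominates OR, FALSE dominates AND, and otherwise NULL propagates. A minimal plain-Python sketch (using `None` for NULL; illustrative only, not Spark's implementation) reproduces the table above:

```python
def sql_and(left, right):
    """SQL three-valued AND: FALSE dominates, otherwise NULL propagates."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True

def sql_or(left, right):
    """SQL three-valued OR: TRUE dominates, otherwise NULL propagates."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False

# A few rows from the NULL-handling table:
print(sql_or(True, None))    # True  (TRUE OR NULL)
print(sql_and(True, None))   # None  (TRUE AND NULL)
print(sql_and(False, None))  # False (FALSE AND NULL)
print(sql_or(None, None))    # None  (NULL OR NULL)
```

The same asymmetry explains why Spark can short-circuit `FALSE AND x` and `TRUE OR x` without evaluating `x`, even when `x` is NULL.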
3. Spark Using AND & OR Operators
The AND (&&) operator is useful when you want to filter a Spark DataFrame on multiple conditions that must all hold. Similarly, the OR (||) operator keeps rows that match either condition.
// Spark AND Operator
df.filter(df("state") === "OH" && df("gender") === "M")
.show(false)
// Spark OR Operator
df.filter(df("state") === "OH" || df("gender") === "M")
.show(false)
4. PySpark Using AND & OR Operators
PySpark logical operations use the bitwise operators:
- & for and
- | for or
- ~ for not
# PySpark AND Operator
df.filter((df.state == "OH") & (df.gender == "M")) \
  .show(truncate=False)
# PySpark OR Operator
df.filter((df.state == "OH") | (df.gender == "M")) \
  .show(truncate=False)
Note that each condition must be wrapped in parentheses: Python's bitwise operators & and | bind more tightly than ==, so omitting them raises an error.
5. Conclusion
Spark and PySpark support the standard logical operators AND, OR, and NOT. These operators take Boolean expressions as arguments and return a Boolean value.