Spark AND | OR | NOT Operators

Both PySpark and Spark provide the AND, OR, and NOT logical operators, which combine Boolean conditions to express conditional logic over their operands.

In a Spark/PySpark SQL expression you use the AND and OR keywords, while the Scala DataFrame API uses && and ||.

  • AND (&&) – Evaluates to TRUE only if all the conditions it joins are TRUE.
  • OR (||) – Evaluates to TRUE if any of the conditions it joins is TRUE.

1. Logical Operations

Both PySpark and Spark support the standard logical operators AND, OR, and NOT. These operators take Boolean expressions as arguments and return a Boolean value.

The table below shows the behavior of the Spark AND and OR operators for each combination of Boolean operands.

LEFT OPERAND | RIGHT OPERAND | AND   | OR
TRUE         | TRUE          | TRUE  | TRUE
TRUE         | FALSE         | FALSE | TRUE
FALSE        | TRUE          | FALSE | TRUE
FALSE        | FALSE         | FALSE | FALSE
Spark AND & OR operator

2. Null handling in Logical AND & OR Operations

The table below illustrates the behavior of the Spark logical AND and OR operators when a NULL value is encountered.

LEFT OPERAND | RIGHT OPERAND | OR    | AND
TRUE         | NULL          | TRUE  | NULL
NULL         | TRUE          | TRUE  | NULL
FALSE        | NULL          | NULL  | FALSE
NULL         | FALSE         | NULL  | FALSE
NULL         | NULL          | NULL  | NULL
Spark AND & OR operator with NULL

For an OR operation, when a NULL value is one of the Boolean expressions:

  • If one expression is TRUE and the other is NULL, the result is TRUE.
  • If one expression is FALSE and the other is NULL, the result is NULL.

For an AND operation, when a NULL value is one of the Boolean expressions:

  • If one expression is TRUE and the other is NULL, the result is NULL.
  • If one expression is FALSE and the other is NULL, the result is FALSE.

When both expressions of an AND or OR are NULL, the result is NULL.
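These rules are SQL's three-valued (Kleene) logic, where NULL means "unknown". As an illustration only (plain Python with None standing in for NULL, not Spark code), the tables above can be sketched as:

```python
# Sketch of SQL three-valued logic; None plays the role of NULL.
# This mirrors Spark's AND/OR NULL semantics but is not Spark itself.

def sql_and(a, b):
    # FALSE dominates: if either side is FALSE, the result is FALSE
    if a is False or b is False:
        return False
    # otherwise any NULL makes the result NULL (unknown)
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    # TRUE dominates: if either side is TRUE, the result is TRUE
    if a is True or b is True:
        return True
    # otherwise any NULL makes the result NULL (unknown)
    if a is None or b is None:
        return None
    return False

print(sql_or(True, None))    # True
print(sql_or(False, None))   # None
print(sql_and(True, None))   # None
print(sql_and(False, None))  # False
```

Running every combination of True/False/None through these two functions reproduces both tables above.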

3. Spark Using AND & OR Operators

Usually, the AND (&&) operator is useful when you want to filter a Spark DataFrame on multiple conditions at once.

Similarly, you can use the OR (||) operator to keep rows that match any of the conditions.


//Spark AND Operator 
df.filter(df("state") === "OH" && df("gender") === "M")
    .show(false)

//Spark OR Operator
df.filter(df("state") === "OH" || df("gender") === "M")
    .show(false)

4. PySpark Using AND & OR Operators

PySpark logical operations use Python's bitwise operators:

  • & for and
  • | for or
  • ~ for not

Note that each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as ==.

#PySpark AND Operator
df.filter((df.state == "OH") & (df.gender == "M")).show(truncate=False)

#PySpark OR Operator
df.filter((df.state == "OH") | (df.gender == "M")).show(truncate=False)
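The parentheses around each condition are not optional: in Python, & and | have higher precedence than ==, so an unparenthesized condition is grouped the wrong way. A small plain-Python sketch (ordinary ints and bools, not Spark columns) shows the effect:

```python
# Python's & binds tighter than ==, so unparenthesized filter-style
# conditions are grouped incorrectly. Illustrated with plain values.

correct = (1 == 1) & (0 == 0)   # True & True -> True

# Without parentheses this parses as 1 == (1 & 0) == 0, i.e. the
# chained comparison 1 == 0 == 0, which is False.
wrong = 1 == 1 & 0 == 0

print(correct)  # True
print(wrong)    # False
```

With PySpark Column objects the unparenthesized form typically fails outright rather than silently misbehaving, but the root cause is the same operator precedence.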

5. Conclusion

Spark and PySpark support the standard logical operators AND, OR, and NOT. These operators take Boolean expressions as arguments and return a Boolean value, with NULL operands handled according to SQL's three-valued logic.

rimmalapudi

Data Engineer. I write about BigData Architecture, tools and techniques that are used to build Bigdata pipelines and other generic blogs.
