pyspark.sql.Column.isNull()
The isNull() function checks whether the current expression or column value is NULL/None, returning True when it is. Similarly, the isNotNull() function checks whether the current expression or column value is NOT NULL, returning True when a value is present. While working with PySpark DataFrames we often need to check whether a condition expression evaluates to NULL or NOT NULL, and these functions come in handy. This article will also help you understand the difference between PySpark isNull() vs isNotNull().
First, let’s create a DataFrame from a list.
# Import
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder.master("local[1]") \
.appName("SparkByExamples.com").getOrCreate()
# Create DataFrame
data = [
("James",None,"M"),
("Anna","NY","F"),
("Julia",None,None)
]
columns = ["name","state","gender"]
df = spark.createDataFrame(data,columns)
df.show()
Yields the below output. Notice that None in the above example is represented as null on the DataFrame result (newer Spark versions print it as NULL).

+-----+-----+------+
| name|state|gender|
+-----+-----+------+
|James| null|     M|
| Anna|   NY|     F|
|Julia| null|  null|
+-----+-----+------+
1. PySpark isNull()
The PySpark isNull() method returns True if the current expression is NULL/None. isNull() is present in the Column class, while isnull() (with a lowercase n) is present in PySpark SQL Functions.
pyspark.sql.Column.isNull()
pyspark.sql.functions.isnull()
1.1. Syntax of isNull()
The following is the syntax of isNull()
# Syntax of isNull()
Column.isNull()
# Syntax of isnull()
pyspark.sql.functions.isnull(col)
1.2. PySpark Column.isNull() Usage with Examples
To select rows that have a null value in a given column, use filter() with isNull() of the PySpark Column class.
The following statements find all the rows in the DataFrame where the “state” column is NULL, and return a new DataFrame with just those rows.
# Using Column.isNull()
from pyspark.sql.functions import col
df.filter(df.state.isNull()).show()
df.filter("state is NULL").show()
df.filter(col("state").isNull()).show()
All three statements yield the below output.

+-----+-----+------+
| name|state|gender|
+-----+-----+------+
|James| null|     M|
|Julia| null|  null|
+-----+-----+------+
1.3 Using Multiple Conditions
To filter rows with NULL values on multiple columns, use either the SQL AND keyword or the & operator.
df.filter("state IS NULL AND gender IS NULL").show()
df.filter(df.state.isNull() & df.gender.isNull()).show()
Yields the below output.

+-----+-----+------+
| name|state|gender|
+-----+-----+------+
|Julia| null|  null|
+-----+-----+------+
1.4 PySpark SQL Function isnull()
pyspark.sql.functions.isnull() is another function that can be used to check whether a column value is null. To use it, first import it with from pyspark.sql.functions import isnull.
# functions.isnull()
from pyspark.sql.functions import isnull
df.select(isnull(df.state)).show()
Yields the below output.

+---------------+
|(state IS NULL)|
+---------------+
|           true|
|          false|
|           true|
+---------------+
2. PySpark isNotNull()
pyspark.sql.Column.isNotNull() – the PySpark isNotNull() method returns True if the current expression is NOT NULL/None. This function is only present in the Column class; there is no direct equivalent in pyspark.sql.functions.
2.1 Syntax of isNotNull()
The following is the syntax of Column.isNotNull()
# Syntax of isNotNull()
Column.isNotNull()
2.2 PySpark isNotNull() Usage with Example
The below example uses the PySpark isNotNull() function from the Column class to check whether a column has a NOT NULL value.
# isNotNull() Examples
from pyspark.sql.functions import col
df.filter(df.state.isNotNull()).show()
df.filter(col("state").isNotNull()).show()
df.filter("state IS NOT NULL").show()
df.filter("NOT state IS NULL").show()
Yields the below output.

+----+-----+------+
|name|state|gender|
+----+-----+------+
|Anna|   NY|     F|
+----+-----+------+
3. PySpark SQL Query
PySpark SQL does not expose isNull() and isNotNull() as named functions; instead, use the standard SQL IS NULL and IS NOT NULL predicates to check whether a column has NULL or NOT NULL values.
df.createOrReplaceTempView("DATA")
spark.sql("SELECT * FROM DATA where STATE IS NULL").show()
spark.sql("SELECT * FROM DATA where STATE IS NULL AND GENDER IS NULL").show()
spark.sql("SELECT * FROM DATA where STATE IS NOT NULL").show()
Complete Example
The following is a complete example of using the PySpark isNull() vs isNotNull() functions.
# Import
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]") \
.appName("SparkByExamples.com").getOrCreate()
# Create DataFrame
data = [
("James",None,"M"),
("Anna","NY","F"),
("Julia",None,None)
]
columns = ["name","state","gender"]
df = spark.createDataFrame(data,columns)
df.show()
# Using isNull()
df.filter("state is NULL").show()
df.filter(df.state.isNull()).show()
from pyspark.sql.functions import col
df.filter(col("state").isNull()).show()
df.filter("state IS NULL AND gender IS NULL").show()
df.filter(df.state.isNull() & df.gender.isNull()).show()
from pyspark.sql.functions import isnull
df.select(isnull(df.state)).show()
# Using isNotNull()
from pyspark.sql.functions import col
df.filter("state IS NOT NULL").show()
df.filter("NOT state IS NULL").show()
df.filter(df.state.isNotNull()).show()
df.filter(col("state").isNotNull()).show()
# Alternative: drop rows where state is NULL
df.na.drop(subset=["state"]).show()
# Using pySpark SQL
df.createOrReplaceTempView("DATA")
spark.sql("SELECT * FROM DATA where STATE IS NULL").show()
spark.sql("SELECT * FROM DATA where STATE IS NULL AND GENDER IS NULL").show()
spark.sql("SELECT * FROM DATA where STATE IS NOT NULL").show()
Conclusion
In this PySpark article, you have learned how to check whether a column has a value by using the isNull() vs isNotNull() functions, and also how to use pyspark.sql.functions.isnull().
Related Articles
- PySpark Count of Non null, nan Values in DataFrame
- PySpark – Find Count of null, None, NaN Values
- PySpark fillna() & fill() – Replace NULL/None Values
- PySpark How to Filter Rows with NULL Values
- PySpark Drop Rows with NULL or None Values
- PySpark Groupby Count Distinct