• Post author:
  • Post category:PySpark
  • Post last modified:March 27, 2024
  • Reading time:6 mins read
You are currently viewing PySpark Replace Empty Value With None/null on DataFrame

In PySpark DataFrame use when().otherwise() SQL functions to find out if a column has an empty value and use withColumn() transformation to replace a value of an existing column. In this article, I will explain how to replace an empty value with None/null on a single column, all columns selected a list of columns of DataFrame with Python examples.

Note: In PySpark DataFrame None value are shown as null value.

Related: How to get Count of NULL, Empty String Values in PySpark DataFrame

Let’s create a PySpark DataFrame with empty values on some rows.


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
data = [("","CA"), ("Julia",""),("Robert",""),("","NJ")]
df =spark.createDataFrame(data,["name","state"])
df.show()

#+------+-----+
#|  name|state|
#+------+-----+
#|      |   CA|
#| Julia|     |
#|Robert|     |
#|      |   NJ|
#+------+-----+

PySpark Replace Empty Value with None

In order to replace empty value with None/null on single DataFrame column, you can use withColumn() and when().otherwise() function.


#Replace empty string with None value
from pyspark.sql.functions import col,when
df.withColumn("name", \
       when(col("name")=="" ,None) \
          .otherwise(col("name"))) \
  .show()

#+------+-----+
#|  name|state|
#+------+-----+
#|  null|   CA|
#| Julia|     |
#|Robert|     |
#|  null|   NJ|
#+------+-----+

Replace Empty Value with None on All DataFrame Columns

To replace an empty value with None/null on all DataFrame columns, use df.columns to get all DataFrame columns, loop through this by applying conditions.


#Replace empty string with None for all columns
from pyspark.sql.functions import col,when
df2=df.select([when(col(c)=="",None).otherwise(col(c)).alias(c) for c in df.columns])
df2.show()

#+------+-----+
#|  name|state|
#+------+-----+
#|  null|   CA|
#| Julia| null|
#|Robert| null|
#|  null|   NJ|
#+------+-----+

Replace Empty Value with None on Selected Columns

Similarly, you can also replace a selected list of columns, specify all columns you wanted to replace in a list and use this on same expression above.


#Replace empty string with None on selected columns
from pyspark.sql.functions import col,when
replaceCols=["name","state"]
df2=df.select([when(col(c)=="",None).otherwise(col(c)).alias(c) for c in replaceCols])
df2.show()

Complete Example

Following is a complete example of replace empty value with None.


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
data = [("","CA"), ("Julia",""),("Robert",""),("","NJ")]
df =spark.createDataFrame(data,["name","state"])
df.show()

#Replace empty string with None
from pyspark.sql.functions import col,when
df.withColumn("name", \
       when(col("name")=="" ,None) \
          .otherwise(col("name"))) \
  .show()
  
#Replace empty string with None for all columns
df2=df.select([when(col(c)=="",None).otherwise(col(c)).alias(c) for c in df.columns])
df2.show()

#Replace empty string with None on selected columns
replaceCols=["name","state"]
df2=df.select([when(col(c)=="",None).otherwise(col(c)).alias(c) for c in replaceCols])
df2.show()

Conclusion

In summary, you have learned how to replace empty string values with None/null on single, all, and selected PySpark DataFrame columns using Python example.

Reference

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium