You are currently viewing Spark Replace Empty Value With NULL on DataFrame

In order to replace empty string value with NULL on Spark DataFrame use when().otherwise() SQL functions. In this article, I will explain how to replace an empty value with null on a single column, all columns, selected list of columns of DataFrame with Scala examples.

Related: How to get Count of NULL, Empty String Values in Spark DataFrame

Let’s create a DataFrame with empty values on some rows.


val data = Seq(("","CA"), ("Julia",""),("Robert",""),("","NJ"))
import spark.sqlContext.implicits._
val df = data.toDF("name","state")
df.show()

//+------+-----+
//|  name|state|
//+------+-----+
//|      |   CA|
//| Julia|     |
//|Robert|     |
//|      |   NJ|
//+------+-----+

Spark Replace Empty Value with NULL

In order to replace empty value with null on single DataFrame column, you can use withColumn() and when().otherwise() function.


//Replace empty string with null value
import org.apache.spark.sql.functions.{col,when}
df.withColumn("name",
       when(col("name")==="" ,null)
          .otherwise(col("name")))
  .show()

//+------+-----+
//|  name|state|
//+------+-----+
//|  null|   CA|
//| Julia|     |
//|Robert|     |
//|  null|   NJ|
//+------+-----+

Replace Empty Value with NULL on All DataFrame Columns

To replace an empty value with null on all DataFrame columns, use df.columns to get all DataFrame columns as Array[String], loop through this by applying conditions and create an Array[Column].

(colon underscore star) :_* is a Scala operator which “unpacked” as a Array[Column]*.


//Replace empty string with null for all columns
def replaceEmptyCols(columns:Array[String]):Array[Column]={
    columns.map(c=>{
      when(col(c)==="" ,null).otherwise(col(c)).alias(c)
    })
}
df.select(replaceEmptyCols(df.columns):_*).show()

//+------+-----+
//|  name|state|
//+------+-----+
//|  null|   CA|
//| Julia| null|
//|Robert| null|
//|  null|   NJ|
//+------+-----+

Replace Empty Value with NULL on Selected Columns

Similarly, you can also replace a selected list of columns, specify all columns you wanted to replace in a list and pass that list to replaceEmptyCols() funciton.


//Replace empty string with null on selected columns
val selCols=List("name","state")
df.select(replaceEmptyCols(selCols.toArray):_*).show()

Complete Example

Following is a complete example of replace empty value with null.


import org.apache.spark.sql.{Column, SparkSession}

object SparkReplaceEmptyNull extends App{

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  val data = Seq(("","CA"), ("Julia",""),("Robert",""),("","NJ"))
  import spark.sqlContext.implicits._
  val df = data.toDF("name","state")
  df.show()

  //Replace empty string with null value
  import org.apache.spark.sql.functions.{col,when}
  df.withColumn("name",
       when(col("name")==="" ,null)
          .otherwise(col("name")))
    .show()

  //Replace empty string with null for all columns
  def replaceEmptyCols(columns:Array[String]):Array[Column]={
    columns.map(c=>{
      when(col(c)==="" ,null).otherwise(col(c)).alias(c)
    })
  }
  df.select(replaceEmptyCols(df.columns):_*).show()

  //Replace empty string with null on selected columns
  val selCols=List("name","state")
  df.select(replaceEmptyCols(selCols.toArray):_*).show()
}

Conclusion

In summary, you have learned how to replace empty string value with NULL on single, all and selected DataFrame columns using Scala example.

Reference

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium