You are currently viewing Spark – How to update the DataFrame column?

In Spark, updating the DataFrame can be done by using withColumn() transformation function, In this article, I will explain how to update or change the DataFrame column.

I will also explain how to update the column based on condition.

First, let’s create a DataFrame


val data = Seq(Row(Row("James ","","Smith"),"36636","M","3000"),
      Row(Row("Michael ","Rose",""),"40288","M","4000"),
      Row(Row("Robert ","","Williams"),"42114","M","4000"),
      Row(Row("Maria ","Anne","Jones"),"39192","F","4000"),
      Row(Row("Jen","Mary","Brown"),"","F","-1")
)

val schema = new StructType()
      .add("name",new StructType()
      .add("firstname",StringType)
      .add("middlename",StringType)
      .add("lastname",StringType))
      .add("dob",StringType)
      .add("gender",StringType)
      .add("salary",StringType)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)

1. Update the column value

Spark withColumn() function of the DataFrame is used to update the value of a column. withColumn() function takes 2 arguments; first the column you wanted to update and the second the value you wanted to update with.


// Update the column value
df.withColumn("salary",col("salary")*100)

If the column name specified not found, it creates a new column with the value specified.

2. Update the column type

Changing the data type on a DataFrame column can be done using cast() function.


// Update the column type
df.withColumn("salary",col("salary").cast("Integer"))

3. Update based on condition

Here, we use when otherwise combination to update the DataFrame column.


// Update based on condition
val df2 = df.withColumn("new_gender", when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")
      .otherwise("Unknown"))

Happy Learning !!

Leave a Reply

This Post Has One Comment

  1. Anonymous

    Thanks for that, great site