You are currently viewing Spark – How to update the DataFrame column?

In Spark, updating the DataFrame can be done by using withColumn() transformation function, In this article, I will explain how to update or change the DataFrame column.

I will also explain how to update the column based on condition.

First, let’s create a DataFrame

val data = Seq(Row(Row("James ","","Smith"),"36636","M","3000"),
      Row(Row("Michael ","Rose",""),"40288","M","4000"),
      Row(Row("Robert ","","Williams"),"42114","M","4000"),
      Row(Row("Maria ","Anne","Jones"),"39192","F","4000"),

val schema = new StructType()
      .add("name",new StructType()

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)

1. Update the column value

Spark withColumn() function of the DataFrame is used to update the value of a column. withColumn() function takes 2 arguments; first the column you wanted to update and the second the value you wanted to update with.

// Update the column value

If the column name specified not found, it creates a new column with the value specified.

2. Update the column type

Changing the data type on a DataFrame column can be done using cast() function.

// Update the column type

3. Update based on condition

Here, we use when otherwise combination to update the DataFrame column.

// Update based on condition
val df2 = df.withColumn("new_gender", when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")

Happy Learning !!

Leave a Reply

This Post Has One Comment

  1. Anonymous

    Thanks for that, great site