Spark - How to update the DataFrame column?

| *** Please Subscribe for Ad Free & Premium Content ***

Post author:Naveen Nelamali
Post category:Apache Spark
Post last modified:January 11, 2023
Reading time:4 mins read

You are currently viewing Spark – How to update the DataFrame column?

In Spark, updating the DataFrame can be done by using withColumn() transformation function, In this article, I will explain how to update or change the DataFrame column.

I will also explain how to update the column based on condition.

First, let’s create a DataFrame


val data = Seq(Row(Row("James ","","Smith"),"36636","M","3000"),
      Row(Row("Michael ","Rose",""),"40288","M","4000"),
      Row(Row("Robert ","","Williams"),"42114","M","4000"),
      Row(Row("Maria ","Anne","Jones"),"39192","F","4000"),
      Row(Row("Jen","Mary","Brown"),"","F","-1")
)

val schema = new StructType()
      .add("name",new StructType()
      .add("firstname",StringType)
      .add("middlename",StringType)
      .add("lastname",StringType))
      .add("dob",StringType)
      .add("gender",StringType)
      .add("salary",StringType)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)

1. Update the column value

Spark withColumn() function of the DataFrame is used to update the value of a column. withColumn() function takes 2 arguments; first the column you wanted to update and the second the value you wanted to update with.


// Update the column value
df.withColumn("salary",col("salary")*100)

If the column name specified not found, it creates a new column with the value specified.

2. Update the column type

Changing the data type on a DataFrame column can be done using cast() function.


// Update the column type
df.withColumn("salary",col("salary").cast("Integer"))

3. Update based on condition

Here, we use when otherwise combination to update the DataFrame column.


// Update based on condition
val df2 = df.withColumn("new_gender", when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")
      .otherwise("Unknown"))

Happy Learning !!

This Post Has One Comment

Anonymous October 14, 2021

Thanks for that, great site

1. Update the column value

2. Update the column type

3. Update based on condition

Related Articles

Leave a Reply

This Post Has One Comment