How do you rename multiple columns in a Spark DataFrame? In an Apache Spark DataFrame, a column represents a named expression that produces a value of a specific data type. You can think of a column as a logical representation of a data field in a table.
In this article, we shall discuss how to rename multiple columns, or all columns, with examples. So, let’s first create a Spark DataFrame with a few columns and use this DataFrame to rename multiple columns.
// Import
import org.apache.spark.sql.SparkSession
// Create SparkSession
val spark:SparkSession = SparkSession.builder()
.master("local[1]").appName("SparkByExamples.com")
.getOrCreate()
// Create DataFrame
import spark.implicits._
val data = Seq((1, "John", 20), (2, "Jane", 25), (3, "Jim", 30))
val df = data.toDF("id", "name", "age")
// Show DataFrame
df.show()
Yields below output.
// Output:
+---+----+---+
| id|name|age|
+---+----+---+
|  1|John| 20|
|  2|Jane| 25|
|  3| Jim| 30|
+---+----+---+
1. Spark Rename Multiple Columns
To rename multiple columns in Spark, you can use the withColumnRenamed() method on the DataFrame. This method takes the old column name and the new column name as arguments and returns a new DataFrame with the column renamed. To rename multiple columns, chain this method as shown below.
// Rename multiple columns
val df2 = df.withColumnRenamed("id","student_id")
.withColumnRenamed("name","student_name")
// Show DataFrame
df2.show()
This example yields the below output. Note that the column id was renamed to student_id and the column name was renamed to student_name.
// Output:
+----------+------------+---+
|student_id|student_name|age|
+----------+------------+---+
|         1|        John| 20|
|         2|        Jane| 25|
|         3|         Jim| 30|
+----------+------------+---+
2. Rename Multiple Columns Using map()
If you have many columns to rename, chaining withColumnRenamed() gets verbose. Alternatively, you can rename multiple columns by creating a map object with old and new column names as pairs and folding over it.
// Map of old -> new column names
val columnsToRename = Map("id" -> "student_id", "name" -> "student_name")
// Rename multiple columns
val renamedDF = columnsToRename.foldLeft(df){
case (tempDF, (oldName, newName)) => tempDF.withColumnRenamed(oldName, newName)
}
// Show DataFrame
renamedDF.show()
In this example,
- We define a Map called columnsToRename, where the keys represent the old column names and the values represent the new column names.
- We then use the foldLeft operation to iterate over the columnsToRename map and rename the columns one by one.
- The withColumnRenamed function is used to rename each column in the intermediate tempDF DataFrame.
- Finally, we assign the renamed DataFrame to a new variable renamedDF and display it using the show function.
The output of the code above should be:
// Output:
+----------+------------+---+
|student_id|student_name|age|
+----------+------------+---+
|         1|        John| 20|
|         2|        Jane| 25|
|         3|         Jim| 30|
+----------+------------+---+
3. Rename All Columns Using a List
If you want to rename all columns, you can easily do so by creating a list of new column names and passing it as arguments to the toDF() function.
// List with new column names
val newColumnNames = Seq("new_id", "new_name", "new_age")
// Rename all columns
val df3 = df.toDF(newColumnNames:_*)
df3.show()
Alternatively, you can use foldLeft as in the example below.
// List with new column names
val newColumnNames = Seq("new_id", "new_name", "new_age")
// Rename all columns
val df4 = newColumnNames.foldLeft(df)((tempDF, newName) =>
tempDF.withColumnRenamed(tempDF.columns(newColumnNames.indexOf(newName)), newName))
// Show DataFrame
df4.show()
In this example,
- We define a list called newColumnNames, which contains the new column names in the order we want them to appear in the DataFrame.
- We then use the foldLeft operation to iterate over the newColumnNames list and rename the columns one by one.
- The withColumnRenamed function is used to rename each column in the intermediate tempDF DataFrame.
- We use the columns function to get an array of the current column names and the indexOf function to find the position of each new name, so each column is renamed by position.
- Finally, we assign the renamed DataFrame to a new variable df4 and display it using the show function.
The output of the code above should be:
// Output:
+------+--------+-------+
|new_id|new_name|new_age|
+------+--------+-------+
|     1|    John|     20|
|     2|    Jane|     25|
|     3|     Jim|     30|
+------+--------+-------+
4. Using a for loop and dynamic column names
Finally, you can build a list of new column names dynamically and use a for loop with withColumnRenamed() to rename the columns.
// Get old column names
val oldColumnNames = df.columns
// New column names with a "new_" prefix
val newColumnNames = oldColumnNames.map(name => s"new_$name")
// Use a for loop to rename each column (df is a val, so use a mutable variable)
var renamedDf = df
for (i <- oldColumnNames.indices) {
renamedDf = renamedDf.withColumnRenamed(oldColumnNames(i), newColumnNames(i))
}
// Show DataFrame
renamedDf.show()
In this example,
- We start from the DataFrame df with columns “id”, “name”, and “age”.
- We define an array oldColumnNames that contains the current column names of df.
- We then use the map function to create a new array newColumnNames, where each name is the old name with the prefix “new_” added to it.
- We then use a for loop to iterate over the oldColumnNames array and rename each column using the withColumnRenamed function, which takes two arguments: the old column name and the new column name.
- Finally, we display the renamed DataFrame using the show function.
The output of the code above should be:
// Output:
+------+--------+-------+
|new_id|new_name|new_age|
+------+--------+-------+
|     1|    John|     20|
|     2|    Jane|     25|
|     3|     Jim|     30|
+------+--------+-------+
5. Other Spark Column Operations
In Spark, a column refers to a logical data structure representing a named expression that produces a value for each record in a DataFrame. Columns are the building blocks for constructing DataFrame transformations and manipulations in Spark.
To work with columns in Spark Scala, you can use the org.apache.spark.sql.functions package. This package provides many built-in functions for manipulating and transforming columns in a DataFrame.
Here are some common operations you can perform on columns in Spark Scala:
- Selecting Columns: To select one or more columns from a DataFrame, you can use the select function. For example, to select columns col1 and col2 from a DataFrame df, you can write df.select("col1", "col2").
- Filtering Rows: To filter rows based on a condition, you can use the filter() or where() function. For example, to filter rows where the value in the col1 column is greater than 10, you can write df.filter(col("col1") > 10).
- Adding Columns: To add a new column to a DataFrame, you can use the withColumn() function. For example, to add a new column new_col that is the sum of col1 and col2, you can write df.withColumn("new_col", col("col1") + col("col2")).
- Renaming Columns: To rename a column in a DataFrame, you can use the withColumnRenamed() function. For example, to rename a column col1 to new_col1, you can write df.withColumnRenamed("col1", "new_col1").
- Aggregating Data: To aggregate data based on one or more columns, you can use the groupBy() function. For example, to group data by the col1 column and compute the sum of the col2 column for each group, you can write df.groupBy("col1").agg(sum("col2")).
These are just a few examples of what you can do with columns in Spark Scala. The org.apache.spark.sql.functions package provides many more functions for manipulating and transforming columns, so it’s worth exploring the documentation to learn more.
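The operations above can be combined in one short sketch. This uses this article's id/name/age DataFrame in place of the generic col1/col2 names from the bullets; it assumes Spark is on the classpath, as in the examples earlier in this article.

```scala
// Minimal sketch of common column operations, reusing this article's sample data
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark: SparkSession = SparkSession.builder()
  .master("local[1]").appName("SparkByExamples.com")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "John", 20), (2, "Jane", 25), (3, "Jim", 30))
  .toDF("id", "name", "age")

// Selecting columns
df.select("name", "age").show()
// Filtering rows: keep rows where age is greater than 21
df.filter(col("age") > 21).show()
// Adding a column derived from an existing one
df.withColumn("age_plus_10", col("age") + 10).show()
// Renaming a single column
df.withColumnRenamed("age", "student_age").show()
// Aggregating: sum of age per name
df.groupBy("name").agg(sum("age")).show()
```

Each call returns a new DataFrame; the original df is never modified, so these operations can be chained freely.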
6. Conclusion
In this article, you have learned different ways of renaming multiple columns in Spark. Some approaches explicitly specify the new name for each column using the withColumnRenamed() function, while others pass a list of new column names to the toDF() method. Which one to use depends on your requirements.
Related Articles
- Spark Merge Two DataFrames with Different Columns or Schema
- Spark withColumnRenamed to Rename Column
- Spark RDD fold() function example
- Spark map() vs flatMap() with Examples
- Spark Internal Execution plan
- Get Other Columns when using GroupBy or Select All Columns with the GroupBy?
- Spark cannot resolve given input columns