Spark – Sort multiple DataFrame columns

In Spark, the sort() and orderBy() functions of the DataFrame sort the data by one or more columns. Each column is sorted in ascending order by default; use asc or desc on a Column to set the sort direction explicitly.

When sorting on multiple columns, you can sort some columns in ascending order and others in descending order.

1. Using sort() to sort multiple columns

In Spark, we can use the sort() function of the DataFrame to sort on multiple columns. To choose between ascending and descending order, use asc or desc on the Column.


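// Using sort() to sort multiple columns
import org.apache.spark.sql.functions.col
// Both columns in ascending order (the default)
df.sort("department", "state")
// department ascending, then state descending within each department
df.sort(col("department").asc, col("state").desc)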
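To try the sort() calls above end to end, here is a minimal sketch. The local SparkSession, the object name SortMultipleColumns, and the four sample rows are assumptions made up for illustration; only the department and state column names come from the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SortMultipleColumns {
  // Assumption: a local, single-threaded session used only for this example
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SortMultipleColumns")
    .getOrCreate()

  import spark.implicits._

  // Hypothetical sample data; each (department, state) pair is unique,
  // so the sorted order is fully determined
  val df: DataFrame = Seq(
    ("James", "Sales", "NY"),
    ("Maria", "Finance", "CA"),
    ("Robert", "Sales", "CA"),
    ("Jen", "Finance", "NY")
  ).toDF("employee_name", "department", "state")

  // department ascending, then state descending within each department
  val sorted: DataFrame = df.sort(col("department").asc, col("state").desc)

  def main(args: Array[String]): Unit = {
    // Row order: Jen (Finance, NY), Maria (Finance, CA),
    // James (Sales, NY), Robert (Sales, CA)
    sorted.show()
    spark.stop()
  }
}
```

The second sort key only matters among rows that share the same department, which is why NY comes before CA inside both Finance and Sales.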

2. Using orderBy() to sort multiple columns

Alternatively, we can use the orderBy() function of the DataFrame to sort on multiple columns, with asc for ascending and desc for descending.


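// Using orderBy() to sort multiple columns
import org.apache.spark.sql.functions.col
// Both columns in ascending order (the default)
df.orderBy("department", "state")
// department ascending, then state descending within each department
df.orderBy(col("department").asc, col("state").desc)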
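In the Dataset API, orderBy() is an alias for sort(), so both should return rows in the same order. The sketch below checks that on a small DataFrame and also shows the asc() and desc() helpers from org.apache.spark.sql.functions, which take a column name. The local SparkSession, the object name OrderByMultipleColumns, and the sample rows are assumptions made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{asc, col, desc}

object OrderByMultipleColumns {
  // Assumption: a local, single-threaded session used only for this example
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("OrderByMultipleColumns")
    .getOrCreate()

  import spark.implicits._

  // Hypothetical sample data with unique (department, state) pairs
  val df: DataFrame = Seq(
    ("James", "Sales", "NY"),
    ("Maria", "Finance", "CA"),
    ("Robert", "Sales", "CA"),
    ("Jen", "Finance", "NY")
  ).toDF("employee_name", "department", "state")

  // Three equivalent ways to ask for department ascending, state descending
  val viaOrderBy: DataFrame = df.orderBy(col("department").asc, col("state").desc)
  val viaFunctions: DataFrame = df.orderBy(asc("department"), desc("state"))
  val viaSort: DataFrame = df.sort(col("department").asc, col("state").desc)

  def main(args: Array[String]): Unit = {
    viaOrderBy.show()
    // Compare the collected rows of orderBy() and sort() position by position
    val sameOrder = viaOrderBy.collect().toSeq == viaSort.collect().toSeq
    println(s"orderBy and sort agree: $sameOrder")
    spark.stop()
  }
}
```

Which of the two names to use is a matter of taste; orderBy() reads closer to SQL's ORDER BY clause.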

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.