Spark SQL Sort Functions – Complete List

Spark SQL provides built-in standard sort functions defined in the DataFrame API; these come in handy when we need to sort a DataFrame by one or more columns. Each of them accepts a column name as a String and returns a Column type.

When possible, try to leverage the standard library functions, as they offer a little more compile-time safety, handle nulls, and perform better than UDFs. If your application is performance-critical, avoid custom UDFs wherever you can: Spark cannot optimize the code inside a UDF, so a UDF does not guarantee performance.

Spark SQL sort functions are grouped as “sort_funcs” in Spark SQL; they come in handy when we want to order columns in ascending or descending order.

These are primarily used with the sort() function of a DataFrame or Dataset.
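As a minimal sketch of how these functions plug into sort(), the example below orders a small DataFrame by two columns. The object name, the local SparkSession settings, and the sample data are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

object SortFunctionsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SortFunctionsSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(
      ("James", "Sales", 3000),
      ("Anna", "Finance", 4100),
      ("Robert", "Sales", 4100)
    ).toDF("name", "dept", "salary")

    // Sort by dept ascending, then by salary descending within each dept
    df.sort(asc("dept"), desc("salary")).show()

    // orderBy is an alias for sort and accepts the same Column arguments
    df.orderBy(asc("dept"), desc("salary")).show()

    spark.stop()
  }
}
```

Both calls should produce the same ordering, since orderBy() is an alias for sort().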

| Spark SQL sort function syntax | Spark function description |
| --- | --- |
| asc(columnName: String): Column | Specifies ascending sort order for a column of a DataFrame or Dataset. |
| asc_nulls_first(columnName: String): Column | Similar to asc, but null values are returned first, followed by non-null values. |
| asc_nulls_last(columnName: String): Column | Similar to asc, but non-null values are returned first, followed by null values. |
| desc(columnName: String): Column | Specifies descending sort order for a column of a DataFrame or Dataset. |
| desc_nulls_first(columnName: String): Column | Similar to desc, but null values are returned first, followed by non-null values. |
| desc_nulls_last(columnName: String): Column | Similar to desc, but non-null values are returned first, followed by null values. |

asc() – ascending function

The asc function specifies ascending sort order for a column of a DataFrame or Dataset.

Syntax: asc(columnName: String): Column 
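A minimal sketch of asc() on a single string column, assuming a local SparkSession; the object name and the sample names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.asc

object AscSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("AscSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq("Robert", "Anna", "James").toDF("name")

    // asc("name") returns a Column that sort() uses as an ascending sort key
    val sorted = df.sort(asc("name")).as[String].collect()

    // Prints: Anna, James, Robert
    println(sorted.mkString(", "))

    spark.stop()
  }
}
```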

asc_nulls_first() – ascending with nulls first

Similar to the asc function, but null values are returned first, followed by non-null values.

Syntax: asc_nulls_first(columnName: String): Column
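The sketch below illustrates asc_nulls_first() on a nullable column. It assumes a local SparkSession; the object name and the sample rows are hypothetical, and the null salary is modeled with Option[Int].

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.asc_nulls_first

object AscNullsFirstSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("AscNullsFirstSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; None becomes a null salary
    val df = Seq[(String, Option[Int])](
      ("James", Some(3000)),
      ("Anna", None),
      ("Robert", Some(4100))
    ).toDF("name", "salary")

    // Null salaries come first, then non-null salaries in ascending order
    val names = df.sort(asc_nulls_first("salary"))
      .select("name").as[String].collect()

    // Prints: Anna, James, Robert
    println(names.mkString(", "))

    spark.stop()
  }
}
```

Note that Spark already places nulls first for an ascending sort by default, so this function mainly makes that ordering explicit.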

asc_nulls_last() – ascending with nulls last

Similar to the asc function, but non-null values are returned first, followed by null values.

Syntax: asc_nulls_last(columnName: String): Column
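As a rough illustration of asc_nulls_last(), the sketch below sorts a nullable column so that the null row moves to the end. The local SparkSession, object name, and sample rows are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.asc_nulls_last

object AscNullsLastSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("AscNullsLastSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; None becomes a null salary
    val df = Seq[(String, Option[Int])](
      ("James", Some(3000)),
      ("Anna", None),
      ("Robert", Some(4100))
    ).toDF("name", "salary")

    // Non-null salaries in ascending order, then the null salary
    val names = df.sort(asc_nulls_last("salary"))
      .select("name").as[String].collect()

    // Prints: James, Robert, Anna
    println(names.mkString(", "))

    spark.stop()
  }
}
```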

desc() – descending function

The desc function specifies descending sort order for a column of a DataFrame or Dataset.

Syntax: desc(columnName: String): Column
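A minimal sketch of desc() on a numeric column without nulls, assuming a local SparkSession; the object name and sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object DescSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("DescSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(
      ("James", 3000),
      ("Anna", 4600),
      ("Robert", 4100)
    ).toDF("name", "salary")

    // Highest salary first
    val names = df.sort(desc("salary"))
      .select("name").as[String].collect()

    // Prints: Anna, Robert, James
    println(names.mkString(", "))

    spark.stop()
  }
}
```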

desc_nulls_first() – descending with nulls first

Similar to the desc function, but null values are returned first, followed by non-null values.

Syntax: desc_nulls_first(columnName: String): Column
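The following sketch shows desc_nulls_first() pulling the null row to the top of a descending sort. It assumes a local SparkSession; the object name and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc_nulls_first

object DescNullsFirstSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("DescNullsFirstSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; None becomes a null salary
    val df = Seq[(String, Option[Int])](
      ("James", Some(3000)),
      ("Anna", None),
      ("Robert", Some(4100))
    ).toDF("name", "salary")

    // The null salary first, then non-null salaries in descending order
    val names = df.sort(desc_nulls_first("salary"))
      .select("name").as[String].collect()

    // Prints: Anna, Robert, James
    println(names.mkString(", "))

    spark.stop()
  }
}
```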

desc_nulls_last() – descending with nulls last

Similar to the desc function, but non-null values are returned first, followed by null values.

Syntax: desc_nulls_last(columnName: String): Column
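A minimal sketch of desc_nulls_last(), assuming a local SparkSession; the object name and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc_nulls_last

object DescNullsLastSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("DescNullsLastSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; None becomes a null salary
    val df = Seq[(String, Option[Int])](
      ("James", Some(3000)),
      ("Anna", None),
      ("Robert", Some(4100))
    ).toDF("name", "salary")

    // Non-null salaries in descending order, then the null salary
    val names = df.sort(desc_nulls_last("salary"))
      .select("name").as[String].collect()

    // Prints: Robert, James, Anna
    println(names.mkString(", "))

    spark.stop()
  }
}
```

Spark already places nulls last for a descending sort by default, so this function mainly makes that ordering explicit.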

Reference: Spark functions Scala code

Naveen Nelamali

