In Polars, the sort() method sorts a DataFrame by one or more columns, with an ascending or descending order that can be set for each column. It is designed to perform well on large datasets and uses multiple threads by default.
In this article, I will explain the Polars DataFrame.sort() method, covering its syntax, parameters, and usage, and demonstrate how it returns a new DataFrame sorted according to the specified conditions.
Key Points –
- The sort() method organizes rows of a DataFrame based on specified column(s).
- It allows sorting by a single column or multiple columns simultaneously.
- It takes a column name (str), a list of column names (List[str]), or expressions as the by parameter.
- The sort() method does not modify the original DataFrame but returns a new, sorted DataFrame.
- It works with columns containing integers, floats, strings, and other data types; each Polars column holds values of a single data type.
Polars DataFrame.sort() Introduction
Following is the syntax of the Polars DataFrame sort() method.
# Syntax of polars DataFrame.sort()
DataFrame.sort(
by: IntoExpr | Iterable[IntoExpr], # Column(s) or expressions to sort by
*more_by: IntoExpr, # Additional columns/expressions for sorting
descending: bool | Sequence[bool] = False, # Sort order: descending or ascending
nulls_last: bool | Sequence[bool] = False, # Place nulls at the end or start
multithreaded: bool = True, # Use multithreading for sorting
maintain_order: bool = False # Maintain order of equal elements
) → DataFrame
Parameters of the Polars DataFrame.sort()
Following are the parameters of the polars DataFrame.sort() method.
- by – Specifies the column name(s) or expression(s) to sort by. Accepts a single column, multiple columns, or an expression.
- more_by – Allows sorting by additional columns or expressions after the primary column, passed as extra positional arguments.
- descending – Accepts a single True/False value or a list with one value per sort column.
  - True – Sort in descending order.
  - False – Sort in ascending order (default).
- nulls_last – Specifies whether null values appear at the end (True) or start (False) of the sorted result. Default is False.
- multithreaded – Enables multithreaded sorting for better performance. Default is True.
- maintain_order – Ensures that rows with equal values maintain their original relative order when sorting. Default is False (faster sorting without order preservation).
Usage of Polars DataFrame.sort() Method
The DataFrame.sort() method sorts the rows of a Polars DataFrame based on one or more specified columns. The sorting can be done in ascending (default) or descending order.
Now, let’s create a Polars DataFrame using data from a dictionary.
import polars as pl
# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 24000, 26000],
    'Duration': ['30days', '50days', '40days', '50days', '40days'],
    'Discount': [1000, 2300, 1500, 1200, 2500]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
Yields below output.
Sort by a Single Column (Ascending)
To sort by a single column in ascending order in Polars, you can use the sort() method and specify the column name.
# Sorting by 'Fees' in ascending order
sorted_df = df.sort("Fees")
print("DataFrame sorted by 'Fees' (ascending):\n", sorted_df)
Here,
- The sort() method is used to sort the DataFrame by the column Fees.
- By default, the sorting is done in ascending order.
- The Fees column is now sorted from smallest to largest, and the entire DataFrame rows are reordered accordingly.
Sorting by a Single Column (Descending)
To sort the DataFrame by a single column in descending order, you can use the sort() method with the descending=True parameter.
# Sorting by "Fees" in descending order
sorted_df = df.sort(by="Fees", descending=True)
print("Sorted DataFrame by Fees (Descending):\n", sorted_df)
# Output:
# Sorted DataFrame by Fees (Descending):
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╡
│ Pandas ┆ 26000 ┆ 40days ┆ 2500 │
│ PySpark ┆ 25000 ┆ 50days ┆ 2300 │
│ Python ┆ 24000 ┆ 50days ┆ 1200 │
│ Spark ┆ 22000 ┆ 30days ┆ 1000 │
│ Hadoop ┆ 20000 ┆ 40days ┆ 1500 │
└─────────┴───────┴──────────┴──────────┘
The above example sorts the df DataFrame by the "Fees" column in descending order, from the highest fee to the lowest.
Sorting by Multiple Columns
To sort the DataFrame by multiple columns, you can specify multiple column names in the by parameter and set the corresponding sorting orders in the descending parameter.
# Sorting by "Duration" (ascending) and then by "Fees" (descending)
sorted_df = df.sort(by=["Duration", "Fees"], descending=[False, True])
print("Sorted DataFrame by Duration (Ascending) and Fees (Descending):\n", sorted_df)
# Output:
# Sorted DataFrame by Duration (Ascending) and Fees (Descending):
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 │
│ Pandas ┆ 26000 ┆ 40days ┆ 2500 │
│ Hadoop ┆ 20000 ┆ 40days ┆ 1500 │
│ PySpark ┆ 25000 ┆ 50days ┆ 2300 │
│ Python ┆ 24000 ┆ 50days ┆ 1200 │
└─────────┴───────┴──────────┴──────────┘
Here,
- The DataFrame is first sorted by "Duration" in ascending order (False).
- Within the same "Duration" values, the DataFrame is sorted by "Fees" in descending order (True).
You can also pass a single boolean to the descending parameter when sorting by multiple columns. In that case, the same order applies to every column in the by list.
# Use DataFrame sort() method
sorted_df = df.sort(by=["Duration", "Fees"], descending=True)
print("Sorted DataFrame:\n", sorted_df)
# Output:
# Sorted DataFrame:
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 50days ┆ 2300 │
│ Python ┆ 24000 ┆ 50days ┆ 1200 │
│ Pandas ┆ 26000 ┆ 40days ┆ 2500 │
│ Hadoop ┆ 20000 ┆ 40days ┆ 1500 │
│ Spark ┆ 22000 ┆ 30days ┆ 1000 │
└─────────┴───────┴──────────┴──────────┘
Sort by a Column of Strings
To sort a Polars DataFrame by a column of strings, you can use the sort() method, just like with numeric columns. Polars sorts string columns in lexicographical (alphabetical) order, ascending by default. To reverse the order, set the descending parameter.
# Sorting by "Courses" (strings) in ascending order
sorted_df = df.sort(by="Courses", descending=False)
print("Sorted DataFrame by Courses (Ascending):\n", sorted_df)
# Output:
# Sorted DataFrame by Courses (Ascending):
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╡
│ Hadoop ┆ 20000 ┆ 40days ┆ 1500 │
│ Pandas ┆ 26000 ┆ 40days ┆ 2500 │
│ PySpark ┆ 25000 ┆ 50days ┆ 2300 │
│ Python ┆ 24000 ┆ 50days ┆ 1200 │
│ Spark ┆ 22000 ┆ 30days ┆ 1000 │
└─────────┴───────┴──────────┴──────────┘
Here,
- The DataFrame is sorted by the "Courses" column in ascending order (descending=False, which is also the default).
- The string values are sorted lexicographically (alphabetically).
Sorting by Strings in Descending Order
If you want to sort the "Courses" column in descending order, you can set descending=True.
# Sorting by strings in descending order
sorted_df = df.sort(by="Courses", descending=True)
print("Sorted DataFrame by Courses (Descending):\n", sorted_df)
# Output:
# Sorted DataFrame by Courses (Descending):
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 │
│ Python ┆ 24000 ┆ 50days ┆ 1200 │
│ PySpark ┆ 25000 ┆ 50days ┆ 2300 │
│ Pandas ┆ 26000 ┆ 40days ┆ 2500 │
│ Hadoop ┆ 20000 ┆ 40days ┆ 1500 │
└─────────┴───────┴──────────┴──────────┘
Conclusion
In conclusion, the Polars DataFrame.sort() method provides an efficient and versatile approach to sorting data within a Polars DataFrame. It enables sorting by one or more columns, supports both ascending and descending orders, and always returns a new, sorted DataFrame rather than modifying the original in place.
Happy Learning!!