The pandas.DataFrame.sort_values() function sorts a DataFrame in ascending or descending order along an axis. It takes the by, axis, ascending, inplace, kind, na_position, ignore_index, and key parameters and returns a sorted DataFrame. Use inplace=True to sort the existing DataFrame instead of returning a new one. To control the order, use the boolean ascending parameter: True for ascending and False for descending. It defaults to True.
sort_values() Key Points:
- It provides a kind param to choose from the ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’ algorithms.
- By default, it sorts in ascending order.
- By default, NaN values are placed at the end of the sorted column. Use na_position='first' to show them at the top.
- You can reset the index of the sorted result.
- Pandas DataFrame’s sort_values() method sorts the DataFrame’s rows by specified column(s).
- It takes parameters such as ‘by’ to specify the column(s) to sort by and ‘ascending’ to control the sorting order.
1. Syntax of DataFrame.sort_values()
Following is the syntax of pandas.DataFrame.sort_values(). Use DataFrame.sort_index() to sort by Index.
# Syntax of DataFrame.sort_values()
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
by – str or list of str. Column/row name or list of names to sort by.
axis – Axis to be sorted. 0 or ‘index’, 1 or ‘columns’. Default 0.
ascending – bool or list of bool. Sort in ascending or descending order. Default True (ascending).
inplace – If True, updates the existing DataFrame. Default False.
kind – Sorting algorithm to choose from {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}. Default ‘quicksort’.
na_position – Where to place the NaNs, {‘first’, ‘last’}. Default ‘last’.
ignore_index – If True, resets the index to start from zero. Default False.
key – callable, optional. Function applied to the values before sorting.
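The examples in this article all sort rows (axis=0). As a minimal sketch of the axis parameter, the snippet below sorts columns instead, ordering them by the values in one row. The scores DataFrame and its labels are made up for illustration and are not part of the example data used in the rest of this article.

```python
import pandas as pd

# Hypothetical numeric data, used only to illustrate axis=1
scores = pd.DataFrame({'a': [3, 1], 'b': [1, 2], 'c': [2, 3]}, index=['x', 'y'])

# With axis=1, 'by' names a row label and the columns are reordered
# according to the values found in that row (here: row 'x').
by_row = scores.sort_values(by='x', axis=1)
print(list(by_row.columns))  # ['b', 'c', 'a']
```

Note that sorting along axis=1 compares values across columns, so it works best when all columns hold comparable types.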
Let’s understand these parameters by running some examples. First, let’s create a pandas DataFrame from Dict.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,26000],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies, index = ['r1','r3','r5','r2','r4'])
print(df)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r3     NaN  25000   40days      2500
r5  pandas  30000   35days      1500
r2    Java  22000   60days      1200
r4   Spark  26000   50days      3000
2. Use sort_values() to Sort pandas DataFrame
By using the DataFrame.sort_values() method you can sort the rows of a pandas DataFrame by a column's values in ascending or descending order. When the order is not specified, it sorts in ascending order by default.
# Default sort
df2 = df.sort_values('Courses')
print(df2)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500
r3     NaN  25000   40days      2500
In case you want to update the existing DataFrame, use inplace=True.
# Default sort with inplace=True
df.sort_values('Courses', inplace=True)
print(df)
Yields the same output as above.
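One detail worth keeping in mind is that with inplace=True the method returns None, so the result should not be assigned to a variable. The short sketch below illustrates this on a trimmed-down copy of the example DataFrame (only two of its columns are kept to save space).

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# The call reorders df itself and returns None
result = df.sort_values('Courses', inplace=True)
print(result)              # None
print(df.index[0])         # r2 (the 'Java' row now comes first)
```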
3. Pandas Sort by Descending Order
To sort pandas DataFrame column values in descending order, use ascending=False. You can also specify a different sorting order for each label.
# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
print(df2)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r5  pandas  30000   35days      1500
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r2    Java  22000   60days      1200
r3     NaN  25000   40days      2500
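To specify a different order for each label, pass a list of booleans to ascending, one per entry in by. The sketch below sorts Courses in ascending order and, within equal Courses values, Fee in descending order. It uses a trimmed-down copy of the example DataFrame.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Courses ascending, then Fee descending for rows with the same Courses
df2 = df.sort_values(by=['Courses', 'Fee'], ascending=[True, False])
print(list(df2.index))  # ['r2', 'r4', 'r1', 'r5', 'r3']
```

The two Spark rows now appear with the higher Fee (r4) first, while the NaN row still goes last.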
4. Sort by Two Columns
The by parameter also accepts a list of labels; use this to sort the DataFrame by multiple columns.
# Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
print(df2)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500
r3     NaN  25000   40days      2500
5. Reset Index
Sometimes you may need a new index on the sorted result. You can do this while sorting by using ignore_index=True, or by calling pandas.DataFrame.reset_index() on the sorted DataFrame.
# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print(df2)
Yields below output.
# Output:
  Courses    Fee Duration  Discount
0    Java  22000   60days      1200
1   Spark  20000   30days      1000
2   Spark  26000   50days      3000
3  pandas  30000   35days      1500
4     NaN  25000   40days      2500
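The reset_index() alternative mentioned above can be sketched as follows. Passing drop=True discards the old row labels instead of adding them back as a column. The example uses a trimmed-down copy of the example DataFrame.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Sort first, then replace the row labels with 0..n-1
df2 = df.sort_values(by='Courses').reset_index(drop=True)

# Same sort with the index reset during sorting, for comparison
same = df.sort_values(by='Courses', ignore_index=True)

print(list(df2.index))  # [0, 1, 2, 3, 4]
```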
6. NaN at First
By default, NaN values are placed at the bottom of the sorted DataFrame. You can move them to the beginning by using the na_position='first' param.
# Sort by putting NaN first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print(df2)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r3     NaN  25000   40days      2500
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500
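The position of NaN is controlled only by na_position and does not depend on ascending. As a small sketch on a trimmed-down copy of the example DataFrame, a descending sort still leaves NaN at the bottom unless na_position='first' is passed.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Descending sort: NaN (row r3) still goes last by default
desc_last = df.sort_values('Courses', ascending=False)
print(desc_last.index[-1])   # r3

# Descending sort with NaN moved to the top
desc_first = df.sort_values('Courses', ascending=False, na_position='first')
print(desc_first.index[0])   # r3
```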
7. Pandas Sort by Custom Function
In case you want to apply a custom or existing function before sorting, you can use the key param. The below example converts the Courses values to lower case and then sorts.
# Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print(df2)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r5  pandas  30000   35days      1500
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r3     NaN  25000   40days      2500
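The key function receives the whole column as a Series and should return a Series of the same length. As another sketch, the example below sorts duration strings by their numeric part rather than alphabetically. The durations DataFrame is made up for illustration, with values chosen so that the two orderings differ.

```python
import pandas as pd

# Hypothetical data: as plain strings, '100days' sorts before '5days'
durations = pd.DataFrame({'Duration': ['30days', '5days', '100days']})

# Plain string sort
plain = durations.sort_values('Duration')
print(list(plain['Duration']))    # ['100days', '30days', '5days']

# Sort by the leading number extracted from each string
numeric = durations.sort_values(
    'Duration',
    key=lambda col: col.str.extract(r'(\d+)', expand=False).astype(int),
)
print(list(numeric['Duration']))  # ['5days', '30days', '100days']
```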
Finally, you can also choose a different sorting algorithm with the kind param. I will leave this to you to explore.
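As a starting point for that exploration, here is a minimal sketch using kind='stable'. A stable sort keeps rows with equal keys in their original relative order. The small DataFrame below is made up for illustration. According to the pandas documentation, kind is only applied when sorting on a single column or label.

```python
import pandas as pd

# Hypothetical data with repeated Courses values
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Spark", "Java"],
    'Fee': [26000, 22000, 20000, 21000],
})

# Rows with the same Courses value keep their original relative order
df2 = df.sort_values('Courses', kind='stable')
print(list(df2.index))  # [1, 3, 0, 2]
```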
Frequently Asked Questions on pandas.DataFrame.sort_values()
The purpose of the pandas.DataFrame.sort_values() method is to sort the rows of a DataFrame based on the values of one or more columns.
By default, NaNs are placed at the end of the sorted DataFrame in both ascending and descending order, because na_position defaults to ‘last’. Use na_position='first' to place them at the beginning.
You can specify multiple columns for sorting using the pandas.DataFrame.sort_values() method. You can pass a list of column names to the ‘by’ parameter to indicate the columns by which you want to sort. The method will sort the DataFrame based on the values of the specified columns in the order they are provided in the list.
By default, the pandas.DataFrame.sort_values() method does not modify the original DataFrame. Instead, it returns a new DataFrame with the rows sorted according to the specified criteria. The original DataFrame remains unchanged unless you pass inplace=True or assign the sorted DataFrame back to the original variable.
You can sort by index values by using DataFrame.sort_index(), or by passing the name(s) of the index level(s) to the ‘by’ parameter of sort_values() with axis=0.
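Both ways of sorting by the index can be sketched as follows on a trimmed-down copy of the example DataFrame. The index name 'row' is an assumption added for illustration, since the index must be named before it can be referenced in by.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Option 1: sort_index() sorts by the row labels
by_index = df.sort_index()
print(list(by_index.index))  # ['r1', 'r2', 'r3', 'r4', 'r5']

# Option 2: name the index ('row' is a made-up name), then pass it to by
named = df.rename_axis('row')
by_level = named.sort_values(by='row')
print(list(by_level.index))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```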
Conclusion
In this article, you have learned how to sort pandas DataFrame column values in ascending or descending order by using different params of the sort_values() function. You have also learned that this function provides a param to choose from the ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’ algorithms.
Happy Learning !!