You can sort a pandas DataFrame by column values using the sort_values() method. To specify the order, use the ascending parameter: False for descending and True for ascending. By default, it is set to True.
In this article, I will explain how to sort a pandas DataFrame by column values in ascending order, in descending order, and by multiple columns, and how to place NaN values first and reset the index on the sorted result.
1. Quick Examples of Pandas Sort by Column Values
If you are in a hurry, below are some quick examples of how to sort pandas DataFrame by column values.
# Below are the quick examples.
# Default sort
df2 = df.sort_values('Courses')
# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
# Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
# Sort by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
# Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
# Sort by heapsort algorithm
df2 = df.sort_values(by='Courses', kind='heapsort')
Let’s create a DataFrame with a few rows and columns and execute these examples. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create DataFrame (one Courses value is NaN to show how missing values sort)
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,26000],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies, index = ['r1','r3','r5','r2','r4'])
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r3 NaN 25000 40days 2500
r5 pandas 30000 35days 1500
r2 Java 22000 60days 1200
r4 Spark 26000 50days 3000
2. Sort DataFrame by Column Values
The df.sort_values() method sorts a pandas DataFrame in ascending or descending order. When no order is specified, it sorts in ascending order.
# Default sort
df2 = df.sort_values('Courses')
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r2 Java 22000 60days 1200
r1 Spark 20000 30days 1000
r4 Spark 26000 50days 3000
r5 pandas 30000 35days 1500
r3 NaN 25000 40days 2500
To update the existing DataFrame instead of creating a new one, use inplace=True.
# Default sort with inplace=True
df.sort_values('Courses', inplace=True)
print(df)
Yields the same output as above.
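The difference between the two calling styles can be sketched as follows. This is a minimal illustration on a hypothetical three-row frame (df_demo is not the article's data): with inplace=True the method returns None and reorders the original object, so its result should not be assigned to a variable.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df_demo = pd.DataFrame({"Courses": ["Spark", "Java", "pandas"],
                        "Fee": [20000, 22000, 30000]})

# Without inplace, a new sorted DataFrame is returned and df_demo is unchanged
sorted_copy = df_demo.sort_values("Fee", ascending=False)

# With inplace=True, df_demo itself is reordered and the call returns None
result = df_demo.sort_values("Fee", ascending=False, inplace=True)

print(result)                   # None
print(df_demo["Fee"].tolist())  # [30000, 22000, 20000]
```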
3. Pandas Sort by Descending Order
To sort a pandas DataFrame in descending order, use ascending=False. You can also specify a different sort order for each column by passing a list of booleans to ascending.
# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r5 pandas 30000 35days 1500
r1 Spark 20000 30days 1000
r4 Spark 26000 50days 3000
r2 Java 22000 60days 1200
r3 NaN 25000 40days 2500
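As a short sketch of per-column sort orders, the example below sorts Courses in ascending order and Fee in descending order within each course. The frame df_demo is a reduced copy of the sample data, rebuilt here so that the snippet runs on its own.

```python
import pandas as pd
import numpy as np

# Reduced copy of the sample data, rebuilt so the snippet is self-contained
df_demo = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
}, index=["r1", "r3", "r5", "r2", "r4"])

# One boolean per column in `by`: Courses ascending, Fee descending
df2 = df_demo.sort_values(by=["Courses", "Fee"], ascending=[True, False])
print(df2.index.tolist())  # ['r2', 'r4', 'r1', 'r5', 'r3']
```

The two Spark rows now appear with the higher fee (r4) first, while the NaN row still goes last.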
4. Sort by Multiple Columns
sort_values() also supports sorting on multiple columns at a time; pass a list of column names to the by parameter.
# Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r2 Java 22000 60days 1200
r1 Spark 20000 30days 1000
r4 Spark 26000 50days 3000
r5 pandas 30000 35days 1500
r3 NaN 25000 40days 2500
5. Reset Index While Sorting
Sometimes you may need a new index on the sorted result. You can get one while sorting by using ignore_index=True, or by calling pandas.DataFrame.reset_index() on the sorted DataFrame.
# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Java 22000 60days 1200
1 Spark 20000 30days 1000
2 Spark 26000 50days 3000
3 pandas 30000 35days 1500
4 NaN 25000 40days 2500
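The two approaches mentioned above can be compared with a small sketch on a hypothetical frame (df_demo is illustrative only). Note that reset_index() needs drop=True here; without it, the old index is kept as a new column.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df_demo = pd.DataFrame({"Courses": ["Spark", "Java", "pandas"],
                        "Fee": [20000, 22000, 30000]},
                       index=["r1", "r2", "r3"])

# Option 1: renumber the rows as part of the sort
a = df_demo.sort_values(by="Courses", ignore_index=True)

# Option 2: sort first, then discard the old index
b = df_demo.sort_values(by="Courses").reset_index(drop=True)

print(a.index.tolist())  # [0, 1, 2]
print(a.equals(b))       # True
```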
6. Sorting by NaN at First
By default, rows with NaN values are pushed to the bottom of the sorted DataFrame; you can place them at the top by using the na_position='first' parameter. If you don’t want the NaN values, use dropna() to drop rows with NaN.
# Sort by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r3 NaN 25000 40days 2500
r2 Java 22000 60days 1200
r1 Spark 20000 30days 1000
r4 Spark 26000 50days 3000
r5 pandas 30000 35days 1500
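For the dropna() alternative, a minimal sketch might look like the following. It assumes that only missing values in the sort column matter, which is why subset is restricted to Courses; df_demo is a reduced copy of the sample data.

```python
import pandas as pd
import numpy as np

# Reduced copy of the sample data, rebuilt so the snippet is self-contained
df_demo = pd.DataFrame({
    "Courses": ["Spark", np.nan, "pandas", "Java", "Spark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
}, index=["r1", "r3", "r5", "r2", "r4"])

# Drop rows whose Courses value is missing, then sort what remains
df2 = df_demo.dropna(subset=["Courses"]).sort_values(by=["Courses", "Fee"])
print(df2.index.tolist())  # ['r2', 'r1', 'r4', 'r5']
```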
7. Pandas Sort Column by Custom Function
To apply a custom or existing function to the values before sorting, use the key parameter. The example below converts the Courses values to lowercase and then sorts.
# Sort column by custom function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r2 Java 22000 60days 1200
r5 pandas 30000 35days 1500
r1 Spark 20000 30days 1000
r4 Spark 26000 50days 3000
r3 NaN 25000 40days 2500
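The key function is not limited to changing case. As a further sketch on hypothetical data (the course names below are illustrative only), it can map each value to any sortable quantity, such as the length of the string. The function receives the whole column as a Series and should return a Series of the same length.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df_demo = pd.DataFrame({"Courses": ["Spark", "Java", "pandas", "Go"],
                        "Fee": [20000, 22000, 30000, 15000]})

# Sort by the length of each course name instead of by the name itself
df2 = df_demo.sort_values(by="Courses", key=lambda col: col.str.len())
print(df2["Courses"].tolist())  # ['Go', 'Java', 'Spark', 'pandas']
```

The original values are kept in the result; the key only affects the order.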
Finally, you can also choose a different sort algorithm with the kind parameter, which accepts 'quicksort' (the default), 'mergesort', 'heapsort', or 'stable'. I will leave this to you to explore.
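One practical reason to pick an algorithm is stability. The sketch below, on hypothetical data, uses kind='mergesort', which is a stable algorithm, so rows with equal sort values keep their original relative order.

```python
import pandas as pd

# Hypothetical sample data with repeated course names
df_demo = pd.DataFrame({"Courses": ["Spark", "Java", "Spark", "Java"],
                        "Fee": [1, 2, 3, 4]})

# A stable sort keeps tied rows in their original order:
# the Java rows stay as (1, 3) and the Spark rows as (0, 2)
df2 = df_demo.sort_values(by="Courses", kind="mergesort")
print(df2.index.tolist())  # [1, 3, 0, 2]
```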
8. Complete Example of pandas Sort by Column Values
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,26000],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies, index = ['r1','r3','r5','r2','r4'])
print(df)
# Default sort
df2 = df.sort_values('Courses')
print(df2)
# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
#print(df2)
# Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
#print(df2)
# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print(df2)
# Sort by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print(df2)
# Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print(df2)
# Sort by heapsort algorithm
df2 = df.sort_values(by='Courses', kind='heapsort')
print(df2)
9. Conclusion
In this article, you have learned how to sort a DataFrame by column values in ascending or descending order using DataFrame.sort_values(). You have also learned how to sort with custom functions using lambda expressions.
Happy Learning !!