• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:11 mins read
You are currently viewing Pandas Sort by Column Values DataFrame

You can sort by column values in pandas DataFrame using the sort_values() method. To specify the order, you have to use the ascending boolean property; False for descending and True for ascending. By default, it is set to True.

In this article, I will explain how to sort pandas DataFrame by column values using ascending order, descending order, multiple columns, pushing NaN to first, and resetting the index on the sort result.

1. Quick Examples of Pandas Sort by Column Values

If you are in a hurry, below are some quick examples of sorting pandas DataFrame by column values.


# Below are the quick examples.

# Example 1: Default sort
df2 = df.sort_values('Courses')

# Example 2: Sort by Descending
df2 = df.sort_values('Courses', ascending=False)

# Example 3: Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])

# Example 4: Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)

# Example 5: Sort by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')

# Example 6: Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())

## Example 7: Sort by heap algorithm
df2 = df.sort_values(by='Courses', kind='heap')

Let’s create a DataFrame with a few rows and columns and execute these examples. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create DataFrame
import pandas as pd
technologies = ({
    'Courses':["Spark","Hadoop","Pandas","Java","Pyspark"],
    'Fee' :[20000,25000,30000,22000,26000],
    'Duration':['30days','40days','35days','60days','50days'],
    'Discount':[1000,2500,1500,1200,3000]
               })
df = pd.DataFrame(technologies, index = ['r1','r2','r3','r4','r5'])
print("Create DataFrame:\n", df)

Yields below output.

pandas sort column values

2. Sort DataFrame by Column Values

Using the df.sort_values() method you can sort a pandas DataFrame by ascending or descending order. When not specified in order, by default it does in ascending order.


# Default sort
df2 = df.sort_values('Courses')
print("After sorting the DataFrame by column values:\n", df2)

Yields below output.

pandas sort column values

In case you want to update the existing DataFrame use inplace=True.


# Default sort with inplace=True
df.sort_values('Courses', inplace=True)
print("After sorting the DataFrame by column values:\n", df)

Yields the same output as above.

3. Pandas Sort by Descending Order

If you want to sort Pandas DataFrame by descending order, use ascending=False. You can also specify different sorting orders for each label.


# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
print("After sorting the DataFrame in descending order:\n", df2)

Yields below output.


# Output:
# After sorting the DataFrame in descending order:
     Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r5  Pyspark  26000   50days      3000
r3   Pandas  30000   35days      1500
r4     Java  22000   60days      1200
r2   Hadoop  25000   40days      2500

4. Sort by Multiple Columns

sort_values() also supports sorting on multiple columns at a time, passing a list of column names to by param to sort DataFrame by multiple columns.


# Sory by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
print("After sorting the DataFrame by column values:\n", df2)

Yields below output.


# Output:
After sorting the DataFrame by column values:
     Courses    Fee Duration  Discount
r2   Hadoop  25000   40days      2500
r4     Java  22000   60days      1200
r3   Pandas  30000   35days      1500
r5  Pyspark  26000   50days      3000
r1    Spark  20000   30days      1000

5. Reset Index While Sorting

Sometimes you may need to set the new index on the sorting result, you can do this while sorting by using ignore_index=True or by calling pandas.DataFrame.reset_index() on sorted DataFrame.


# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print("After sorting the DataFrame by column values:\n", df2)

Yields below output.


# Output:
# After sorting the DataFrame by column values:
    Courses    Fee Duration  Discount
0   Hadoop  25000   40days      2500
1     Java  22000   60days      1200
2   Pandas  30000   35days      1500
3  Pyspark  26000   50days      3000
4    Spark  20000   30days      1000

6. Sorting by NaN at First

By default, NaN on values are pushed at the bottom of the DataFrame, you can push it at the beginning by using na_position='first' param. If you don’t want the NaN values, use dropna() to drop rows with NaN.


# Sory by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print("After sorting the DataFrame by column values:\n", df2)

Yields below output.


# Output:
# After sorting the DataFrame by column values:
     Courses    Fee Duration  Discount
r2   Hadoop  25000   40days      2500
r4     Java  22000   60days      1200
r3   Pandas  30000   35days      1500
r5  Pyspark  26000   50days      3000
r1    Spark  20000   30days      1000

7. Pandas Sort Column by Custom Function

In case, you want to apply a custom or any existing function to sort, you can use key param. The below example converts the Courses to lower case and does the sorting.


# Sort column by custom function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print("After sorting the DataFrame by column values:\n", df2)

Yields below output.


# Output:
# After sorting the DataFrame by column values:
     Courses    Fee Duration  Discount
r2   Hadoop  25000   40days      2500
r4     Java  22000   60days      1200
r3   Pandas  30000   35days      1500
r5  Pyspark  26000   50days      3000
r1    Spark  20000   30days      1000

Finally, you can also sort by using different sort algorithms. I will leave this to you to explore.

8. Complete Example of pandas Sort by Column Values


# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = ({
    'Courses':["Spark",np.nan,"pandas","Java","Spark"],
    'Fee' :[20000,25000,30000,22000,26000],
    'Duration':['30days','40days','35days','60days','50days'],
    'Discount':[1000,2500,1500,1200,3000]
               })
df = pd.DataFrame(technologies, index = ['r1','r3','r5','r2','r4'])
print(df)

# Default sort
df2 = df.sort_values('Courses')
print(df2)

# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
#print(df2)

# Sory by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
#print(df2)

# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print(df2)

# Sory by putting NaN at first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print(df2)

# Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print(df2)

# Sort by heap algorithm
df2 = df.sort_values(by='Courses', kind='heap')
print(df2)

9. Conclusion

In this article, you have learned how to sort DataFrame by column values using Dataframe.sort_values() by ascending or descending order. Also, learned how to use custom functions using lambda expressions.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply