pandas.DataFrame.sort_values() – Examples

The pandas.DataFrame.sort_values() function sorts a DataFrame by the values along either axis, in ascending or descending order. It takes the by, axis, ascending, inplace, kind, na_position, ignore_index, and key parameters and returns a sorted DataFrame. Use inplace=True to sort the existing DataFrame instead of returning a new one; in that case the method returns None. To specify the order, use the boolean ascending param: True for ascending and False for descending. By default, it is set to True.

sort_values() Key Points:

  • It provides the kind param to choose from the ‘quicksort’, ‘mergesort’, ‘heapsort’ and ‘stable’ algorithms.
  • By default, it sorts in ascending order.
  • By default, all NaN values are placed at the end of the sorted column; use na_position='first' to place them at the top.
  • You can reset the index on the sorted result.

1. Syntax of DataFrame.sort_values()

The following is the syntax of pandas.DataFrame.sort_values(). To sort by the index instead of by values, use DataFrame.sort_index().


# Syntax of DataFrame.sort_values()
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
  • by – str or list of str. Name or list of names to sort by: column labels when axis=0, index labels when axis=1.
  • axis – Axis to be sorted, {0 or ‘index’, 1 or ‘columns’}. Default set to 0.
  • ascending – bool or list of bool. Sort in ascending or descending order. Default set to True. When a list is given, its length must match the length of by.
  • inplace – If True, updates the existing DataFrame and returns None. Default set to False.
  • kind – Sorting algorithm to choose from {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, default ‘quicksort’. Only applied when sorting on a single column or label.
  • na_position – Specify where to place the NaNs, {‘first’, ‘last’}. Default set to ‘last’.
  • ignore_index – If True, the resulting index is relabeled 0, 1, …, n - 1. Default set to False.
  • key – callable, optional. Function applied to the values before sorting. It receives a Series and should return a Series of the same shape.
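As a quick illustration of the axis param, here is a minimal sketch that sorts the columns of a DataFrame by the values in one row. The scores DataFrame and its labels are made up for this example and are not part of the data used in the rest of the article.

```python
import pandas as pd

# Hypothetical all-numeric DataFrame, used only for this illustration
scores = pd.DataFrame(
    {'a': [3, 1], 'b': [1, 2], 'c': [2, 3]},
    index=['x', 'y']
)

# With axis=1, 'by' refers to an index label and the columns get reordered
by_row = scores.sort_values(by='x', axis=1)
print(by_row)
```

Row x holds 3, 1 and 2 for columns a, b and c, so the columns come back in the order b, c, a. Sorting along axis=1 needs the values in that row to be comparable with each other, which is why the sketch uses numeric data only.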

Let’s understand these parameters by running some examples. First, let’s create a pandas DataFrame from a dict.


import pandas as pd
import numpy as np
technologies = ({
    'Courses':["Spark",np.nan,"pandas","Java","Spark"],
    'Fee' :[20000,25000,30000,22000,26000],
    'Duration':['30days','40days','35days','60days','50days'],
    'Discount':[1000,2500,1500,1200,3000]
               })
df = pd.DataFrame(technologies, index = ['r1','r3','r5','r2','r4'])
print(df)

Yields below output.


   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r3     NaN  25000   40days      2500
r5  pandas  30000   35days      1500
r2    Java  22000   60days      1200
r4   Spark  26000   50days      3000

2. Use sort_values() to Sort pandas DataFrame

By using the DataFrame.sort_values() method you can sort a pandas DataFrame by the values of a column in ascending or descending order. When the order is not specified, it sorts in ascending order by default.


# Default sort
df2 = df.sort_values('Courses')
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500
r3     NaN  25000   40days      2500

If you want to update the existing DataFrame, use inplace=True.


# Default sort with inplace=True
df.sort_values('Courses', inplace=True)
print(df)

Yields the same output as above.
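One detail worth checking for yourself is that a call with inplace=True returns None rather than the sorted DataFrame. The short sketch below uses a small made-up DataFrame to show this.

```python
import pandas as pd

# Small hypothetical DataFrame, just to show the return value
df = pd.DataFrame({'Fee': [30000, 20000, 25000]}, index=['r1', 'r2', 'r3'])

# With inplace=True the DataFrame itself is modified and None is returned
result = df.sort_values('Fee', inplace=True)
print(result)
print(df)
```

Because of this, avoid writing df = df.sort_values(..., inplace=True), which would replace your DataFrame with None.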

3. pandas Sort by Descending Order

To sort pandas DataFrame column values in descending order, use ascending=False. When sorting by several labels, you can also specify a different sort order for each one by passing a list of booleans.


# Sort by Descending
df2 = df.sort_values('Courses', ascending=False)
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
r5  pandas  30000   35days      1500
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r2    Java  22000   60days      1200
r3     NaN  25000   40days      2500
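To give each label its own sort order, pass a list of booleans to ascending, one per entry in by. The sketch below rebuilds a two-column version of the example DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Two columns from the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000]
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Courses in ascending order, Fee in descending order within each course
mixed = df.sort_values(by=['Courses', 'Fee'], ascending=[True, False])
print(mixed)
```

Here the two Spark rows are ordered by Fee from highest to lowest (r4 before r1), while the courses themselves stay in ascending order and the NaN row remains last.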

4. Sort by Two Columns

The by parameter also supports a list of labels; use this to sort the DataFrame by multiple columns. Rows are sorted by the first label, and ties are broken by the following ones.


# Sort by multiple columns
df2 = df.sort_values(by=['Courses','Fee'])
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500
r3     NaN  25000   40days      2500

5. Reset Index

Sometimes you may need a new index on the sorted result. You can get one while sorting by using ignore_index=True, or by calling pandas.DataFrame.reset_index() on the sorted DataFrame.


# Sort and ignore index
df2 = df.sort_values(by='Courses', ignore_index=True)
print(df2)

Yields below output.


  Courses    Fee Duration  Discount
0    Java  22000   60days      1200
1   Spark  20000   30days      1000
2   Spark  26000   50days      3000
3  pandas  30000   35days      1500
4     NaN  25000   40days      2500
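The reset_index() alternative mentioned above can be sketched as follows. Passing drop=True discards the old index instead of adding it as a new column. The small DataFrame is a cut-down version of the example data.

```python
import pandas as pd

# Cut-down version of the example data
df = pd.DataFrame({
    'Courses': ["Spark", "pandas", "Java"],
    'Fee': [20000, 30000, 22000]
}, index=['r1', 'r5', 'r2'])

# Sort first, then replace the old index with 0, 1, 2, ...
df2 = df.sort_values(by='Courses').reset_index(drop=True)

# Same result in one step
df3 = df.sort_values(by='Courses', ignore_index=True)
print(df2)
```

Both approaches should give the same DataFrame; ignore_index=True is simply the shorter form.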

6. NaN at First

By default, rows with NaN in the sort column are placed at the bottom of the DataFrame. You can place them at the beginning by using the na_position='first' param.


# Sort by putting NaN first
df2 = df.sort_values(by=['Courses','Fee'], na_position='first')
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
r3     NaN  25000   40days      2500
r2    Java  22000   60days      1200
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r5  pandas  30000   35days      1500

7. pandas Sort by Custom Function

If you want to apply a custom or existing function to the values before sorting, use the key param. The function receives the column as a Series and should return a Series of the same length. The example below converts the Courses values to lower case and then sorts.


# Sort by function
df2 = df.sort_values(by='Courses', key=lambda col: col.str.lower())
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
r2    Java  22000   60days      1200
r5  pandas  30000   35days      1500
r1   Spark  20000   30days      1000
r4   Spark  26000   50days      3000
r3     NaN  25000   40days      2500
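The key param is also useful when the text form of a column does not sort the way you want. As a sketch, the example below sorts duration strings by their numeric part. The data is made up for this illustration so that plain string sorting and numeric sorting give different results.

```python
import pandas as pd

# Hypothetical data where string order differs from numeric order
df = pd.DataFrame({
    'Courses': ["Spark", "pandas", "Java"],
    'Duration': ['100days', '5days', '40days']
})

# Strip the 'days' suffix and compare the remaining numbers
by_days = df.sort_values(
    by='Duration',
    key=lambda col: col.str.replace('days', '', regex=False).astype(int)
)
print(by_days)
```

Without the key function these values would sort as text, placing '100days' before '40days' and '5days'. With it, the rows come back in the order 5days, 40days, 100days.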

Finally, you can also sort by using different sort algorithms through the kind param. I will leave this to you to explore.
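As a starting point for that exploration, here is a minimal sketch using kind='mergesort', which is a stable algorithm: rows with equal values keep their original relative order. The kind param only takes effect when sorting on a single column or label.

```python
import numpy as np
import pandas as pd

# Two columns from the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "pandas", "Java", "Spark"],
    'Fee': [20000, 25000, 30000, 22000, 26000]
}, index=['r1', 'r3', 'r5', 'r2', 'r4'])

# Stable sort: the two 'Spark' rows stay in their original order (r1, then r4)
stable = df.sort_values(by='Courses', kind='mergesort')
print(stable)
```

A stable sort is handy when you sort in several passes, for example first by Fee and then by Courses, and want the earlier ordering preserved among ties.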

Conclusion

In this article, you have learned how to sort pandas DataFrame column values in ascending or descending order by using different params of the sort_values() function. You have also learned that this function provides the kind param to choose from the ‘quicksort’, ‘mergesort’, ‘heapsort’ and ‘stable’ algorithms.

Happy Learning !!

NNK
