  • Post last modified:March 27, 2024
pandas DataFrame.sort_index() – Sort by Index

The pandas DataFrame.sort_index() function sorts a DataFrame by its index labels or, with axis=1, by its column names. It accepts the parameters axis, level, ascending, inplace, kind, na_position, sort_remaining, ignore_index, and key, and by default returns a new DataFrame with the sorted result. Use inplace=True to update the existing DataFrame instead.

sort_index() Key Points

  • Sorting is applied to the axis labels, not to the data values.
  • Use axis=1 to sort by column names and axis=0 to sort by index.
  • Supports the sorting algorithms ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’.
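The choice of algorithm matters mostly when index labels repeat. The following is a minimal sketch, using a small made-up DataFrame (df_dup is a hypothetical name, not part of this article's data), of how a stable algorithm keeps rows with equal labels in their original relative order.

```python
import pandas as pd

# Hypothetical data: the labels 1 and 2 each appear twice.
df_dup = pd.DataFrame({"value": ["a", "b", "c", "d"]}, index=[2, 1, 2, 1])

# A stable algorithm keeps rows that share a label in their original order,
# so "b" stays ahead of "d" and "a" stays ahead of "c".
stable_sorted = df_dup.sort_index(kind="stable")
print(stable_sorted)
```

With the default ‘quicksort’, the relative order of rows that share a label is not guaranteed, so prefer ‘stable’ or ‘mergesort’ when that order matters.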

In this article, I will explain the syntax and usage of the sort_index() method with examples.

1. Syntax of DataFrame.sort_index()

The syntax of pandas.DataFrame.sort_index() is as follows:


# Syntax of sort_index() function
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
  • axis – Axis to sort along: 0 or ‘index’ sorts by row labels, 1 or ‘columns’ sorts by column names. Default is 0.
  • level – If not None, sort on the values in the specified index level(s).
  • ascending – bool or list of bool. Sort in ascending (True) or descending (False) order. Default is True. For a MultiIndex, a list sets the direction for each level.
  • inplace – If True, updates the existing DataFrame and returns None. Default is False.
  • kind – Sorting algorithm to use, one of {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}. Default is ‘quicksort’.
  • na_position – Where to place NaN labels, one of {‘first’, ‘last’}. Default is ‘last’.
  • sort_remaining – If True, and sorting by level on a multilevel index, sort by the other levels too (in order) after sorting by the specified level.
  • ignore_index – If True, the resulting axis is labeled 0, 1, …, n - 1. Default is False.
  • key – Callable, optional. If not None, the function is applied to the index values before sorting. It should accept an Index and return an Index of the same shape.
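The key and na_position parameters are not used in the sections that follow, so here is a short sketch of both. The DataFrames df_case and df_nan are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed-case string labels.
df_case = pd.DataFrame({"n": [1, 2, 3]}, index=["b", "C", "a"])

# Default ordering compares the raw strings, so uppercase "C" comes first.
default_order = df_case.sort_index().index.tolist()

# key receives the whole Index and returns a transformed Index to sort by;
# lowercasing the labels gives a case-insensitive sort.
ci_order = df_case.sort_index(key=lambda idx: idx.str.lower()).index.tolist()
print(default_order, ci_order)

# Hypothetical frame with a missing index label.
df_nan = pd.DataFrame({"n": [1, 2, 3]}, index=[2.0, np.nan, 1.0])

# na_position="first" moves the NaN label to the top instead of the bottom.
nan_first = df_nan.sort_index(na_position="first")
print(nan_first)
```

Note that the key function only decides the sort order; the labels in the result keep their original spelling.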

Let’s understand these parameters by running some examples. First, let’s create a DataFrame with a few rows and columns. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = ({
    'Courses':["Spark",np.nan,"pandas","Java","Spark"],
    'Fee' :[20000,25000,30000,22000,26000],
    'Duration':['30days','40days','35days','60days','50days'],
    'Discount':[1000,2500,1500,1200,3000]
               })
df = pd.DataFrame(technologies, index = [101,123,115,340,100])
print(df)

2. pandas Sort by Index

By default, sort_index() sorts the DataFrame rows by index labels in ascending order and returns a new DataFrame. Use inplace=True to sort the existing DataFrame in place, in which case the method returns None.


# Default sort by index labels
df2 = df.sort_index()
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
100   Spark  26000   50days      3000
101   Spark  20000   30days      1000
115  pandas  30000   35days      1500
123     NaN  25000   40days      2500
340    Java  22000   60days      1200
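As a small sketch of the inplace=True behavior described in this section, the example below uses a cut-down, hypothetical DataFrame (df_demo) rather than the full one from this article. Assigning the return value is a common mistake, because that value is None.

```python
import pandas as pd

# Hypothetical, cut-down frame used only for this illustration.
df_demo = pd.DataFrame({"Fee": [20000, 25000, 30000]}, index=[101, 123, 115])

# With inplace=True the DataFrame itself is reordered and None is returned.
result = df_demo.sort_index(inplace=True)
print(result)
print(df_demo.index.tolist())
```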

3. Sort by Descending Order

By default, sorting is done in ascending order. To sort in descending order, pass ascending=False.


# Sort by Descending order
df2 = df.sort_index(ascending=False)
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
340    Java  22000   60days      1200
123     NaN  25000   40days      2500
115  pandas  30000   35days      1500
101   Spark  20000   30days      1000
100   Spark  26000   50days      3000
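The ascending parameter also accepts a list, which is useful together with level on a multilevel index. Below is a minimal sketch that assumes a hypothetical two-level index with the level names group and num; neither the names nor the data come from this article.

```python
import pandas as pd

# Hypothetical two-level index.
idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["group", "num"]
)
df_mi = pd.DataFrame({"v": [10, 20, 30, 40]}, index=idx)

# One bool per level: "group" ascending, "num" descending within each group.
mixed = df_mi.sort_index(level=["group", "num"], ascending=[True, False])
print(mixed)

# Sort by "num" first; sort_remaining=True (the default) then sorts by "group".
by_num = df_mi.sort_index(level="num")
print(by_num)
```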

4. Reset Index on Sorted Result

Sometimes after sorting you may need to reset the index so that it starts from zero. To do this, use ignore_index=True, which discards the original index labels and labels the sorted rows 0, 1, …, n - 1.


# Sort ignoring index
df2 = df.sort_index(ignore_index=True)
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  26000   50days      3000
1   Spark  20000   30days      1000
2  pandas  30000   35days      1500
3     NaN  25000   40days      2500
4    Java  22000   60days      1200
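For comparison, the same result can be obtained by sorting first and then calling reset_index(drop=True). The sketch below uses a small hypothetical DataFrame (df_demo) to check that the two approaches agree.

```python
import pandas as pd

# Hypothetical, cut-down frame used only for this illustration.
df_demo = pd.DataFrame({"Fee": [20000, 25000, 30000]}, index=[101, 123, 115])

# Sort by index and relabel the rows 0..n-1 in a single call.
one_step = df_demo.sort_index(ignore_index=True)

# Two-step equivalent: sort, then drop the old index labels.
two_step = df_demo.sort_index().reset_index(drop=True)

print(one_step.equals(two_step))
```

Either way the original labels are lost, so keep a copy of the index if you still need it.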

5. Sort by Column Names/Labels

By default, sorting happens on the index labels. Use axis=1 to sort the columns of the DataFrame by name instead.


# Sort by column names
df2 = df.sort_index(axis=1)
print(df2)

Yields below output.


# Output:
    Courses  Discount Duration    Fee
101   Spark      1000   30days  20000
123     NaN      2500   40days  25000
115  pandas      1500   35days  30000
340    Java      1200   60days  22000
100   Spark      3000   50days  26000
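The axis=1 option can be combined with the other parameters. As a brief sketch, the example below sorts column names in descending order on a one-row hypothetical DataFrame (df_demo) that reuses this article's column names.

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as the article.
df_demo = pd.DataFrame(
    {"Courses": ["Spark"], "Fee": [20000], "Duration": ["30days"], "Discount": [1000]}
)

# Sort the column labels in reverse alphabetical order.
cols_desc = df_demo.sort_index(axis=1, ascending=False)
print(cols_desc.columns.tolist())
```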

Conclusion

In this article, you have learned the syntax of the sort_index() method and how to sort rows by index in ascending or descending order, reset the index on the sorted result, and sort a DataFrame by column names.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.