In pandas, the DataFrame.sort_index() method sorts a DataFrame by its index, rearranging the rows based on the index labels. By default it sorts in ascending order of index values; pass ascending=False to reverse the order.
In this article, I will explain the DataFrame.sort_index() function, its syntax and parameters, and how to use it to sort a pandas DataFrame by index or by column names/labels. The function accepts the parameters axis, level, ascending, inplace, kind, na_position, sort_remaining, ignore_index, and key. By default it returns a new DataFrame with the sorted result. Setting inplace=True modifies the existing DataFrame instead and returns None.
Key Points –
- sort_index() sorts a DataFrame by its index labels.
- Sorting is applied to the axis labels, not to the data values.
- Use axis=1 to sort by column names and axis=0 to sort by index.
- Supports the sorting algorithms ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’.
- The ascending parameter allows sorting in either ascending (True) or descending (False) order.
- By default, it sorts the rows, but you can sort the columns by setting the axis parameter to 1.
Pandas DataFrame.sort_index() Introduction
Following is the syntax of pandas.DataFrame.sort_index().
# Syntax of sort_index() function
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
- axis – {0 or ‘index’, 1 or ‘columns’}, default 0. The axis along which to sort.
- level – int or level name or list of ints or list of level names, optional. If the axis is a MultiIndex (hierarchical), sort by this level.
- ascending – bool or list of bool, default True. Sort ascending vs. descending. When the index is a MultiIndex, pass a list of bools to control the sort direction of each level individually.
- inplace – bool, default False. If True, perform the operation in place and return None.
- kind – {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, default ‘quicksort’. Choice of sorting algorithm. Only ‘mergesort’ and ‘stable’ are stable algorithms.
- na_position – {‘first’, ‘last’}, default ‘last’. Determines whether NA/null index values are placed at the beginning or at the end.
- sort_remaining – bool, default True. If True and sorting by level and the index is multilevel, sort by the other levels too (in order) after sorting by the specified level.
- ignore_index – bool, default False. If True, the resulting axis will be labeled 0, 1, …, n – 1.
- key – callable or None, optional. Apply the key function to the index values before sorting. The function should accept an Index and return an Index of the same shape.
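The level, sort_remaining, and per-level ascending options only matter for a MultiIndex, which the single-level examples in this article do not exercise. The following is a minimal sketch under the assumption of a small two-level index; the names sales, region, year, and units are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index (region, year); not part of the article's data
idx = pd.MultiIndex.from_tuples(
    [("west", 2021), ("east", 2022), ("west", 2020), ("east", 2020)],
    names=["region", "year"],
)
sales = pd.DataFrame({"units": [10, 20, 30, 40]}, index=idx)

# Sort by the "year" level only; sort_remaining=False keeps "region"
# out of the comparison
by_year = sales.sort_index(level="year", sort_remaining=False)
print(by_year)

# Sort both levels, with a different direction for each level:
# region ascending, year descending
mixed = sales.sort_index(level=["region", "year"], ascending=[True, False])
print(mixed)
```

With a list passed to ascending, each entry lines up with the level in the same position of the level list.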
Let’s understand these parameters by running some examples. First, let’s create a pandas DataFrame from a dictionary.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",np.nan,"pandas","Java","Spark"],
'Fee' :[20000,25000,30000,22000,26000],
'Duration':['30days','40days','35days','60days','50days'],
'Discount':[1000,2500,1500,1200,3000]
})
df = pd.DataFrame(technologies, index = [101,123,115,340,100])
print(df)
Pandas Sort by Index
By default, the pandas sort_index() function sorts DataFrame rows by index in ascending order and returns a new DataFrame. Use inplace=True to update the existing DataFrame in place; in that case the call returns None.
# Default sort by index labels
df2 = df.sort_index()
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
100 Spark 26000 50days 3000
101 Spark 20000 30days 1000
115 pandas 30000 35days 1500
123 NaN 25000 40days 2500
340 Java 22000 60days 1200
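The in-place behavior is easy to get wrong, because assigning the result of an inplace=True call gives you None rather than a DataFrame. Here is a small sketch, using a reduced stand-in DataFrame (the name fees is an assumption, not part of the article's data):

```python
import pandas as pd

# Small stand-in DataFrame for illustration
fees = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=[101, 123, 100])

# inplace=True sorts fees itself and returns None
result = fees.sort_index(inplace=True)

print(result)              # None
print(fees.index.tolist())
```

For that reason, avoid writing df = df.sort_index(inplace=True), which would replace df with None.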
Sort by Descending Order
By default, sorting is done in ascending order. To sort in descending order, pass the ascending=False param.
# Sort by descending order
df2 = df.sort_index(ascending=False)
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
340 Java 22000 60days 1200
123 NaN 25000 40days 2500
115 pandas 30000 35days 1500
101 Spark 20000 30days 1000
100 Spark 26000 50days 3000
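The index in this article has no missing labels, so na_position has no visible effect here. As a rough sketch of how it interacts with ascending=False, assume a hypothetical DataFrame whose index contains a NaN label:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a missing index label
scores = pd.DataFrame({"v": [1, 2, 3]}, index=[2.0, np.nan, 1.0])

# Descending sort; the NaN label stays at the end (na_position='last' is the default)
nan_last = scores.sort_index(ascending=False)
print(nan_last)

# Same descending sort, but with the NaN label moved to the front
nan_first = scores.sort_index(ascending=False, na_position="first")
print(nan_first)
```

Note that na_position places the missing labels independently of the sort direction.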
Reset Index on Sorted Result
Sometimes after sorting you may need an index that starts again from zero. To get one, use ignore_index=True. This discards the existing index labels and assigns a new 0, 1, …, n – 1 index to the sorted result.
# Sort ignoring index
df2 = df.sort_index(ignore_index=True)
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 26000 50days 3000
1 Spark 20000 30days 1000
2 pandas 30000 35days 1500
3 NaN 25000 40days 2500
4 Java 22000 60days 1200
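Because ignore_index=True throws the old labels away, you may prefer to keep them as a regular column. One way to do that, sketched here with a reduced stand-in DataFrame and a hypothetical index name id, is to sort first and then call reset_index():

```python
import pandas as pd

# Small stand-in DataFrame; the index name "id" is made up for illustration
courses = pd.DataFrame({"Courses": ["Spark", "pandas", "Java"]}, index=[101, 115, 100])
courses.index.name = "id"

# Sort by index, then move the old labels into a column and
# assign a fresh 0-based index
kept = courses.sort_index().reset_index()
print(kept)
```

The result has the columns id and Courses, with the rows ordered by the original labels.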
Sort by Column Names/Labels
By default, sorting happens on index labels. Use axis=1 to change this and sort the columns of a pandas DataFrame by name.
# Sort by column names
df2 = df.sort_index(axis=1)
print(df2)
Yields below output.
# Output:
Courses Discount Duration Fee
101 Spark 1000 30days 20000
123 NaN 2500 40days 25000
115 pandas 1500 35days 30000
340 Java 1200 60days 22000
100 Spark 3000 50days 26000
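Column labels are compared as plain strings, so uppercase names sort before lowercase ones. The key parameter can change that. The sketch below assumes mixed-case column names (a made-up variation on the article's columns) and a pandas version that supports key (1.1 or later):

```python
import pandas as pd

# Hypothetical columns with mixed case
mixed_case = pd.DataFrame([[1, 2, 3]], columns=["Fee", "courses", "Discount"])

# Plain label sort: uppercase letters come before lowercase ones
plain = mixed_case.sort_index(axis=1)
print(plain.columns.tolist())

# Case-insensitive sort: key receives the column Index and
# must return an Index of the same shape
folded = mixed_case.sort_index(axis=1, key=lambda labels: labels.str.lower())
print(folded.columns.tolist())
```

The key function only affects the comparison; the original column names are kept in the result.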
FAQ on Pandas DataFrame.sort_index() – Sort by Index
sort_index() sorts a DataFrame by its index (row labels). By default, it sorts in ascending order, but you can change the sorting order using parameters.
To sort a DataFrame by its index in pandas, you can use the sort_index() method.
You can sort a DataFrame in descending order by its index using the sort_index() method with the ascending=False parameter.
By default, sort_index() returns a new DataFrame and does not modify the original DataFrame. If you want to modify the original DataFrame, you can use inplace=True.
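When the index contains duplicate values, sort_index() will still work: rows that share a label end up next to each other. Their original relative order is only guaranteed to be preserved with a stable algorithm, so pass kind='mergesort' or kind='stable' if that order matters.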
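To illustrate sorting with duplicate labels, here is a minimal sketch that assumes a hypothetical DataFrame with repeated index values and asks for a stable sort:

```python
import pandas as pd

# Hypothetical DataFrame with duplicate index labels
dupes = pd.DataFrame({"v": ["a", "b", "c", "d"]}, index=[2, 1, 2, 1])

# A stable sort keeps rows with equal labels in their original relative order
out = dupes.sort_index(kind="stable")
print(out)
```

Rows "b" and "d" (label 1) come first, followed by "a" and "c" (label 2), each pair in its original order.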
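sort_index() keeps the original index labels on the sorted rows unless you pass ignore_index=True. If you want a new 0-based index but also want to retain the original labels, call reset_index() after sorting; it moves the old index into a regular column instead of dropping it.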
Conclusion
In this article, you have learned the syntax of the sort_index() method and how to use it to sort DataFrame rows by index and columns by name.
Happy Learning !!