• Post author:
  • Post category:Pandas
  • Post last modified:December 12, 2024
  • Reading time:14 mins read
You are currently viewing How to Sort Multiple Columns in Pandas DataFrame

You can sort a Pandas DataFrame by one or more columns using the sort_values() method, either in ascending or descending order. To specify the sort order, use the ascending parameter, which accepts boolean values: False for descending and True for ascending. By default, it is set to True.

Advertisements

In this article, I will explain how to sort pandas DataFrame with one or more columns. By default sort_values() returns a copy DataFrame with the result of the sort. To sort on current DataFrame use inplace=True.

Key Points –

  • Use the sort_values() function to sort a DataFrame by one or more columns.
  • Pass a list of column names to the by parameter for sorting multiple columns.
  • Control the sorting order for each column by passing a list of boolean values to the ascending parameter (True for ascending, False for descending).
  • The axis parameter is used to specify whether to sort rows (axis=0) or columns (axis=1).
  • Use inplace=True if you want to apply the sort directly to the original DataFrame without creating a new one.
  • Handle missing (NaN) values by specifying the na_position parameter ('first' to place them at the beginning, 'last' to place them at the end).

Quick Examples of Sort by Multiple Columns in Pandas

Following are quick examples of sorting by multiple columns.


# Quick examples of sort by multiple columns

# Example 1: Sort multiple columns
df2 = df.sort_values(['Fee', 'Duration'],
              ascending = [False, True])

# Example 2: Sort by two columns 
df2 = df.sort_values(['Courses', 'Discount'],
              ascending = [True, True])

# Example 3: Using the sorting function
df.sort_values(["Fee", "Courses"],
               axis = 0, ascending = True,
               inplace = True,
               na_position = "first")

To run some examples of sorting multiple columns in Pandas DataFrame, let’s create Pandas DataFrame.


# Create DataFrame
import pandas as pd
technologies = ({
    'Courses':["Spark","Hadoop","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,20000],
    'Duration':['30days','35days','40days','50days','60days'],
    'Discount':[1000,2300,1500,1200,2500]
               })
df = pd.DataFrame(technologies, index = ['r1','r2','r3','r4','r5'])
print("Create DataFrame:\n", df)

Yields below output.

Pandas Sort multiple Columns

Sort Multiple Columns in Pandas DataFrame

By using the sort_values() method you can sort single column or multiple columns in DataFrame by ascending or descending order. When order is not specified, all specified columns are sorted in ascending order.


# Sort multiple columns
df2 = df.sort_values(['Fee', 'Discount'])
print("Get the DataFrame after sorting:\n", df2)

Yields below output.

Pandas Sort multiple Columns

In case, if you want to update the existing DataFrame use inplace=True. This function is used with the by parameter, which takes a list of column names you want to sort.


# Sort ascending order
df.sort_values(by=['Fee','Discount'], inplace=True)
print("Get the DataFrame after sorting:\n", df)

Yields the same output as above.

Sort in an Ascending Order

Use the ascending parameter to arrange the DataFrame either in ascending or descending order. When dealing with multiple sorting columns, the default behavior is to sort in ascending order.


# Sort ascending order
df.sort_values(by=['Fee','Discount'], inplace=True,
               ascending = [True, True])
print("Get the DataFrame after sorting:\n", df)

Yields below output.


# Output:
# Get the DataFrame after sorting:
   Courses    Fee Duration  Discount
r1   Spark  20000   30days      1000
r5    Java  20000   60days      2500
r4  Oracle  22000   50days      1200
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500

Sort Multiple Columns in Descending Order

If you want to sort in descending order, you can use the ascending parameter and set it to False. You also have the option to specify distinct sorting orders for each input.


# Sort descending order
df.sort_values(by=['Fee','Discount'], inplace=True,
               ascending = [True, False])
print("Get the DataFrame after sorting:\n", df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
r0    Java  20000   60days      2500
r1   Spark  20000   30days      1000
r4  Oracle  22000   50days      1200
r2  Hadoop  25000   35days      2300
r3  pandas  26000   40days      1500

FAQ on How to Sort Multiple Columns in Pandas DataFrame

How do I sort a DataFrame by multiple columns?

You can use the sort_values() function in Pandas and pass a list of column names to the by parameter.

Can I sort columns in both ascending and descending order?

You can specify the sorting order for each column individually by passing a list of boolean values to the ascending parameter.

How do I handle missing values while sorting?

By default, Pandas places missing values (NaN) at the end when sorting in ascending order and at the beginning for descending order. You can change this behavior by using the na_position parameter

Can I sort a DataFrame in place without creating a new one?

You can sort a DataFrame in place without creating a new one by using the inplace=True argument with the sort_values() method. This modifies the original DataFrame directly.

How do I sort by multiple columns with numeric and string data types?

Pandas can sort columns with mixed data types (numeric, string) without issues. Ensure that the numeric columns are not mixed with non-numeric data types for proper sorting.

How can I sort based on multiple columns with different levels of granularity?

If you have a multi-level index (MultiIndex), you can sort by the index levels or columns. Use sort_index() for index levels or sort_values() for column-based sorting.

Conclusion

In this article, I have explained sorting a DataFrame by multiple columns using Dataframe.sort_values() in ascending or descending order.

Happy Learning !!

Reference