You can sort the rows within each group of a pandas DataFrame by using DataFrame.sort_values() together with groupby(), or by calling apply() with a lambda function on the grouped data. In this article, I will explain how to sort the data within each group using the sort_values() and apply() functions, and also how to aggregate each group (for example with a sum or a count) and sort by the aggregated values.
Key Points
- The groupby function in Pandas groups data based on one or more columns, facilitating group-based analysis and transformations.
- After grouping data using groupby, you can sort values within each group based on specified columns, which makes it easy to find the highest or lowest values in each group.
- Sorting within groups can be achieved by sorting the DataFrame with sort_values() before grouping, or by calling sort_values() inside groupby().apply(), specifying the column(s) to sort by.
- To get ordered values from each group, you can use methods like sort_values() or nlargest(), which order the data inside each group based on specific criteria.
- Sorting can be applied after aggregation functions like sum(), mean(), or count() to rank groups by their aggregate values.
- You can sort by multiple columns within groups by passing a list of columns to the sort_values() method, which gives multi-level sorting.
Quick Examples of Sort within Groups of Pandas DataFrame
If you are in a hurry, below are some quick examples of grouping and sorting within groups of a pandas DataFrame.
# Quick examples of sort within groups of pandas DataFrame

# Example 1 - Sort with sort_values(), then group and take the first rows
df2 = df.sort_values(['Courses','Fee'], ascending=False).groupby('Courses').head(3)

# Example 2 - First three elements of each group
# Using groupby with lambda and apply() method
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
df3 = df2['Fee'].groupby(level='Courses', group_keys=False)
df4 = df3.apply(lambda x: x.sort_values(ascending=False).head(3))

# Example 3 - Using groupby with nlargest()
df2 = df.groupby(["Courses"])["Fee"].nlargest(3)

# Example 4 - Sort values in descending order with groupby
df2 = df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)

# Example 5 - Sort values of groupby
# Using DataFrame.drop() method
# errors='ignore' covers pandas versions that do not pass the grouping column to the lambda
df2 = df.groupby(['Fee']).apply(
    lambda x: x.sort_values(['Courses'], ascending=False).head(3)
               .drop(columns='Fee', errors='ignore'))
Let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
# Create a Pandas DataFrame
import pandas as pd

technologies = {
    'Courses': ["Spark","PySpark","Spark","Python","PySpark"],
    'Fee': [22000,25000,23000,24000,26000],
    'Duration': ['30days','50days','30days','60days','35days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
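Yields below output.
# Output:
# Create DataFrame:
   Courses    Fee  Duration
0    Spark  22000    30days
1  PySpark  25000    50days
2    Spark  23000    30days
3   Python  24000    60days
4  PySpark  26000    35days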
Sort within Each Group of Pandas DataFrame
DataFrame.sort_values() sorts a DataFrame in ascending or descending order. To sort within groups, first sort the whole DataFrame by the grouping column and the value column, and then group the rows using the DataFrame.groupby() method.
Note that groupby preserves the order of rows within each group, so the rows of every group stay in the order produced by sort_values().
# Using groupby & sort_values to sort.
df2=df.sort_values(['Courses','Fee'], ascending=False).groupby('Courses').head(3)
print("After sorting the data within each group:\n", df2)
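Because groupby() returns a GroupBy object rather than a DataFrame, call head() or a similar method to get a DataFrame back. Here, head(3) returns up to the first three rows of each group. Yields below output.
# Output:
# After sorting the data within each group:
   Courses    Fee  Duration
2    Spark  23000    30days
0    Spark  22000    30days
3   Python  24000    60days
4  PySpark  26000    35days
1  PySpark  25000    50days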
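If the groups should stay in alphabetical order while the values inside each group run from highest to lowest, sort_values() accepts one ascending flag per column. The following is a minimal sketch of that idea; the variable names sorted_df and top_per_course are illustrative and not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

# Courses ascending (A-Z), Fee descending inside each course
sorted_df = df.sort_values(["Courses", "Fee"], ascending=[True, False])

# head(1) keeps the first row of each group, here the highest Fee per course
top_per_course = sorted_df.groupby("Courses").head(1)
print(top_per_course)
```

Changing head(1) to head(n) keeps the n highest-fee rows of each course instead.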
Another Example of Sorting within Group
First, let’s group the rows by Courses and Duration using the groupby() function and sum the Fee of each group with agg(). The sorting is then performed on this aggregated result.
# Groupby using DataFrame.agg() method
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
print("After grouping and aggregating:\n", df2)
Yields below output.
# Output:
# After grouping and aggregating:
                    Fee
Courses Duration
PySpark 35days    26000
        50days    25000
Python  60days    24000
Spark   30days    45000
Now, group the aggregated Fee column by the first level of the index, Courses:
# Groupby the first level of index
df3 = df2['Fee'].groupby(level='Courses', group_keys=False)
print(df3)
Printing df3 shows only a SeriesGroupBy object; nothing is sorted until a function is applied to it. To sort each group and take its first three elements, pass a lambda to the apply() method of the grouped object.
# First three elements of each group using groupby with lambda and apply() method.
df4 = df3.apply(lambda x: x.sort_values(ascending=False).head(3))
print(df4)
Yields below output.
# Output:
Courses  Duration
PySpark  35days      26000
         50days      25000
Python   60days      24000
Spark    30days      45000
Name: Fee, dtype: int64
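As an alternative to apply(), the top rows of each group in an aggregated result can also be taken by sorting first and then calling head() on the groups. The sketch below assumes the same sample data; as_index=False and the names totals and top are choices made for this illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

# Total Fee per (Courses, Duration); as_index=False keeps the keys as columns
totals = df.groupby(["Courses", "Duration"], as_index=False)["Fee"].sum()

# Sort totals high-to-low inside each course, then keep up to three rows per course
top = (
    totals.sort_values(["Courses", "Fee"], ascending=[True, False])
          .groupby("Courses")
          .head(3)
)
print(top)
```

Because no Python function is called per group, this form tends to be easier to read and usually avoids the per-group overhead of apply().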
Using Groupby with DataFrame.nlargest()
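The nlargest() function returns the n largest values in descending order. When called on a DataFrame, it returns the first n rows ordered by the given columns, and the columns that are not specified are returned as well but not used for ordering. When called on a grouped column, as in the example below, it returns the n largest Fee values of each group as a Series indexed by the group key and the original row label.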
# Using groupby with DataFrame.nlargest()
df2=df.groupby(["Courses"])["Fee"].nlargest(3)
print(df2)
Yields below output.
# Output:
Courses
PySpark  4    26000
         1    25000
Python   3    24000
Spark    2    23000
         0    22000
Name: Fee, dtype: int64
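The grouped nlargest() result contains only the Fee values. If the complete rows are needed, one option is to use the original row labels stored in the last index level to look the rows up again. This is a sketch under that assumption; top_fees, row_labels, and top_rows are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

# Largest Fee of each course; the index is (Courses, original row label)
top_fees = df.groupby("Courses")["Fee"].nlargest(1)

# The last index level holds the original row labels of df
row_labels = top_fees.index.get_level_values(-1)

# Look up the full rows, in the same order as the grouped result
top_rows = df.loc[row_labels]
print(top_rows)
```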
Sort Values in Descending Order with Groupby
You can sort values in descending order by passing ascending=False to the sort_values() method. The example below sums Fee for each course, sorts the totals from highest to lowest, and uses head(2) to keep the first two rows, that is, the two courses with the highest total fee.
# Sort values in descending order with groupby
df2=df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)
print(df2)
Yields below output.
# Output:
Courses
PySpark 51000
Spark 45000
Name: Fee, dtype: int64
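The same pattern works for counts: count the rows of each group and then sort by that count. The sketch below is one way to do it; the column name count and the tie-break on Courses are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

counts = (
    df.groupby("Courses")
      .size()                          # number of rows in each group
      .reset_index(name="count")       # turn the result into a DataFrame
      # Largest count first; ties are ordered by course name
      .sort_values(["count", "Courses"], ascending=[False, True])
)
print(counts)
```

Sorting by both count and Courses makes the order of groups with equal counts deterministic.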
Sort Values Using apply()
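Now let’s see how to sort groupby results using the apply() method. Here we apply a lambda function that calls sort_values() on each group, keeps up to three rows with head(3), and drops the Fee column, which already appears in the index of the result. Because every Fee value in this DataFrame is unique, each group contains a single row.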
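# Sort values of groupby
# Using DataFrame.drop() method
# errors='ignore' covers pandas versions that do not pass the grouping column to the lambda
df2 = df.groupby(['Fee']).apply(
    lambda x: x.sort_values(['Courses'], ascending=False).head(3)
               .drop(columns='Fee', errors='ignore'))
print(df2)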
Yields below output.
# Output:
         Courses Duration
Fee
22000 0    Spark   30days
23000 2    Spark   30days
24000 3   Python   60days
25000 1  PySpark   50days
26000 4  PySpark   35days
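Since each Fee group above holds only one row, the effect of sorting is easier to see when grouping by Courses instead. The following sketch does that; selecting the Fee and Duration columns before apply() is an assumption made here so the example does not depend on whether apply() passes the grouping column to the function.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

# Sort the rows of each course by Fee, highest first.
# group_keys=True puts the course name in the first index level.
sorted_groups = (
    df.groupby("Courses", group_keys=True)[["Fee", "Duration"]]
      .apply(lambda g: g.sort_values("Fee", ascending=False))
)
print(sorted_groups)
```

The result is indexed by course name and original row label, with the fees of each course in descending order.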
Complete Examples of Sort within Groups
# Create a Pandas DataFrame.
import pandas as pd

technologies = {
    'Courses': ["Spark","PySpark","Spark","Python","PySpark"],
    'Fee': [22000,25000,23000,24000,26000],
    'Duration': ['30days','50days','30days','60days','35days']
}
df = pd.DataFrame(technologies)
print(df)

# Using groupby to sort_values of Pandas DataFrame
df2 = df.sort_values(['Courses','Fee'], ascending=False).groupby('Courses').head(3)
print(df2)

# Groupby using DataFrame.agg() method
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
print(df2)

# Groupby the first level of index
df3 = df2['Fee'].groupby(level='Courses', group_keys=False)

# First three elements of each group
# Using groupby with lambda and apply() method
df4 = df3.apply(lambda x: x.sort_values(ascending=False).head(3))
print(df4)

# Using groupby with nlargest()
df2 = df.groupby(["Courses"])["Fee"].nlargest(3)
print(df2)

# Sort values in descending order with groupby
df2 = df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)
print(df2)

# Sort values of groupby using DataFrame.drop() method
# errors='ignore' covers pandas versions that do not pass the grouping column to the lambda
df2 = df.groupby(['Fee']).apply(
    lambda x: x.sort_values(['Courses'], ascending=False).head(3)
               .drop(columns='Fee', errors='ignore'))
print(df2)
FAQ on Pandas Groupby Sort within Groups
Sorting within groups allows you to organize the data in each group (created by groupby) based on specific column(s), which is particularly useful for analysis, ranking, or further computation.
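For the ranking use case, a position inside each group can be computed without reordering the rows at all. The sketch below uses the rank() method of the grouped column; the column name FeeRank is hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "50days", "30days", "60days", "35days"],
})

# Rank each Fee within its course: 1 is the highest Fee of that course.
# method="first" breaks ties by order of appearance.
df["FeeRank"] = (
    df.groupby("Courses")["Fee"]
      .rank(method="first", ascending=False)
      .astype(int)
)
print(df)
```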
You can use the groupby method along with apply() to sort values within each group, or sort the DataFrame with sort_values() before grouping.
You can sort within groups in descending order in Pandas. To do this, pass the ascending=False argument to the sort_values() function inside a groupby().apply() operation.
To sort within groups based on multiple columns in Pandas, you can use the groupby method with apply() and pass a list of columns to the sort_values() function. Passing a list of booleans to the ascending argument, one per column, lets you specify the sort order for each column independently.
You can sort within groups without using apply() by calling sort_values() directly on the DataFrame. The idea is to sort by the grouping column(s) followed by the column(s) you want ordered within each group, in a single sort_values() call. This approach is efficient and avoids the overhead of apply().
Conclusion
In this article, you have learned how to sort values within each group using Pandas DataFrame.groupby(), DataFrame.sort_values(), and apply() with lambda functions, with multiple examples.