Pandas Groupby Sort within Groups

You can group a pandas DataFrame and sort within each group by combining DataFrame.sort_values(), DataFrame.groupby(), and apply() with lambda functions. In this article, I will explain how to group a pandas DataFrame and sort within each group, and also how to get the count of each group and sort by the count column.

1. Quick Examples of Sort within Groups of Pandas DataFrame

If you are in a hurry, below are some quick examples of grouping and sorting within groups of a pandas DataFrame.


# Below are some quick examples.

# Example 1 - Using groupby to sort_values of Pandas DataFrame.
df2=df.sort_values(['Courses','Fee'],ascending=False).groupby('Courses').head(3)

# Example 2 - Aggregate, then sort within each first-level group and keep the first three elements.
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
df3 = df2['Fee'].groupby('Courses', group_keys=False).apply(lambda x: x.sort_values(ascending=False).head(3))

# Example 3 - Using groupby with DataFrame.nlargest().
df2=df.groupby(["Courses"])["Fee"].nlargest(3)

# Example 4 - Sort values in descending order with groupby.
df2=df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)

# Example 5 - Sort within each group using apply(), then drop the grouping column.
df2=df.groupby(['Fee']).apply(lambda x: x.sort_values(['Courses'], ascending=False).head(3)
.drop('Fee', axis=1))

Let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days','60days','35days']
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000   60days
4  PySpark  26000   35days

2. Sort within Groups of groupby() Result in DataFrame

To sort within groups, first sort the whole DataFrame with DataFrame.sort_values(), in ascending or descending order, and then group the sorted rows with DataFrame.groupby().

This works because groupby preserves the order of rows within each group, so every group keeps the order produced by sort_values().


# Using groupby & sort_values to sort.
df2=df.sort_values(['Courses','Fee'],ascending=False).groupby('Courses').head(3)
print(df2)

Yields below output. A groupby object is not a DataFrame, so use head() or a similar method to get the result back as a DataFrame. Here, head(3) returns up to 3 rows for each group.


# Output:
   Courses    Fee Duration
2    Spark  23000   30days
0    Spark  22000   30days
3   Python  24000   60days
4  PySpark  26000   35days
1  PySpark  25000   50days
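
The sort direction can also be set per column. The following is a small sketch, assuming the same sample data, that sorts Courses alphabetically and Fee from highest to lowest, then keeps only the top row of each group. The variable names ordered and top_per_course are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '60days', '35days'],
})

# Courses ascending (A-Z), Fee descending within each course.
ordered = df.sort_values(['Courses', 'Fee'], ascending=[True, False])

# head(1) on the grouped, pre-sorted frame keeps the highest-fee row per course.
top_per_course = ordered.groupby('Courses').head(1)
print(top_per_course)
```

Passing a list to ascending requires one boolean per sort column, in the same order as the column list.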

3. Another Example of Sorting within group

First, let’s group the rows and aggregate them; then we sort within each group.


# Group by two columns and sum Fee using agg().
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
print(df2)

Yields below output.


# Output:
                      Fee
Courses Duration       
PySpark 35days    26000
        50days    25000
Python  60days    24000
Spark   30days    45000

Now, we group the aggregated Fee column by the first level of the index:


# Group by the first level of the index.
grouped = df2['Fee'].groupby('Courses', group_keys=False)

Then, sort each group and take the first three elements by using a lambda function with the groupby apply() method.


# Sort each group in descending order and keep its first three elements.
df3 = grouped.apply(lambda x: x.sort_values(ascending=False).head(3))
print(df3)

Yields below output.


# Output:
Courses  Duration
PySpark  35days      26000
         50days      25000
Python   60days      24000
Spark    30days      45000
Name: Fee, dtype: int64
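
The apply() step can often be avoided. As a minimal sketch, assuming the same sample data, sorting the aggregated frame with sort_values() and then calling groupby().head() gives a per-group descending order as well. The names agg_df, ordered, and top3 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '60days', '35days'],
})

# Sum Fee for each (Courses, Duration) pair; the result has a two-level index.
agg_df = df.groupby(['Courses', 'Duration']).agg({'Fee': 'sum'})

# sort_values() accepts index level names as well as column labels.
ordered = agg_df.sort_values(['Courses', 'Fee'], ascending=[True, False])

# Keep at most three rows per course, preserving the sorted order.
top3 = ordered.groupby(level='Courses').head(3)
print(top3)
```

Because head() keeps rows in the order they already have, the sort has to happen before the groupby call.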

4. Using Groupby with DataFrame.nlargest()

When called on a grouped column, nlargest(n) returns the n largest values of each group, ordered from largest to smallest. The result is a Series indexed by the group key and the original row label; the other columns are not included.


# Using groupby with DataFrame.nlargest().
df2=df.groupby(["Courses"])["Fee"].nlargest(3)
print(df2)

Yields below output.


# Output:
Courses   
PySpark  4    26000
         1    25000
Python   3    24000
Spark    2    23000
         0    22000
Name: Fee, dtype: int64
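
Since the result above holds only the Fee values, you may want the complete rows back. One possible approach, sketched here under the assumption that the original row labels are unique, is to read the labels from the second index level and pass them to df.loc. The names top_fees, row_labels, and top_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '60days', '35days'],
})

# Two largest fees per course; index is (Courses, original row label).
top_fees = df.groupby('Courses')['Fee'].nlargest(2)

# The second index level holds the original row labels.
row_labels = top_fees.index.get_level_values(1)

# Look up the full rows in the same per-group order.
top_rows = df.loc[row_labels]
print(top_rows)
```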

5. Sort Values in Descending Order with Groupby

You can sort aggregated values in descending order by passing ascending=False to the sort_values() method. Here, the fees are summed for each course, the totals are sorted from largest to smallest, and head(2) keeps the first two rows.


# Sort values in descending order with groupby.
df2=df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)
print(df2)

Yields below output.


# Output:
Courses
PySpark    51000
Spark      45000
Name: Fee, dtype: int64
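
The same pattern works for group counts. As a sketch, assuming the same sample data, size() counts the rows in each group and reset_index(name='count') turns the result into a DataFrame with a count column that can be sorted. The column name count and the Courses tie-breaker are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '60days', '35days'],
})

# Count rows per course and expose the result as a 'count' column.
counts = df.groupby('Courses').size().reset_index(name='count')

# Largest counts first; ties are broken alphabetically by course name.
counts = counts.sort_values(['count', 'Courses'], ascending=[False, True])
print(counts)
```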

6. Sort Values using apply()

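Now let’s see how to sort groupby results using the apply() method. Here we apply a lambda function that calls sort_values() on each group and then drops the Fee column. Because every Fee value in this DataFrame is unique, each group contains a single row. Note that pandas 2.2 and later emit a deprecation warning when apply() operates on the grouping column.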


# Sort within each group using apply(), then drop the grouping column.
df2=df.groupby(['Fee']).apply(lambda x: x.sort_values(['Courses'], ascending=False).head(3)
.drop('Fee', axis=1))
print(df2)

Yields below output.


# Output:
         Courses Duration
Fee                      
22000 0    Spark   30days
23000 2    Spark   30days
24000 3   Python   60days
25000 1  PySpark   50days
26000 4  PySpark   35days
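
Grouping by a column with repeated values makes the per-group sort more visible. The following sketch, which is an illustration rather than the only way to do it, groups by Courses and selects the Fee and Duration columns before calling apply(), so the lambda never receives the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '60days', '35days'],
})

# Select the non-grouping columns first, then sort each group by Fee
# in descending order and keep its top row.
result = (
    df.groupby('Courses', group_keys=True)[['Fee', 'Duration']]
      .apply(lambda g: g.sort_values('Fee', ascending=False).head(1))
)
print(result)
```

With group_keys=True, the course name becomes the first index level, so no drop() call is needed afterwards.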

7. Complete Examples of Sort within Groups


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days','60days','35days']
          }
df = pd.DataFrame(technologies)
print(df)

# Using groupby to sort_values of Pandas DataFrame.
df2=df.sort_values(['Courses','Fee'],ascending=False).groupby('Courses').head(3)
print(df2)

# Group by two columns and sum Fee using agg().
df2 = df.groupby(['Courses','Duration']).agg({'Fee':'sum'})
print(df2)

# Sort each first-level group in descending order and keep its first three elements.
df3 = df2['Fee'].groupby('Courses', group_keys=False).apply(lambda x: x.sort_values(ascending=False).head(3))
print(df3)

# Using groupby with DataFrame.nlargest().
df2=df.groupby(["Courses"])["Fee"].nlargest(3)
print(df2)

# Sort values in descending order with groupby.
df2=df.groupby(['Courses'])['Fee'].sum().sort_values(ascending=False).head(2)
print(df2)

# Sort within each group using apply(), then drop the grouping column.
df2=df.groupby(['Fee']).apply(lambda x: x.sort_values(['Courses'], ascending=False).head(3)
.drop('Fee', axis=1))
print(df2)

Conclusion

In this article, you have learned how to sort values within groups after a groupby using the pandas DataFrame.groupby() and DataFrame.sort_values() methods and lambda functions, with multiple examples.
