How to groupby multiple columns in pandas DataFrame and compute multiple aggregations? groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
1. Quick Examples of GroupBy Multiple Columns
Following are examples of how to groupby on multiple columns & apply multiple aggregations.
# Quick Examples
# Groupby multiple columns
result = df.groupby(['Courses','Fee']).count()
print(result)
# Groupby multiple columns and aggregate on selected column
result = df.groupby(['Courses','Fee'])['Courses'].count()
print(result)
# Groupby multiple columns and aggregate()
result = df.groupby(['Courses','Fee'])['Duration'].aggregate('count')
print(result)
# Groupby multiple aggregations
result = df.groupby('Courses')['Fee'].aggregate(['min','max'])
print(result)
# Groupby & multiple aggregations on different columns
result = df.groupby('Courses').aggregate({'Duration':'count','Fee':['min','max']})
print(result)
2. Pandas GroupBy Multiple Columns Example
Most of the time when you are working on a real-time project in Pandas DataFrame you are required to do groupby on multiple columns. You can do so by passing a list of column names to DataFrame.groupby() function. Let’s create a DataFrame to understand this with examples.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
'Fee' :[20000,25000,26000,22000,25000,20000,35000],
'Duration':['30day','40days','35days','40days','60days','60days','70days'],
'Discount':[1000,2300,1200,2500,2000,2000,3000]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
Now let’s do a group on multiple columns and then calculate count aggregation.
# Groupby multiple columns
result = df.groupby(['Courses','Fee']).count()
print("After grouping by multiple columns:\n", result)
Yields below output. When you apply count on the entire DataFrame, pretty much all columns will have the same values.
So when you want to group by count just select a column, you can even select from your group columns.
# Group by multiple columns and get
# count of one of grouping column
result = df.groupby(['Courses','Fee'])['Courses'].count(\n", result)
print("Get count of one of the grouping column:\n", result)
# Output:
# Get count of one of the grouping column:
# Courses Fee
# Hadoop 26000 1
# PySpark 25000 2
# Python 22000 1
# Spark 20000 2
# 35000 1
# Name: Courses, dtype: int6
2. Using aggregate()
Alternatively, you can also use the aggregate() function. This takes the count
function as a string param.
# Groupby multiple columns and aggregate()
result = df.groupby(['Courses','Fee'])['Courses'].aggregate('count')
print("After grouping by multiple columns:\n", result)
Yields below output.
# Output:
After grouping by multiple columns:
Courses Fee
Hadoop 26000 1
PySpark 25000 2
Python 22000 1
Spark 20000 2
35000 1
Name: Duration, dtype: int64
3. Pandas Multiple Aggregations Example
You can also compute multiple aggregations at the same time in Pandas by using the list to the aggregate()
.
# Groupby & multiple aggregations
result = df.groupby('Courses')['Fee'].aggregate(['min','max'])
print("After applying multiple aggregations on multiple group columns:\n", result)
Yields below output.
# Output:
# After applying multiple aggregations on multiple group columns:
min max
Courses
Hadoop 26000 26000
PySpark 25000 25000
Python 22000 22000
Spark 20000 35000
The above example calculates min
and max
on the Fee
column. Let’s extend this to compute different aggregations on different columns.
Note that applying multiple aggregations to a single column in pandas DataFrame will result in a MultiIndex.
# Groupby multiple columns & multiple aggregations
result = df.groupby('Courses').aggregate({'Duration':'count','Fee':['min','max']})
print("After applying multiple aggregations on single group column:\n", result)
Yields below output. Notice that this creates MultiIndex. Working with multi-indexed columns is not easy so I’d recommend flattening by renaming the columns.
# Output:
# After applying multiple aggregations on single group column:
Duration Fee
count min max
Courses
Hadoop 1 26000 26000
PySpark 2 25000 25000
Python 1 22000 22000
Spark 3 20000 35000
Conclusion
In this article, you have learned how to group DataFrame rows by multiple columns and also learned how to compute different aggregations on a column.
Related Articles
- Pandas groupby Aggregate Explained
- Pandas groupby Transform
- Pandas groupby Sort within Groups
- Pandas groupby() and count() with Examples
- Pandas groupby() Explained With Examples
- Pandas groupby() and sum() With Examples
- Pandas Group Rows into List Using groupby()
- How to groupby Index in Pandas?
- Pandas Percentage Total With groupby
- Pandas Check Column Contains a Value in DataFrame
- Extract Pandas column value based on another column