How to groupby multiple columns in pandas DataFrame and compute multiple aggregations? groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
1. Quick Examples of GroupBy Multiple Columns
Following are examples of how to groupby on multiple columns & apply multiple aggregations.
# Quick Examples
# Groupby multiple columns
result = df.groupby(['Courses','Fee']).count()
print(result)
# Groupby multiple columns and aggregate on selected column
result = df.groupby(['Courses','Fee'])['Courses'].count()
print(result)
# Groupby multiple columns and aggregate()
result = df.groupby(['Courses','Fee'])['Duration'].aggregate('count')
print(result)
# Groupby multiple aggregations
result = df.groupby('Courses')['Fee'].aggregate(['min','max'])
print(result)
# Groupby & multiple aggregations on different columns
result = df.groupby('Courses').aggregate({'Duration':'count','Fee':['min','max']})
print(result)
2. pandas GroupBy Multiple Columns Example
Most of the time when you are working on a real-time project in pandas DataFrame you are required to do groupby on multiple columns. You can do so by passing a list of column names to DataFrame.groupby() function. Let’s create a DataFrame to understand this with examples.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
'Fee' :[20000,25000,26000,22000,25000,20000,35000],
'Duration':['30day','40days','35days','40days','60days','60days','70days'],
'Discount':[1000,2300,1200,2500,2000,2000,3000]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30day 1000
1 PySpark 25000 40days 2300
2 Hadoop 26000 35days 1200
3 Python 22000 40days 2500
4 PySpark 25000 60days 2000
5 Spark 20000 60days 2000
6 Spark 35000 70days 3000
Now let’s do a group on multiple columns and then calculate count aggregation.
# Groupby multiple columns
result = df.groupby(['Courses','Fee']).count()
print(result)
Yields below output. When you apply count on the entire DataFrame, pretty much all columns will have the same values.
# Output:
Duration Discount
Courses Fee
Hadoop 26000 1 1
PySpark 25000 2 2
Python 22000 1 1
Spark 20000 2 2
35000 1 1
So when you want group by count just select a column, you can event select from your group columns.
# Groupby multiple columns
result = df.groupby(['Courses','Fee'])['Courses'].count()
print(result)
2. Using aggregate()
Alternatively, you can also use the aggregate() function. This takes the count
function as a string param.
# Groupby multiple columns and aggregate()
result = df.groupby(['Courses','Fee'])['Courses'].aggregate('count')
print(result)
Yields below output.
# Output:
Courses Fee
Hadoop 26000 1
PySpark 25000 2
Python 22000 1
Spark 20000 2
35000 1
Name: Duration, dtype: int64
3. pandas Multiple Aggregations Example
You can also compute multiple aggregations at the same time in pandas by using the list to the aggregate()
.
# Groupby & multiple aggregations
result = df.groupby('Courses')['Fee'].aggregate(['min','max'])
print(result)
Yields below output.
# Output:
min max
Courses
Hadoop 26000 26000
PySpark 25000 25000
Python 22000 22000
Spark 20000 35000
The above example calculates min
and max
on the Fee
column. Let’s extend this to compute different aggregations on different columns.
Note that applying multiple aggregations to a single column in pandas DataFrame will result in a MultiIndex.
# Groupby multiple columns & multiple aggregations
result = df.groupby('Courses').aggregate({'Duration':'count','Fee':['min','max']})
print(result)
Yields below output. Notice that this creates MultiIndex. Working with multi-indexed columns is not easy so I’d recommend flattening by renaming the columns.
# Output:
Duration Fee
count min max
Courses
Hadoop 1 26000 26000
PySpark 2 25000 25000
Python 1 22000 22000
Spark 3 20000 35000
Conclusion
In this article, you have learned how to group DataFrame rows by multiple columns and also learned how to compute different aggregations on a column.
Related Articles
- Pandas Groupby Aggregate Explained
- Pandas Groupby Transform
- Pandas Groupby Sort within Groups
- Pandas groupby() and count() with Examples
- Pandas groupby() Explained With Examples
- Pandas groupby() and sum() With Examples
- Pandas Group Rows into List Using groupby()
- How to GroupBy Index in Pandas?
- Pandas Percentage Total With Groupby