Pandas Groupby Aggregate Explained

  • Post category:Pandas
  • Post last modified:December 13, 2023

This article explains how to perform groupby aggregation (agg) in Pandas. Aggregation is central to data science because it condenses many rows into summary values that answer analytical questions. In statistical analysis, aggregating data by group reveals trends and supports clear, insightful data stories.

1. What is Groupby Aggregation in Pandas?

In general, data aggregation combines related rows, grouped by some category, into summary values that provide insight. A good example is computing each student's overall percentage from their marks in different classes and then ranking the students by that percentage.

In Pandas, the aggregate() method and its alias agg() apply aggregations to groupby objects. Before calling aggregate(), we first need to create a GroupBy object by calling groupby() on a DataFrame or Series. Once we have the grouped data, we can apply one or more aggregation functions to it.
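The two steps described above (group first, then aggregate) can be sketched on a tiny in-memory DataFrame. The data and the "Symbol" column are hypothetical and are used only so the snippet runs without the CSV file.

```python
import pandas as pd

# Hypothetical toy data; "Symbol" is an illustrative grouping column.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "Low": [10.0, 12.0, 20.0, 22.0, 24.0],
})

# Step 1: split the rows into groups, one per distinct Symbol.
grouped = df.groupby("Symbol")

# Step 2: aggregate each group and combine the results into one DataFrame.
mean_low = grouped.agg({"Low": "mean"})
print(mean_low)
```

Calling grouped.aggregate({"Low": "mean"}) would give the same result, since agg() is an alias of aggregate().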

2. Example of Groupby Aggregation in Pandas

To better understand, let’s look at an example of GroupBy aggregation in Pandas. We will be using a stock-related dataset. You can download the dataset file from here.

In the following example, let’s see how we can find the pattern between the Low and High prices. The example dataset is very small, but you can apply the same code to any dataset to compute aggregations.


import pandas as pd

# Example 1: reading the stock data file
df = pd.read_csv('stockdata.csv')


# Example 2: creating GroupBy object
group_obj = df.groupby('High')


# Example 3: applying aggregation function
# (the string 'mean' avoids the FutureWarning that recent pandas
# versions emit when np.mean is passed to aggregate())
t_df = group_obj.aggregate({'Low': 'mean'})
t_df.head()

Yields the following output.

See how we aggregated the GroupBy values.

3. Pandas GroupBy.aggregate() Syntax

The syntax of DataFrameGroupBy.aggregate(), along with its parameters, is the following.


DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)

The same syntax applies not only to a Pandas DataFrame but also to a Pandas Series. Following is the syntax of the aggregate() function for a grouped Pandas Series.


SeriesGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)

We can apply the aggregate function to grouped DataFrames and grouped Series in the same way. The only difference is the return value: with a single aggregation function, a Series GroupBy returns a Series and a DataFrame GroupBy returns a DataFrame.
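As a quick illustration of the return types, the sketch below runs the same aggregation on a grouped DataFrame and on a grouped Series. The toy data and the "Symbol" column are assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy data standing in for the stock file.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB"],
    "High": [11.0, 13.0, 21.0],
    "Low": [10.0, 12.0, 20.0],
})

# DataFrameGroupBy: a single function gives a DataFrame.
frame_result = df.groupby("Symbol").aggregate("max")

# SeriesGroupBy: a single function gives a Series.
series_result = df.groupby("Symbol")["Low"].aggregate("max")

# SeriesGroupBy with a list of functions gives a DataFrame,
# one column per function.
list_result = df.groupby("Symbol")["Low"].aggregate(["min", "max"])

print(type(frame_result).__name__)
print(type(series_result).__name__)
print(type(list_result).__name__)
```

Note the last case: passing a list of functions to a grouped Series produces a DataFrame, so the "Series in, Series out" rule holds only for a single aggregation.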

Related: Pandas groupby() and sum() With Examples

4. Use pandas.GroupBy.aggregate() for min and max Value

In the following examples, let’s say we want to find the minimum and maximum “Low” values for each corresponding “High” column value. We can do this with the GroupBy aggregate() method. First, we need to call the groupby() function to create the grouped data object.

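  • In this example, we will find the maximum “Low” column value for each “High” column value. Because aggregate('max') is called on the whole GroupBy object, it returns the maximum of every remaining column, including “Low”.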

import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')

# applying the max aggregation function 
t_df = group_obj.aggregate('max')
t_df.head(6)

Yields the following output.

  • In the following example, we will see the minimum value of the “Low” column for each “High” column value. As before, aggregate('min') returns the minimum of every remaining column, including “Low”.

import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')

# applying the min aggregation function 
t_df = group_obj.aggregate('min')
t_df.head(6)

Yields the below output.

5. Create Columns for Aggregation for a Selected Column

With the pandas GroupBy aggregate() function, we can also create one column per aggregation function. For example, suppose we want to analyze the “Low” price column for each “High” column value. We can compute several statistics at once, as you can see in the following example.


import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')

# applying these aggregation functions
t_df = group_obj.Low.agg(['min', 'max','mean', 'median','sum'])
t_df.head(6)

Yields the following output.
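Since the output depends on the CSV file, here is a self-contained sketch of the same list-of-functions call on hypothetical data, so the shape of the result is easy to check by hand. The "Symbol" column and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; values chosen so the statistics are easy to verify.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "Low": [10.0, 14.0, 20.0, 30.0],
})

# One output column per aggregation function, one row per group.
summary = df.groupby("Symbol")["Low"].agg(["min", "max", "mean", "median", "sum"])
print(summary)
```

Each function name in the list becomes a column label in the result, which is why no extra renaming step is needed here.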

6. User-Defined Function for Pandas GroupBy Aggregation

Although pandas has many built-in aggregations, it also lets us define our own aggregation functions. You can use either an anonymous lambda function or a regular Python function. We will see an example of both methods.


import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')

# function that returns the sum of the values in one group
def sub_t(value):
    return sum(value)

# creating GroupBy object
group_obj = df.groupby('High')

# applying 'sub_t' to every column of every group
t_df = group_obj.agg(sub_t)
t_df.head(6)

Yields the following output.

  • Using lambda function in aggregate() function

A lambda is convenient for short, one-off aggregations. The named-function version we saw earlier remains supported; agg() accepts both forms and treats them the same way.


import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')
# applying a lambda that computes the mean of each group
# (sum of the values divided by the number of values)
t_df = group_obj.agg(lambda x: sum(x) / len(x))
t_df.head(6)

Yields the following output.
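To compare the two styles side by side, the sketch below computes the same custom statistic (the spread between the largest and smallest value) once with a named function and once with a lambda. The function name value_range, the "Symbol" column, and the data are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "Low": [10.0, 14.0, 20.0, 30.0],
})

def value_range(values):
    """Return the spread between the largest and smallest value in a group."""
    return values.max() - values.min()

# Named function: agg() calls it once per group with that group's Series.
by_function = df.groupby("Symbol")["Low"].agg(value_range)

# Equivalent lambda.
by_lambda = df.groupby("Symbol")["Low"].agg(lambda x: x.max() - x.min())

print(by_function)
```

A custom function must reduce each group to a single value. Returning something of the group's length, such as an array, is not an aggregation.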

7. Pandas GroupBy.aggregate() on Series Vs DataFrame

Pandas groupby aggregation works the same way for both a DataFrame and a Series. The only difference is the return value: with a single aggregation function, aggregating a grouped DataFrame returns a DataFrame, while aggregating a grouped Series returns a Series.

  • Pandas Groupby Aggregation on Pandas Series Example

import pandas as pd

# reading the stock data file
df = pd.read_csv('stockdata.csv')
# select a single column to get a Series
ser = df['Low']

# creating GroupBy object, grouping the Series by the 'High' values
group_obj = ser.groupby(df['High'])
# applying the standard deviation aggregation
t_ser = group_obj.agg('std')
t_ser.head(6)

Yields the following output:

8. Named Aggregation in Pandas

You might have noticed in the previous examples that the name of each aggregation function is used as the default column name in the result. Let’s give the aggregated columns custom names instead.


import pandas as pd
# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')

# applying named aggregation: keyword = output column name
t_df = group_obj.agg(
    min_Low=pd.NamedAgg(column="Low", aggfunc="min"),
    max_Low=pd.NamedAgg(column="Low", aggfunc="max"),
)
t_df.head(6)

Yields the below output where you see the column names change to the custom name provided.
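Named aggregation also accepts plain (column, function) tuples in place of pd.NamedAgg. The sketch below shows that shorthand on hypothetical data; the "Symbol" column is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "Low": [10.0, 14.0, 20.0, 30.0],
})

# Each keyword is the output column name; each tuple is (input column, function).
named = df.groupby("Symbol").agg(
    min_Low=("Low", "min"),
    max_Low=("Low", "max"),
)
print(named)
```

The tuple form and pd.NamedAgg produce the same result, so the choice is a matter of readability.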


Related: Pandas Rename Column with Examples

9. GroupBy.aggregate() Function to Handle Cells Having NaN Values

By default, built-in aggregations such as sum skip NaN values, so a group that contains only NaN is summed to 0, which in some cases you don’t want. For this purpose, we can use a lambda function to specify that such groups should keep NaN in the result. See the following example.


import pandas as pd
import numpy as np
# reading the stock data file
df = pd.read_csv('stockdata.csv')

# creating GroupBy object
group_obj = df.groupby('High')

# applying the aggregation function: sum each group, but return NaN
# when every value in the group is missing
t_df = group_obj.agg(lambda x: x.sum() if x.notna().any() else np.nan).reset_index()
t_df.head(6)
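As an alternative to a lambda, the built-in groupby sum() takes a min_count parameter that controls how many non-missing values a group needs before a sum is reported. The sketch below contrasts the default with min_count=1 on hypothetical data in which one group holds only NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every "Low" value for BBB is missing.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "Low": [10.0, np.nan, np.nan, np.nan],
})

# Default behaviour: NaN is skipped, so the all-NaN group sums to 0.
default_sum = df.groupby("Symbol")["Low"].sum()

# min_count=1 requires at least one non-missing value; otherwise the result is NaN.
kept_nan = df.groupby("Symbol")["Low"].sum(min_count=1)

print(default_sum)
print(kept_nan)
```

Here the AAA group still sums to 10.0 in both cases, because it contains one real value. Only the all-NaN group differs.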

Frequently Asked Questions of Pandas Groupby Aggregate

What is the purpose of the groupby() function in Pandas?

The groupby() function in Pandas is used to split the data into groups based on one or more columns. It is often followed by an aggregation function to perform operations on each group separately.

How do I use the groupby() function?

You can use the groupby() function by specifying the single column, or list of columns, you want to group by. For example, grouped_data = df.groupby('column_name')
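A minimal sketch of grouping by one column versus a list of columns follows. The "Symbol" and "Year" columns and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "AAA", "BBB"],
    "Year": [2022, 2022, 2023, 2023],
    "Low": [10.0, 14.0, 20.0, 30.0],
})

# Single column: one group per Symbol.
by_one = df.groupby("Symbol")["Low"].mean()

# List of columns: one group per (Symbol, Year) pair, indexed by a MultiIndex.
by_two = df.groupby(["Symbol", "Year"])["Low"].mean()

print(by_two)
```

Only combinations that actually occur in the data appear in the result, so by_two has three rows here rather than four.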

What does the aggregate() function do in Pandas?

The aggregate (or agg) function in Pandas is used to apply one or more aggregation operations to the grouped data. It allows you to compute various statistics for each group. For example, grouped_data.agg('mean')

How can I apply multiple aggregation functions simultaneously?

You can pass a list of aggregation functions to the agg() method and perform multiple aggregation functions on grouped data. For example, grouped_data.agg(['mean', 'sum'])

How do I perform different aggregations for different columns?

You can use a dictionary with column names as keys and aggregation functions as values to perform different aggregations for different columns. For example, grouped_data.agg({'column1': 'mean', 'column2': 'sum'})
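The dictionary form can be sketched as follows, again on hypothetical data with an assumed "Symbol" grouping column.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "High": [11.0, 15.0, 21.0, 31.0],
    "Low": [10.0, 14.0, 20.0, 30.0],
})

# Keys are column names; values are the aggregation to apply to that column.
result = df.groupby("Symbol").agg({"High": "max", "Low": "min"})
print(result)
```

Columns that are not listed in the dictionary are left out of the result.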

10. Summary and Conclusion

In this article, you have learned different ways to perform groupby aggregation (agg) in pandas, including built-in functions, custom functions, named aggregation, and NaN handling. Please let us know if you would like to add something or share your view of the article.

AlixaProDev
