  • Post category: Pandas
  • Post last modified: March 27, 2024
Pandas Series groupby() Function with Examples

The groupby() function in Pandas Series is a powerful tool for grouping data based on certain criteria. A groupby operation splits the data into groups, applies a function to each group independently, and then combines the results. It is most often used with DataFrames, but it works the same way on a Series.


You can group a Pandas Series and compute various statistics on the grouped data in many ways, for example by calling sum(), mean(), count(), min(), or max() on the result of groupby(). In this article, I will explain the Pandas Series groupby() function, its syntax, parameters, and usage, and show how to group the data in a Series with multiple examples.

Key Points –

  • Pandas Series groupby() is used for grouping data based on a specified criterion, allowing you to analyze and manipulate subsets of the data independently.
  • The groupby() operation follows the split-apply-combine paradigm. It splits the data into groups, applies a function to each group, and then combines the results into a new data structure.
  • The primary use of groupby() is for aggregation, where you can calculate summary statistics (e.g., sum, mean, count) for each group. Additionally, it supports transformations, allowing you to modify the data within each group.
  • groupby() is valuable for analyzing categorical data, enabling insights into patterns and trends within different categories or levels of a variable.
  • The group keys become the index of the result. For a DataFrame, the as_index parameter controls this (as_index=False keeps the keys as a regular column); for a Series, only as_index=True is supported.
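To make the split-apply-combine idea concrete, here is a minimal sketch, using the same course/fee data as the examples below, that contrasts an aggregation (one row per group) with a transformation via transform() (one value per original row, original index kept).

```python
import pandas as pd

# Course/fee data, with course names as the (duplicate) index labels
ser = pd.Series([22000, 25000, 23000, 24000, 26000, 30000],
                index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"])

# Aggregation: split by course, apply sum(), combine into one row per course
totals = ser.groupby(level=0).sum()
print(totals)

# Transformation: each row gets its own course's mean fee, original index kept
group_mean = ser.groupby(level=0).transform("mean")
print(group_mean)
```

The aggregated result has three rows (one per course), while the transformed result has the same six rows as ser, which makes it convenient for per-row comparisons against a group statistic.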

Syntax of Series groupby()

Following is the syntax of Series groupby() in pandas 2.x:


# Syntax of Series groupby()
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

Parameters of Series groupby()

Following are the parameters of the Series groupby() function.

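  • by – Determines the groups. It can be a mapping (dict or Series), a function, a label, a pd.Grouper, or a list of these; an array of the same length as the Series can also be passed. A function is called on each index label, and a dict or Series is looked up by index label; the results are used as the group keys.
  • axis – The axis along which to group. For a Series, the only valid value is 0 ('index'). This parameter is deprecated since pandas 2.1.
  • level – If the index is a MultiIndex, group by a particular level or levels (by position or name). level=0 also works on a regular index.
  • as_index – Whether the group labels become the index of the result. For a Series only True (the default) is supported; passing as_index=False raises a TypeError because that option is only valid for DataFrames.
  • sort – Sort the group keys. The default is True. Setting it to False keeps groups in the order they first appear, which can improve performance.
  • group_keys – When calling apply(), add the group keys to the index of the result to identify each piece.
  • squeeze – (pandas 1.x only) Reduce the dimensionality of the return type if possible. It was deprecated in pandas 1.1 and removed in pandas 2.0.
  • observed – Only relevant when a grouper is categorical. If True, only categories that appear in the data are shown; if False, all categories are shown. The default is False, and pandas 2.1+ warns that it will change to True in a future version.
  • dropna – If True (the default), rows whose group key is NA are excluded. If False, NA is treated as its own group key.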
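The effect of sort and dropna can be seen in a small sketch. The grouping keys here are a separate, hypothetical Series with a missing value (None) added purely for illustration; groupby() aligns it to ser by index.

```python
import pandas as pd

ser = pd.Series([22000, 25000, 23000, 24000, 26000])
# Hypothetical group keys; None marks a row with a missing key
keys = pd.Series(["Spark", "Python", "Spark", "Pandas", None])

# sort=True (default): group keys come back in sorted order
sorted_sum = ser.groupby(keys).sum()
# sort=False: groups keep the order in which they first appear
unsorted_sum = ser.groupby(keys, sort=False).sum()
# dropna=False: the missing key forms its own group instead of being dropped
with_na = ser.groupby(keys, dropna=False).sum()

print(sorted_sum, unsorted_sum, with_na, sep="\n\n")
```

With the defaults, the row whose key is missing (26000) silently disappears from the result; dropna=False keeps it as a separate NaN group.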

Return Value

It returns a SeriesGroupBy object. This intermediate object records which rows belong to which group; no computation happens until you call an aggregation (such as sum() or mean()), a transformation (transform()), or a filter (filter()) on it.
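A quick way to see what the SeriesGroupBy object holds before any aggregation is to inspect it directly. This sketch uses the ngroups attribute, get_group(), and plain iteration over (key, group) pairs:

```python
import pandas as pd

ser = pd.Series([22000, 25000, 23000, 24000, 26000, 30000],
                index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"])

grouped = ser.groupby(level=0)

print(type(grouped).__name__)      # SeriesGroupBy
print(grouped.ngroups)             # 3
print(grouped.get_group("Spark"))  # the two "Spark" rows

# Iterating yields (group key, sub-Series) pairs, in sorted key order
for name, group in grouped:
    print(name, group.tolist())
```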

Let’s create a Pandas Series with a custom index of course names.


# Imports pandas
import pandas as pd
import numpy as np

# Create a sample Series
data = {'Courses': ["Spark","Python","Spark","Pandas","Python","Pandas"],
        'Fee': [22000,25000,23000,24000,26000,30000]}
ser = pd.Series(data['Fee'], index=data['Courses'])
print("Pandas Series:\n", ser)

Yields below output.

# Pandas Series:
#  Spark     22000
# Python    25000
# Spark     23000
# Pandas    24000
# Python    26000
# Pandas    30000
# dtype: int64

Group by Pandas Series Index Labels and Calculate the Sum

If you want to group a pandas Series by its index labels and calculate the sum for each group, you can use the groupby() function along with an aggregation function like sum().


# Imports pandas
import pandas as pd

# Group by the index labels (course names) and calculate the sum
grouped_series = ser.groupby(ser.index).sum()
print("Get the sum of grouped data:\n",grouped_series)

# Equivalent: group by index level 0 and calculate the sum for each group
grouped_series = ser.groupby(level=0).sum()
print("Get the sum of grouped data:\n",grouped_series)

In the above example, groupby(ser.index) groups the Series by its index labels (the course names), so all fees for the same course land in the same group. The sum() function then adds up the fees within each group. The result is a new Series whose index holds the unique course names and whose values are the total fees per course; groupby(level=0) produces the same result. This example yields the below output.

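# Get the sum of grouped data:
#  Pandas    54000
# Python    51000
# Spark     45000
# dtype: int64
# Get the sum of grouped data:
#  Pandas    54000
# Python    51000
# Spark     45000
# dtype: int64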

Group by Custom Categories and Calculate the Max

Alternatively, you can group by custom categories and calculate the maximum value for each group by combining groupby() with the max() aggregation. Define the categories in a dictionary that maps each index label to a category and pass it to groupby(). This returns a GroupBy object, on which you call max() to get the maximum value of each group.


# Define custom categories
custom_categories = {'Spark': 'Programming', 'Python': 'Programming', 'Pandas': 'Data Analysis'}

# Group by custom categories and calculate the max
grouped_series = ser.groupby(custom_categories).max()
print("Get the maximum value of grouped data:\n", grouped_series)

# Output:
# Get the maximum value of grouped data:
#  Data Analysis    30000
# Programming      26000
# dtype: int64

In the above example, custom_categories is a dictionary that maps each course name in the index to a custom category. When you pass a dictionary to groupby(), pandas looks up each index label in it, so groupby(custom_categories) groups the Series by these categories, and max() then returns the highest fee in each category.
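If you need several statistics per category at once, agg() accepts a list of function names. The following sketch recreates the same ser and custom_categories as above so it runs on its own, and returns a DataFrame with one column per statistic:

```python
import pandas as pd

ser = pd.Series([22000, 25000, 23000, 24000, 26000, 30000],
                index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"])
custom_categories = {'Spark': 'Programming', 'Python': 'Programming', 'Pandas': 'Data Analysis'}

# A list of function names yields a DataFrame with one column per statistic
summary = ser.groupby(custom_categories).agg(["min", "max", "count"])
print(summary)
```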

Group by String Length and Count Occurrences

Similarly, to group by the length of strings in a pandas Series and count the occurrences of each string length, you can use the groupby() function along with the str.len() method and count() aggregation.


# Imports pandas
import pandas as pd

# Create a sample Series
data = {'Courses': ["Spark", "Python", "Java", "Pandas", "C", "R"]}
ser = pd.Series(data['Courses'])
print("Pandas Series:\n", ser)

# Group by string length and count occurrences
grouped_series = ser.groupby(ser.str.len()).count()
print("Group by string length and count occurrences:\n", grouped_series)

# Output:
# Pandas Series:
#  0     Spark
# 1    Python
# 2      Java
# 3    Pandas
# 4         C
# 5         R
# dtype: object
# Group by string length and count occurrences:
# 1    2
# 4    1
# 5    1
# 6    2
# dtype: int64

In the above example, ser.str.len() is used to get the length of each string in the Series. The groupby(ser.str.len()) groups the Series based on these string lengths, and then count() is applied to calculate the occurrences for each string length.
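count() only tells you how many strings fall into each length. As a sketch, agg(list) collects the actual members of each group, and value_counts() on the lengths is an alternative way to get the same counts:

```python
import pandas as pd

ser = pd.Series(["Spark", "Python", "Java", "Pandas", "C", "R"])
lengths = ser.str.len()

# agg(list) gathers the strings in each length group into a Python list
members = ser.groupby(lengths).agg(list)
print(members)

# value_counts() on the lengths gives the same counts as groupby().count()
counts = lengths.value_counts().sort_index()
print(counts)
```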

Group by Even or Odd Values and Calculate the Mean

To group a pandas Series by whether its values are even or odd, and then calculate the mean for each group, you can pass a key Series derived from the values (ser % 2) to groupby() and then apply the mean() aggregation.


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([1, 2, 3, 4, 5, 6])

# Group by even or odd values and calculate the mean
grouped_series = ser.groupby(ser % 2).mean()
print("Group by even or odd values and calculate the mean:\n", grouped_series)

# Output:
# Group by even or odd values and calculate the mean:
#  0    4.0
# 1    3.0
# dtype: float64

Here,

  • The expression ser % 2 returns 0 for even values and 1 for odd values; these remainders serve as the group keys (0 for even, 1 for odd).
  • The groupby() function is used to group the Series based on these groups.
  • Finally, the mean() function is applied to calculate the mean for each group.
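The 0/1 group keys are not very descriptive. One option, sketched below, is to map the remainders to readable labels before grouping; the labels "even" and "odd" are just illustrative names:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6])

# Translate the 0/1 remainders into readable group labels
parity = (ser % 2).map({0: "even", 1: "odd"})
result = ser.groupby(parity).mean()
print(result)
```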

Group by Custom Function and Calculate the Mean

Similarly, you can group a pandas Series with a custom function and then calculate the mean of each group by passing the function to groupby() and calling mean(). Note that pandas calls the function on each index label, not on the values. For example,


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])

# Group by custom function (called on each index label 0-4) and calculate the mean
grouped_series = ser.groupby(lambda x: 'even' if x % 2 == 0 else 'odd').mean()
print(grouped_series)

# Output:
# even    30.0
# odd     30.0
# dtype: float64

Here,

  • The lambda function lambda x: 'even' if x % 2 == 0 else 'odd' is used as a custom function. Because groupby() calls it on each index label, it labels positions 0, 2, and 4 as ‘even’ and positions 1 and 3 as ‘odd’, regardless of the values stored there.
  • The groupby() function groups the Series based on the result of the custom function, creating two groups: one for even index labels (values 10, 30, 50) and one for odd index labels (values 20, 40).
  • Finally, the mean() function calculates the mean of each group: (10 + 30 + 50) / 3 = 30.0 and (20 + 40) / 2 = 30.0.
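Because the function receives index labels, grouping by a property of the values needs keys built from the values first. This sketch contrasts the two approaches on the same data, using a small helper function, parity(), introduced only for illustration. Since every value (10 through 50) is even, value-based grouping yields a single group:

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50])

def parity(n):
    # Illustrative helper: classify an integer as "even" or "odd"
    return "even" if n % 2 == 0 else "odd"

# Function passed directly: applied to the index labels 0..4
by_label = ser.groupby(parity).mean()

# Function applied to the values first, then the result used as group keys
by_value = ser.groupby(ser.map(parity)).mean()

print(by_label, by_value, sep="\n\n")
```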

Group by Boolean Condition and Calculate the Sum

If you want to group a pandas Series by a boolean condition and calculate the sum for each group, you can use a boolean condition directly within the groupby() function and then apply the sum() function.


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])

# Group by boolean condition and calculate the sum
result = ser.groupby(ser > 30).sum()
print(result)

# Output:
# False    60
# True     90
# dtype: int64

Here,

  • The condition ser > 30 creates a boolean Series where True represents values greater than 30, and False represents values less than or equal to 30.
  • The groupby() function is applied to group the Series based on this boolean condition, creating two groups: one for values greater than 30 (True) and one for values less than or equal to 30 (False).
  • Finally, the sum() function is used to calculate the sum for each group.
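A boolean condition splits the data into exactly two groups. For more than two ranges, one approach (sketched here with bin edges chosen for illustration) is to bin the values with pd.cut() and group by the bins. observed=False is passed explicitly because the bins are categorical and recent pandas versions warn when it is omitted:

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50])

# Bin edges chosen for illustration: (0, 20], (20, 40], (40, 60]
bins = pd.cut(ser, bins=[0, 20, 40, 60])
result = ser.groupby(bins, observed=False).sum()
print(result)
```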

Frequently Asked Questions on Pandas Series groupby() Function

What does the groupby() function do in Pandas Series?

The groupby() function in Pandas Series is used to group data based on specified criteria. It involves splitting the data into groups, applying a function to each group, and then combining the results.

How do I use the groupby() function with a custom grouping function?

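To use the groupby() function with a custom grouping function in a Pandas Series, pass the function itself (not its result) to groupby(). pandas calls it on each index label and uses the returned values as group keys. To group by the values instead, apply the function to the values first (for example with ser.map()) and pass the resulting Series to groupby().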

What are some common aggregation functions used with groupby()?

Common aggregation functions used with groupby() include sum(), mean(), count(), min(), max(), and agg() for custom aggregations.

How do I group by the index of a Series?

To group by the index of a Pandas Series, use groupby() with the level parameter, for example groupby(level=0) or the level name, or pass the index itself, as in ser.groupby(ser.index).
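For a Series with a MultiIndex, the level parameter selects which level to group by, either by position or by name. The index names Course and Year below are illustrative:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Spark", "2023"), ("Spark", "2024"), ("Python", "2023"), ("Python", "2024")],
    names=["Course", "Year"],
)
ser = pd.Series([22000, 23000, 25000, 26000], index=index)

# Group by the first level by position...
by_course = ser.groupby(level=0).sum()
# ...or by a level name
by_year = ser.groupby(level="Year").sum()

print(by_course, by_year, sep="\n\n")
```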

Can I group by a categorical column?

Yes. Pandas supports grouping by categorical data. When the grouping key is categorical, the result follows the order of the categories, and the observed parameter controls whether categories that never occur in the data are included.
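The sketch below illustrates both points with an ordered categorical key; "Java" is a category added for illustration that never occurs in the data:

```python
import pandas as pd

# Ordered categorical key; "Java" is a category that never occurs
courses = pd.Categorical(
    ["Spark", "Python", "Spark", "Pandas"],
    categories=["Spark", "Python", "Pandas", "Java"],
    ordered=True,
)
ser = pd.Series([22000, 25000, 23000, 24000])

# observed=False: every category appears, in category order (empty groups sum to 0)
all_cats = ser.groupby(courses, observed=False).sum()
# observed=True: only categories that occur in the data
seen = ser.groupby(courses, observed=True).sum()

print(all_cats, seen, sep="\n\n")
```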

Conclusion

In this article, I have explained the Pandas Series groupby() function, its syntax, parameters, and usage, and shown how to group the data in a Series by various criteria and then perform operations such as sum(), mean(), max(), and count() on each group.

Happy Learning !!


Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles on Pandas, NumPy, and Python, and takes pride in his ability to bridge the gap between technical experts and end-users.