The groupby() function in the Pandas Series is a powerful tool for grouping data based on certain criteria. The groupby operation is used to split a DataFrame into groups based on some criteria, and then apply a function to each group independently. When you’re working with a Series, you can still use groupby similarly.
You can group a Pandas Series and compute results for each group in many ways, for example by calling groupby() followed by an aggregation such as sum(), mean(), count(), min(), or max(). In this article, I will explain the Pandas Series groupby() function, covering its syntax, parameters, and usage, and show how to group the data in a Series with multiple examples.
Key Points –
- Pandas Series groupby() is used for grouping data based on a specified criterion, allowing you to analyze and manipulate subsets of the data independently.
- The groupby() operation follows the split-apply-combine paradigm. It splits the data into groups, applies a function to each group, and then combines the results into a new data structure.
- The primary use of groupby() is for aggregation, where you can calculate summary statistics (e.g., sum, mean, count) for each group. Additionally, it supports transformations, allowing you to modify the data within each group.
- groupby() is valuable for analyzing categorical data, enabling insights into patterns and trends within different categories or levels of a variable.
- The group labels created by groupby() become the index of an aggregated result. The as_index parameter appears in the signature, but as_index=False is only valid with a DataFrame, so it cannot be used to change this for a Series.
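The split-apply-combine steps listed above can be sketched by hand and compared with groupby(). This is only an illustration under assumed sample data (the course names and fees are the ones used in the examples of this article); the manual dictionary comprehension is not how pandas implements grouping internally.

```python
import pandas as pd

ser = pd.Series(
    [22000, 25000, 23000, 24000, 26000, 30000],
    index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"],
)

# Split: select the values that share an index label
# Apply: sum each piece
# Combine: collect the per-group results into a new Series
manual = pd.Series(
    {label: ser[ser.index == label].sum() for label in sorted(ser.index.unique())}
)

# The same three steps in a single groupby() call
grouped = ser.groupby(level=0).sum()
print(manual.to_dict() == grouped.to_dict())

# Transformation: the result keeps the original shape; here each fee
# is expressed relative to the mean fee of its own course
centered = ser.groupby(level=0).transform(lambda g: g - g.mean())
print(centered)
```

An aggregation returns one row per group, while a transformation returns one row per original element.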
Syntax of Series groupby()
Following is the syntax of Series groupby()
# Syntax of series groupby()
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, **kwargs)
Parameters of Series groupby()
Following are the parameters of the Series groupby() function.
- by – Specifies how the groups are formed. It can be a function, a dictionary or Series that maps index labels to group names, or an array-like of keys with the same length as the Series. A function is called on each index label, not on each value.
- axis – The axis along which to group. A Series has only one axis, so 0 (the default) is the only valid value.
- level – For a MultiIndex, the level (name or position) to use for grouping. level=0 also works on a single-level index.
- as_index – For a DataFrame, controls whether the group labels become the index of the result. as_index=False is only valid with a DataFrame and raises a TypeError on a Series.
- sort – Sort the group keys. The default is True; pass False to keep the groups in order of first appearance.
- group_keys – When calling apply, add group keys to the index to identify pieces.
- squeeze – Reduce the dimensionality of the return type if possible. This parameter was deprecated in pandas 1.1 and removed in pandas 2.0.
- observed – Only relevant when grouping by categorical data. It determines whether to show all categories or only the observed categories.
- **kwargs – Additional keyword arguments are passed to the groupby function.
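A few of these parameters can be compared side by side. The following is a minimal sketch, assuming the sample course fees used in this article and an index that has been given the name Courses for illustration.

```python
import pandas as pd

ser = pd.Series(
    [22000, 25000, 23000, 24000, 26000, 30000],
    index=pd.Index(
        ["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"],
        name="Courses",
    ),
)

# by: an array-like of keys, here the index labels themselves
by_labels = ser.groupby(ser.index).sum()

# level: the index level, given by name (or by position with level=0)
by_level = ser.groupby(level="Courses").sum()

# sort=False keeps the groups in order of first appearance
unsorted = ser.groupby(level=0, sort=False).sum()

print(list(by_level.index))   # sorted group keys
print(list(unsorted.index))   # first-appearance order
```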
Return Value
It returns a SeriesGroupBy object. This object is an intermediate data structure that maps group keys to the corresponding groups. No result is computed until you call an aggregation, transformation, or other method on it.
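To see what the returned object looks like before any aggregation is applied, it can be inspected directly. This is a small sketch on assumed sample data (the course fees used in this article); ngroups, get_group(), and iteration are standard members of the groupby object.

```python
import pandas as pd

ser = pd.Series(
    [22000, 25000, 23000, 24000, 26000, 30000],
    index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"],
)

grouped = ser.groupby(level=0)

# Nothing has been aggregated yet; this is only a grouping object
print(type(grouped).__name__)   # SeriesGroupBy
print(grouped.ngroups)          # 3

# Pull out a single group as a regular Series
spark = grouped.get_group("Spark")
print(spark)

# Iterate over (group key, group) pairs
members = {name: group.tolist() for name, group in grouped}
print(members)
```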
Let's create a Pandas Series with a customized index that holds the course names.
# Imports pandas
import pandas as pd
import numpy as np
# Create a sample Series
data = {'Courses': ["Spark","Python","Spark","Pandas","Python","Pandas"],
'Fee': [22000,25000,23000,24000,26000,30000]}
ser = pd.Series(data['Fee'], index=data['Courses'])
print("Pandas Series:\n", ser)
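Yields below output.
# Output:
# Pandas Series:
# Spark 22000
# Python 25000
# Spark 23000
# Pandas 24000
# Python 26000
# Pandas 30000
# dtype: int64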
Group by Pandas Series Unique Values and Calculate the Sum
If you want to group a pandas Series by the unique labels in its index and calculate the sum for each group, you can use the groupby() function along with an aggregation function like sum().
# Imports pandas
import pandas as pd
import numpy as np
# Group by unique values and calculate the sum
grouped_series = ser.groupby(ser.index).sum()
print("Get the sum of grouped data:\n",grouped_series)
# Group by the 'Courses' and calculate the sum for each group
grouped_series = ser.groupby(level=0).sum()
print("Get the sum of grouped data:\n",grouped_series)
In the above example, groupby(ser.index) groups the Series by the unique labels in its index, which holds the course names (a Series has no columns). The sum() function then adds up the fees for each course. Passing level=0 groups by the same index, so both calls return the same result: a new Series whose index holds the unique course names and whose values are the corresponding fee totals. This example yields the below output.
# Output:
# Get the sum of grouped data:
# Pandas 54000
# Python 51000
# Spark 45000
# dtype: int64
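Grouping by the index labels is different from grouping by the values of the Series. When the values themselves are the categories, the Series can be passed as its own grouping key. The sketch below uses a hypothetical Series of course names and compares the result with value_counts().

```python
import pandas as pd

# Hypothetical data: here the course names are the values, not the index
courses = pd.Series(
    ["Spark", "Python", "Spark", "Pandas", "Python", "Pandas", "Spark"]
)

# Group the Series by its own values and count the members of each group
counts = courses.groupby(courses).count()
print(counts)

# value_counts() gives the same numbers, ordered by frequency instead
print(courses.value_counts().to_dict() == counts.to_dict())
```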
Group by Custom Categories and Calculate the Max
Alternatively, you can group by custom categories and calculate the maximum value for each group. Create a dictionary that maps each index label to a category and pass it to the groupby() function, which returns a groupby object. Then call max() on that object to get the maximum value of each group.
# Define custom categories
custom_categories = {'Spark': 'Programming', 'Python': 'Programming', 'Pandas': 'Data Analysis'}
# Group by custom categories and calculate the max
grouped_series = ser.groupby(custom_categories).max()
print("Get the maximum value of grouped data:\n", grouped_series)
# Output:
# Get the maximum value of grouped data:
# Data Analysis 30000
# Programming 26000
# dtype: int64
In the above example, custom_categories is a dictionary that maps each course in the index to a custom category. groupby(custom_categories) groups the given Series based on these custom categories, and then max() is applied to calculate the maximum fee for each category.
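One way to see what the dictionary does is to build the group keys explicitly with Index.map() and group by them. This is a sketch of an equivalent formulation on the same sample data, not a description of how pandas resolves the dictionary internally.

```python
import pandas as pd

ser = pd.Series(
    [22000, 25000, 23000, 24000, 26000, 30000],
    index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"],
)
custom_categories = {
    "Spark": "Programming",
    "Python": "Programming",
    "Pandas": "Data Analysis",
}

# Translate every index label into its category
keys = ser.index.map(custom_categories)
print(list(keys))

# Grouping by the explicit keys matches grouping by the dictionary
via_keys = ser.groupby(keys).max()
via_dict = ser.groupby(custom_categories).max()
print(via_keys.to_dict() == via_dict.to_dict())
```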
Group by String Length and Count Occurrences
Similarly, to group by the length of strings in a pandas Series and count the occurrences of each string length, you can use the groupby() function along with the str.len() method and the count() aggregation.
# Imports pandas
import pandas as pd
# Create a sample Series
data = {'Courses': ["Spark", "Python", "Java", "Pandas", "C", "R"]}
ser = pd.Series(data['Courses'])
print("Pandas Series:\n", ser)
# Group by string length and count occurrences
grouped_series = ser.groupby(ser.str.len()).count()
print("Group by string length and count occurrences:\n", grouped_series)
# Output:
# Pandas Series:
# 0 Spark
# 1 Python
# 2 Java
# 3 Pandas
# 4 C
# 5 R
# dtype: object
# Group by string length and count occurrences:
# 1 2
# 4 1
# 5 1
# 6 2
# dtype: int64
In the above example, ser.str.len() is used to get the length of each string in the Series. groupby(ser.str.len()) groups the Series based on these string lengths, and then count() is applied to calculate the occurrences for each string length.
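To check which strings ended up in each length group, the members can be collected into lists instead of counted. A minimal sketch on the same sample data:

```python
import pandas as pd

ser = pd.Series(["Spark", "Python", "Java", "Pandas", "C", "R"])

# Collect the strings of each length group into a list
by_length = ser.groupby(ser.str.len()).apply(list)
print(by_length)
```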
Group by Even or Odd Values and Calculate the Mean
To group a pandas Series by whether its values are even or odd, and then calculate the mean for each group, you can pass the expression ser % 2 to groupby() as the grouping key and then apply the mean() aggregation.
# Imports pandas
import pandas as pd
# Create a sample Series
ser = pd.Series([1, 2, 3, 4, 5, 6])
# Group by even or odd values and calculate the mean
grouped_series = ser.groupby(ser % 2).mean()
print("Group by even or odd values and calculate the mean:\n", grouped_series)
# Output:
# Group by even or odd values and calculate the mean:
# 0 4.0
# 1 3.0
# dtype: float64
Here,
- The ser % 2 expression creates groups based on whether each value in the Series is even (group 0) or odd (group 1).
- The groupby() function is used to group the Series based on these groups.
- Finally, the mean() function is applied to calculate the mean for each group.
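The 0 and 1 group keys can be replaced with readable labels by mapping the remainder before grouping. This is a small optional variation on the example above; the label names are chosen only for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6])

# Map the remainder (0 or 1) to a descriptive label for each value
parity = (ser % 2).map({0: "even", 1: "odd"})

labelled_means = ser.groupby(parity).mean()
print(labelled_means)
```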
Group by Custom Function and Calculate the Mean
Similarly, you can group a pandas Series by a custom function and then calculate the mean for each group. Pass the function to groupby() and apply the mean() aggregation. Note that groupby() calls the function on each index label, not on each value. For example,
# Imports pandas
import pandas as pd
# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])
# Group by custom function and calculate the mean
grouped_series = ser.groupby(lambda x: 'even' if x % 2 == 0 else 'odd').mean()
print(grouped_series)
# Output:
# even 30.0
# odd 30.0
# dtype: float64
Here,
- The lambda function lambda x: 'even' if x % 2 == 0 else 'odd' is used as a custom function. groupby() calls it on each index label (0, 1, 2, 3, 4), not on the values, so it labels each position as 'even' or 'odd'.
- The groupby() function groups the Series based on the result of the custom function. In this case, it creates two groups: the values at even positions (10, 30, 50) and the values at odd positions (20, 40).
- Finally, the mean() function is used to calculate the mean for each group. Both means happen to be 30.0 for this data.
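Because a function passed to groupby() receives index labels, classifying the values requires applying the function to the values first and grouping by the result. The sketch below contrasts the two; the helper name parity is hypothetical.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50])

def parity(x):
    """Label a number as 'even' or 'odd'."""
    return "even" if x % 2 == 0 else "odd"

# The callable is applied to the index labels 0, 1, 2, 3, 4
by_index = ser.groupby(parity).mean()
print(by_index)

# Apply the function to the values first, then group by the resulting labels
by_value = ser.groupby(ser.map(parity)).mean()
print(by_value)
```

Every value in this Series is even, so grouping by value produces a single 'even' group, while grouping by index label produces two groups.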
Group by Boolean Condition and Calculate the Sum
If you want to group a pandas Series by a boolean condition and calculate the sum for each group, you can use a boolean condition directly within the groupby() function and then apply the sum() function.
# Imports pandas
import pandas as pd
# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])
# Group by boolean condition and calculate the sum
result = ser.groupby(ser > 30).sum()
print(result)
# Output:
# False 60
# True 90
# dtype: int64
Here,
- The condition ser > 30 creates a boolean Series where True represents values greater than 30, and False represents values less than or equal to 30.
- The groupby() function is applied to group the Series based on this boolean condition, creating two groups: one for values greater than 30 (True) and one for values less than or equal to 30 (False).
- Finally, the sum() function is used to calculate the sum for each group.
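If True and False are not descriptive enough as group keys, the condition can be turned into text labels before grouping. This sketch uses numpy.where(); the label strings are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50])

# Build one text label per value from the boolean condition
labels = np.where(ser > 30, "above 30", "30 or below")

labelled_sums = ser.groupby(labels).sum()
print(labelled_sums)
```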
Frequently Asked Questions on Pandas Series groupby() Function
The groupby() function in Pandas Series is used to group data based on specified criteria. It involves splitting the data into groups, applying a function to each group, and then combining the results.
To use the groupby() function with a custom grouping function in a Pandas Series, either pass the function itself to groupby(), in which case it is called on each index label, or apply the function to the values first and pass the resulting Series as the grouping key.
Common aggregation functions used with groupby() include sum(), mean(), count(), min(), max(), and agg() for custom aggregations.
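As a brief illustration of agg(), several built-in aggregations can be requested at once, or a custom function can be supplied. The data is the sample course fees from this article, and the fee range calculation is a made-up example of a custom aggregation.

```python
import pandas as pd

ser = pd.Series(
    [22000, 25000, 23000, 24000, 26000, 30000],
    index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"],
)

# Several aggregations at once: the result is a DataFrame with one column each
summary = ser.groupby(level=0).agg(["min", "max", "mean"])
print(summary)

# A custom aggregation: the spread between the highest and lowest fee
fee_range = ser.groupby(level=0).agg(lambda g: g.max() - g.min())
print(fee_range)
```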
To group by the index of a Pandas Series, you can use the groupby() function and set the level parameter to the index level you want to use for grouping, for example level=0.
Pandas supports grouping by categorical data. When you use groupby() with a categorical key, the result follows the order of the categories, and the observed parameter controls whether categories that never occur in the data still appear in the result.
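The effect of category order and of the observed parameter can be sketched with a small ordered categorical key. The category names and values below are hypothetical, and observed is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

values = pd.Series([1, 2, 3])

# An ordered categorical key; 'medium' never occurs in the data
key = pd.Series(
    pd.Categorical(
        ["low", "high", "low"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
)

# observed=False keeps every category, including the unused one
all_categories = values.groupby(key, observed=False).sum()
print(all_categories)

# observed=True keeps only the categories that actually occur
seen_only = values.groupby(key, observed=True).sum()
print(seen_only)
```

In both results the groups follow the declared category order (low, medium, high) rather than alphabetical order.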
Conclusion
In this article, I have explained the groupby() function in the Pandas Series, covering its syntax, parameters, and usage, and shown how to group the data in a Series based on some criteria and then perform various operations on each group.
Happy Learning !!