
Pandas Series groupby() Function with Examples


The groupby() function in the Pandas Series is a powerful tool for grouping data based on certain criteria. A groupby operation splits the data into groups, applies a function to each group independently, and combines the results. It is most often used on a DataFrame, but it works the same way on a Series.

You can group a Pandas Series and compute a result for each group with aggregation functions such as sum(), mean(), count(), min(), and max(). In this article, I will explain the syntax, parameters, and usage of the Pandas Series groupby() function and show how to group the data in a Series with multiple examples.


Syntax of Series groupby()

Following is the syntax of Series groupby()


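# Syntax of series groupby()
# Note: the squeeze parameter was removed in pandas 2.0
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

Parameters of Series groupby()

Following are the parameters of the Series groupby() function.

by: A mapping, function, label, Series, or array-like (or a list of these) that determines the groups. A function is called on each index label; a dict or Series maps index labels to group keys.
axis: The axis to split along. A Series has only axis 0, and the parameter is deprecated since pandas 2.1.
level: An int, level name, or sequence of these. Groups by one or more levels of the index, which is mainly useful with a MultiIndex.
as_index: Only relevant for DataFrame input; leave it at True for a Series.
sort: Whether to sort the group keys. Default True.
group_keys: When calling apply(), whether to add the group keys to the index of the result. Default True.
observed: Applies only when grouping by a categorical. If True, only categories that occur in the data appear in the result. The default is False in pandas 1.x and 2.x and is slated to change to True.
dropna: If True (the default), rows whose group key is NA are dropped. If False, NA is treated as its own group key.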

Return Value

It returns a SeriesGroupBy object. This object is an intermediate data structure that represents a mapping of keys to corresponding groups. No computation happens at this point; the actual aggregation or transformation runs when you call a method such as sum() or mean() on the object.
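As a minimal sketch of what this intermediate object looks like, the snippet below inspects the return value of groupby() and iterates over the groups before any aggregation runs. The names demo, grouped, and group_sizes are illustrative assumptions and are not used elsewhere in this article.

```python
import pandas as pd

# Illustrative data for this sketch only
demo = pd.Series([22000, 25000, 23000], index=["Spark", "Python", "Spark"])

# groupby() alone computes nothing; it returns a SeriesGroupBy object
grouped = demo.groupby(level=0)
print(type(grouped).__name__)

# Iterating over the object yields (group key, sub-Series) pairs
group_sizes = {}
for key, group in grouped:
    group_sizes[key] = len(group)
print(group_sizes)
```

Iterating like this is mostly useful for inspection; for real work an aggregation method is usually the better choice.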

Let’s create a Pandas Series with a customized index.


# Imports pandas
import pandas as pd

# Create a sample Series
data = {'Courses': ["Spark","Python","Spark","Pandas","Python","Pandas"],
        'Fee': [22000,25000,23000,24000,26000,30000]}
ser = pd.Series(data['Fee'], index=data['Courses'])
print("Pandas Series:\n", ser)

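Yields below output.

# Output:
# Pandas Series:
#  Spark     22000
# Python    25000
# Spark     23000
# Pandas    24000
# Python    26000
# Pandas    30000
# dtype: int64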

Group by Pandas Series Unique Values and Calculate the Sum

If you want to group a pandas Series by the unique labels in its index and calculate the sum for each group, you can use the groupby() function along with an aggregation function like sum().


# Imports pandas
import pandas as pd

# Group by the unique index labels and calculate the sum
grouped_series = ser.groupby(ser.index).sum()
print("Get the sum of grouped data:\n",grouped_series)

# Group by the 'Courses' and calculate the sum for each group
grouped_series = ser.groupby(level=0).sum()
print("Get the sum of grouped data:\n",grouped_series)

In the above example, groupby(ser.index) groups the Series by the unique labels in its index, which hold the course names. Then, the sum() function is applied to calculate the sum of fees for each unique course. The result is a new Series, where the index represents unique course names, and values represent the corresponding sums of fees. Since the index has a single level, groupby(level=0) produces the same groups, so both print statements show the same result.

# Output:
# Get the sum of grouped data:
#  Pandas    54000
# Python    51000
# Spark     45000
# dtype: int64

Group by Custom Categories and Calculate the Max

Alternatively, you can group by custom categories and calculate the maximum value for each group. Create a dictionary that maps each index label to a category and pass it to the groupby() function, which returns a groupby object. Then call the max() function on that object to get the maximum value of each group.


# Define custom categories
custom_categories = {'Spark': 'Programming', 'Python': 'Programming', 'Pandas': 'Data Analysis'}

# Group by custom categories and calculate the max
grouped_series = ser.groupby(custom_categories).max()
print("Get the maximum value of grouped data:\n", grouped_series)

# Output:
# Get the maximum value of grouped data:
#  Data Analysis    30000
# Programming      26000
# dtype: int64

In the above example, custom_categories is a dictionary that maps each course in the Courses index to a custom category. The groupby(custom_categories) groups the given Series based on these custom categories and then max() is applied to calculate the maximum fee for each category.
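To make the dictionary lookup more explicit, here is a hedged sketch of an equivalent spelling: map the index labels to categories with Index.map() first, then group by the mapped labels. The names fees, categories, and labels are illustrative assumptions; the data repeats the sample Series from above.

```python
import pandas as pd

# Same sample data as above, under illustrative names
fees = pd.Series([22000, 25000, 23000, 24000, 26000, 30000],
                 index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"])
categories = {"Spark": "Programming", "Python": "Programming", "Pandas": "Data Analysis"}

# Passing the dict directly: each index label is looked up in the dict
by_dict = fees.groupby(categories).max()

# Explicit spelling: map the labels to categories, then group by the result
labels = fees.index.map(categories)
by_labels = fees.groupby(labels).max()

print(by_dict)
print(by_labels)
```

Both calls should produce the same groups; the explicit form can be easier to debug because you can print labels before grouping.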

Group by String Length and Count Occurrences

Similarly, to group by the length of strings in a pandas Series and count the occurrences of each string length, you can use the groupby() function along with the str.len() method and count() aggregation.


# Imports pandas
import pandas as pd

# Create a sample Series
data = {'Courses': ["Spark", "Python", "Java", "Pandas", "C", "R"]}
ser = pd.Series(data['Courses'])
print("Pandas Series:\n", ser)

# Group by string length and count occurrences
grouped_series = ser.groupby(ser.str.len()).count()
print("Group by string length and count occurrences:\n", grouped_series)

# Output:
# Pandas Series:
#  0     Spark
# 1    Python
# 2      Java
# 3    Pandas
# 4         C
# 5         R
# dtype: object
# Group by string length and count occurrences:
# 1    2
# 4    1
# 5    1
# 6    2
# dtype: int64

In the above example, ser.str.len() is used to get the length of each string in the Series. The groupby(ser.str.len()) groups the Series based on these string lengths, and then count() is applied to calculate the occurrences for each string length.
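If you also want to see which strings fall into each length group, one possible approach is to aggregate each group into a list. This is a small sketch that assumes the same sample strings; the names names and members are illustrative.

```python
import pandas as pd

# Same sample strings as above, under an illustrative name
names = pd.Series(["Spark", "Python", "Java", "Pandas", "C", "R"])

# Collect the members of each string-length group into a Python list
members = names.groupby(names.str.len()).agg(list)
print(members)
```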

Group by Even or Odd Values and Calculate the Mean

To group a pandas Series by whether its values are even or odd, and then calculate the mean for each group, pass the expression ser % 2 to the groupby() function as the group key and apply the mean() aggregation.


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([1, 2, 3, 4, 5, 6])

# Group by even or odd values and calculate the mean
grouped_series = ser.groupby(ser % 2).mean()
print("Group by even or odd values and calculate the mean:\n", grouped_series)

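# Output:
# Group by even or odd values and calculate the mean:
#  0    4.0
# 1    3.0
# dtype: float64

Here, ser % 2 returns 0 for even values and 1 for odd values. groupby() uses that result as the group key, and mean() averages each group: 4.0 for the even values (2, 4, 6) and 3.0 for the odd values (1, 3, 5). Note that mean() returns floating-point values.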

Group by Custom Function and Calculate the Mean

Similarly, you can pass a custom function to groupby() and then calculate the mean for each group. Pandas calls the function on each index label, not on each value, and uses the return value as the group key. For example,


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])

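# Group by custom function and calculate the mean
# The function receives each index label (0, 1, 2, 3, 4), not the values
grouped_series = ser.groupby(lambda x: 'even' if x % 2 == 0 else 'odd').mean()
print(grouped_series)

# Output:
# even    30.0
# odd     30.0
# dtype: float64

Here, the lambda is applied to the index labels 0 through 4. Labels 0, 2, and 4 fall in the even group (values 10, 30, and 50, mean 30.0), and labels 1 and 3 fall in the odd group (values 20 and 40, mean 30.0).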
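Because a function passed to groupby() is called on the index labels, grouping by the values themselves needs one extra step. The sketch below contrasts the two approaches; the data and the names values, parity, by_index, and by_value are illustrative assumptions chosen so that the two results differ.

```python
import pandas as pd

# Illustrative data: odd values sit at even index positions
values = pd.Series([11, 20, 31, 40, 51])

def parity(x):
    return "even" if x % 2 == 0 else "odd"

# A callable passed to groupby() receives each index label (0, 1, 2, 3, 4)
by_index = values.groupby(parity).mean()

# Map the function over the values first to group by the data itself
by_value = values.groupby(values.map(parity)).mean()

print(by_index)
print(by_value)
```

With this data, the index-based grouping averages 11, 31, and 51 under "even", while the value-based grouping averages 20 and 40 under "even".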

Group by Boolean Condition and Calculate the Sum

If you want to group a pandas Series by a boolean condition and calculate the sum for each group, you can use a boolean condition directly within the groupby() function and then apply the sum() function.


# Imports pandas
import pandas as pd

# Create a sample Series
ser = pd.Series([10, 20, 30, 40, 50])

# Group by boolean condition and calculate the sum
result = ser.groupby(ser > 30).sum()
print(result)

# Output:
# False    60
# True     90
# dtype: int64

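Here, ser > 30 returns a boolean Series that groupby() uses as the group key. The values 10, 20, and 30 fall in the False group (sum 60), and the values 40 and 50 fall in the True group (sum 90).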

Frequently Asked Questions on Pandas Series groupby() Function

What does the groupby() function do in Pandas Series?

The groupby() function in Pandas Series is used to group data based on specified criteria. It involves splitting the data into groups, applying a function to each group, and then combining the results.

How do I use the groupby() function with a custom grouping function?

You can pass the function itself to the groupby() method, in which case pandas calls it on each index label. To group by the values instead, apply the function to the Series first, for example with ser.map(), and pass the result to groupby().

What are some common aggregation functions used with groupby()?

Common aggregation functions used with groupby() include sum(), mean(), count(), min(), max(), and agg() for custom aggregations.
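As a rough illustration of agg(), the sketch below computes several built-in aggregations at once and then one custom aggregation. The names fees, summary, and spread are illustrative assumptions, and the data repeats the sample fees used earlier in this article.

```python
import pandas as pd

# Sample fees indexed by course name (illustrative names)
fees = pd.Series([22000, 25000, 23000, 24000, 26000, 30000],
                 index=["Spark", "Python", "Spark", "Pandas", "Python", "Pandas"])

# A list of function names returns a DataFrame with one column per function
summary = fees.groupby(level=0).agg(["min", "max", "mean"])
print(summary)

# A custom function receives each group as a Series
spread = fees.groupby(level=0).agg(lambda s: s.max() - s.min())
print(spread)
```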

How do I group by the index of a Series?

To group by the index of a Pandas Series, you can use the groupby() function and specify the level parameter with the index level you want to use for grouping.
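The level parameter matters most when the index has more than one level. Here is a minimal sketch that assumes a hypothetical two-level index with levels named Course and Mode; neither the level names nor the data come from the examples above.

```python
import pandas as pd

# Hypothetical two-level index: course name and delivery mode
index = pd.MultiIndex.from_tuples(
    [("Spark", "Online"), ("Spark", "Classroom"),
     ("Python", "Online"), ("Python", "Classroom")],
    names=["Course", "Mode"],
)
fees = pd.Series([22000, 23000, 25000, 26000], index=index)

# Group by level name
by_course = fees.groupby(level="Course").sum()

# Group by level position (1 is the second level, "Mode")
by_mode = fees.groupby(level=1).sum()

print(by_course)
print(by_mode)
```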

Can I group by a categorical column?

Pandas supports grouping by categorical columns. When you use groupby() on a categorical column, it respects the order of categories and groups the data accordingly.
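The sketch below shows one way this can look with a Series, and how the observed parameter decides whether unused categories appear in the result. The category names and the variables levels, fees, observed_only, and all_categories are illustrative assumptions.

```python
import pandas as pd

# Ordered categorical key with one category ("Medium") that never occurs
levels = pd.Series(pd.Categorical(
    ["Low", "High", "Low", "High"],
    categories=["Low", "Medium", "High"],
    ordered=True,
))
fees = pd.Series([22000, 30000, 23000, 26000])

# observed=True keeps only the categories present in the data
observed_only = fees.groupby(levels, observed=True).sum()

# observed=False keeps every category; the empty group sums to 0
all_categories = fees.groupby(levels, observed=False).sum()

print(observed_only)
print(all_categories)
```

Passing observed explicitly avoids depending on the default, which differs between pandas versions.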

Conclusion

In this article, I have explained the syntax, parameters, and usage of the groupby() function in the Pandas Series, and how to group the data in a Series based on some criteria and then perform various operations on each group.

Happy Learning !!
