  • Post category:Pandas
  • Post last modified:March 27, 2024
Convert GroupBy Output from Series to DataFrame

How do you convert a GroupBy output from a Series to a pandas DataFrame? Applying an aggregation function to a single column after groupby() returns a pandas Series, so it is sometimes necessary to convert the groupby result from a Series to a DataFrame.

In this article, I will explain how to convert a pandas GroupBy result from a Series to a DataFrame.

1. Quick Examples of Converting GroupBy Series to DataFrame

If you are in a hurry, below are some quick examples of converting the result of GroupBy from a Series to a pandas DataFrame.


# Quick examples of converting a GroupBy Series to a DataFrame

# Example 1: Create the groupby Series
# Using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()

# Example 2: Convert the groupby Series to a DataFrame
grouped_df = grouped_ser.reset_index()

# Example 3: Use the as_index parameter to get a DataFrame from groupby
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()

# Example 4: Use the to_frame method
grouped_df = grouped_ser.to_frame()

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate the results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create a pandas DataFrame.
import pandas as pd
technologies = {
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration': ['30days','50days','35days','40days','60days','35days','55days','50days'],
    'Discount': [1000,2300,1000,1200,2500,1300,1400,1600]
}
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print("Create DataFrame\n", df)

The resulting DataFrame has eight rows and four columns.

2. Perform Group By & Aggregation

Use pandas DataFrame.groupby() to group the rows by one or more columns, and use the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating-point data as well. The example below groups on the Courses and Duration columns and counts how many non-null Fee values each (Courses, Duration) combination has.


# Create the groupby Series
# Using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
print("Convert groupby series:\n",grouped_ser)
print("Type:",type(grouped_ser))

Note that the result of the above example is a pandas Series: type(grouped_ser) prints <class 'pandas.core.series.Series'>, and the group keys Courses and Duration form its index.

Now we have a Series that contains the grouping results.
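The statement that count() ignores None and NaN values can be checked with a small sketch. The `sample` DataFrame below is hypothetical and is not the article's data; it has one missing Fee so that count() can be compared with size(), which counts every row in a group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the article's DataFrame): one Fee is missing
sample = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Hadoop"],
    "Fee": [22000, np.nan, 23000],
})

# count() skips NaN values in the selected column
counts = sample.groupby("Courses")["Fee"].count()

# size() counts the rows in each group, including rows where Fee is NaN
sizes = sample.groupby("Courses")["Fee"].size()

print(counts)
print(sizes)
```

Both results are Series indexed by Courses; only the Spark group differs, because one of its two rows has no Fee.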

3. Convert the Groupby Result from Series to Pandas DataFrame

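Now, let’s convert the groupby aggregation result from a Series to a pandas DataFrame. All you need to do is call reset_index() on the Series object. This moves the group keys (Courses and Duration) out of the index into regular columns, keeps the aggregated values in a column named after the Series (Fee), and gives the resulting DataFrame a default integer index.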


# Convert the groupby Series to a DataFrame
grouped_df = grouped_ser.reset_index()
print(grouped_df)
print(type(grouped_df))

Yields below output.


# Output
  Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1

As we can see from the above, the Series has been converted to a pandas DataFrame.
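In the output above the count column is still called Fee, which can be misleading. As a minimal sketch, assuming a smaller hypothetical DataFrame than the one in this article, the `name` parameter of Series.reset_index() can give the value column a more descriptive label (the label "Count" is my choice, not the article's).

```python
import pandas as pd

# Hypothetical, smaller DataFrame with the same column names as the article
df = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "Hadoop"],
    "Duration": ["30days", "35days", "35days"],
    "Fee": [22000, 23000, 25000],
})

grouped_ser = df.groupby(["Courses", "Duration"])["Fee"].count()

# name= sets the label of the column that holds the Series values
grouped_df = grouped_ser.reset_index(name="Count")

print(grouped_df)
print(type(grouped_df))
```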

4. Use as_index with Groupby() & Convert DataFrame

Alternatively, pass as_index=False to the pandas groupby() function, which returns a DataFrame directly with the group keys as regular columns. This avoids the additional statement that converts the groupby result from a Series to a DataFrame.


# Use the as_index parameter
# to get a DataFrame from groupby
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()
print(grouped_df)
print(type(grouped_df))

Yields below output.


   Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1

We now have the grouped output directly as a pandas DataFrame.
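To see that as_index=False and reset_index() lead to the same table, the two routes can be compared side by side. This is only a sketch on a small hypothetical DataFrame; the variable names `via_param` and `via_reset` are illustrative.

```python
import pandas as pd

# Hypothetical, smaller DataFrame with the same column names as the article
df = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "Hadoop", "Spark"],
    "Duration": ["30days", "35days", "35days", "55days"],
    "Fee": [22000, 23000, 25000, 25000],
})

# Route 1: ask groupby() to keep the keys as columns
via_param = df.groupby(["Courses", "Duration"], as_index=False)["Fee"].count()

# Route 2: aggregate to a Series, then move the keys out of the index
via_reset = df.groupby(["Courses", "Duration"])["Fee"].count().reset_index()

print(via_param)
print(via_reset)
```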

5. Use to_frame() to Convert Group Results to Pandas DataFrame

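Use the to_frame() method to convert any pandas Series to a DataFrame object. Let’s use this on our grouped Series. Unlike reset_index(), to_frame() keeps the group keys (Courses and Duration) in the index, so the resulting DataFrame has a single Fee column.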


# Use the to_frame method
grouped_df = grouped_ser.to_frame()
print(grouped_df)
print(type(grouped_df))

Yields below output.


# Output
                  Fee
Courses Duration     
Hadoop  35days      2
Pandas  60days      1
PySpark 50days      1
Python  40days      1
        50days      1
Spark   30days      1
        55days      1
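Because to_frame() leaves Courses and Duration in the index, it is often chained with reset_index() when flat columns are needed. The sketch below uses a small hypothetical DataFrame and the optional `name` argument of to_frame() to label the value column; "Count" is an illustrative label.

```python
import pandas as pd

# Hypothetical, smaller DataFrame with the same column names as the article
df = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "Hadoop"],
    "Duration": ["30days", "35days", "35days"],
    "Fee": [22000, 23000, 25000],
})

grouped_ser = df.groupby(["Courses", "Duration"])["Fee"].count()

# to_frame() keeps the group keys as a MultiIndex; name= labels the column
grouped_df = grouped_ser.to_frame(name="Count")
print(isinstance(grouped_df.index, pd.MultiIndex))

# reset_index() then turns the index levels into regular columns
flat_df = grouped_df.reset_index()
print(flat_df)
```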

Frequently Asked Questions on Convert GroupBy Output from Series to DataFrame

Why do I need to convert GroupBy output to a DataFrame?

Aggregating a single column after a GroupBy operation in pandas returns a Series whose index holds the group keys, which may not be as convenient for further analysis or visualization. Converting it to a DataFrame makes the group keys available as regular columns and gives a more structured representation of the grouped data.

How can I convert a GroupBy Series to a DataFrame?

To convert a GroupBy Series to a DataFrame in pandas, you can use the reset_index() method. It turns the group keys in the index into columns and returns a DataFrame with a default integer index.

Is there an alternative method to convert a GroupBy Series to a DataFrame?

An alternative is to call the to_frame() method on the Series returned by the groupby aggregation. The result is a DataFrame that keeps the group keys in its index. If you want the keys as columns with a default integer index, call reset_index() on that DataFrame afterwards. You can also pass as_index=False to groupby() to get a DataFrame directly.

How do I convert the output of a groupby operation from a Series to a DataFrame in pandas?

To convert the output of a groupby operation from a Series to a DataFrame in pandas, you can use the reset_index() method.

How can I customize the aggregation function when converting to a DataFrame?

To customize the aggregation function when converting a GroupBy Series to a DataFrame in pandas, you can apply your desired aggregation function before using the reset_index() or to_frame() method.

What if I want to use a different aggregation function?

You can apply various aggregation functions like sum, count, max, etc., directly to the GroupBy object before using reset_index().
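As a sketch of swapping in other aggregation functions, the example below uses a small hypothetical DataFrame and applies sum and max in one agg() call. When agg() receives a list of functions it returns a DataFrame with one column per function, so reset_index() is only needed to move the group key into a column.

```python
import pandas as pd

# Hypothetical, smaller DataFrame with the same column names as the article
df = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "Hadoop"],
    "Fee": [22000, 23000, 25000],
})

# A single aggregation returns a Series, converted as before
total_fee = df.groupby("Courses")["Fee"].sum().reset_index()

# Several aggregations at once: agg() with a list returns a DataFrame
summary = df.groupby("Courses")["Fee"].agg(["sum", "max"]).reset_index()

print(total_fee)
print(summary)
```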

Conclusion

In this article, I have explained multiple ways to convert a pandas GroupBy output from a Series to a DataFrame: reset_index(), the as_index=False parameter of groupby(), and to_frame().

Happy learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.