Convert GroupBy output from Series to DataFrame?

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 25, 2023
Spread the love

How to Convert a GroupBy output from Series to Pandas DataFrame? Performing aggregation function after groupby() function returns a pandas Series hence sometimes it is required to covert the result of the groupby from Series to DataFrame.

In this article, I will explain Convert Pandas GroupBy result from Series to DataFrame

1. Quick Examples of Convert GroupBy Series to DataFrame

If you are in hurry below are some quick examples of Converting the result of GroupBy from Series to pandas DataFrame


# Below are a quick example
# Example 1: Convert groupby Series
# Using groupby() & count() on multiple column
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()

# Example 2: Convert groupby object to DataFrame 
grouped_df = grouped_ser.reset_index()

# Example 3: Use the as_index attribute to get groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()

# Example 4: Use the to_frame method
grouped_df = grouped_ser.to_frame()

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate the results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create a pandas DataFrame.
import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
                })
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   35days      1000
3   Python  24000   40days      1200
4   Pandas  26000   60days      2500
5   Hadoop  25000   35days      1300
6    Spark  25000   55days      1400
7   Python  22000   50days      1600

2. Perform Group By & Aggregation

Use pandas DataFrame.groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and NaN values. It works with non-floating type data as well. The below example does the grouping on Courses and Duration column and calculates the count of how many times each value is present.


# Convert groupby Series
# Using groupby() & count() on multiple column
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
print(grouped_ser)
print(type(grouped_ser))

Yields below output. Note that the result of the above example is Pandas Series.


# Output
Courses  Duration
Hadoop   35days      2
Pandas   60days      1
PySpark  50days      1
Python   40days      1
         50days      1
Spark    30days      1
         55days      1
Name: Fee, dtype: int64

Now we have a Series that contains the grouping results.

3. Convert the Groupby Result from Series to Pandas DataFrame

Now, let’s convert the group by aggregation result from Series to Pandas DataFrame, in order to do so all you need is to run reset_index() on the Series object, this converts the Series to DataFrame and set an index to DataFrame.


# Convert groupby object to DataFrame 
grouped_df = grouped_ser.reset_index()
print(grouped_df)
print(type(grouped_df))

Yields below output.


# Output
  Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1

As we can see from the above, the Series has been converted to a pandas DataFrame.

4. Use as_index with Groupby() & convert DataFrame

Alternatively use 'as_index' param to the pandas groupby() function which results in DataFrame directly. By using this you can avoid running additional statements that convert the groupby result from series to dataframe.


# Use the as_index attribute 
# get groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()
print(grouped_df)
print(type(grouped_df))

Yields below output.


   Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1

We have the grouped output directly as pandas DataFrame.

5. Use to_frame() to Convert Group Results to Pandas DataFrame

Use the to_frame() function to convert any pandas Series to a DataFrame object. Let’s use this on our grouped object.


# Use the to_frame method
grouped_df = grouped_ser.to_frame()
print(grouped_df)
print(type(grouped_df))

Yields below output.


# Output
                  Fee
Courses Duration     
Hadoop  35days      2
Pandas  60days      1
PySpark 50days      1
Python  40days      1
        50days      1
Spark   30days      1
        55days      1

6. Conclusion

In this article, I have explained multiple ways to convert a Pandas GroupBy output from Series to DataFrame with well-defined examples.

Happy learning !!

Related Articles

References

Leave a Reply

You are currently viewing Convert GroupBy output from Series to DataFrame?