How to Convert a GroupBy output from Series to Pandas DataFrame? Performing an aggregation function after groupby() returns a pandas Series, so it is sometimes required to convert the result of the groupby from Series to DataFrame.
In this article, I will explain several ways to convert a pandas GroupBy result from Series to DataFrame.
1. Quick Examples of Converting GroupBy Series to DataFrame
If you are in a hurry, below are some quick examples of converting the result of a GroupBy from Series to a pandas DataFrame.
# Below are quick examples
# Example 1: Get the groupby result as a Series
# using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
# Example 2: Convert the groupby Series to a DataFrame
grouped_df = grouped_ser.reset_index()
# Example 3: Use the as_index param to get a groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index=False)['Fee'].count()
# Example 4: Use the to_frame() method
grouped_df = grouped_ser.to_frame()
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.
# Create a pandas DataFrame.
import pandas as pd
technologies = {
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration': ['30days','50days','35days','40days','60days','35days','55days','50days'],
    'Discount': [1000,2300,1000,1200,2500,1300,1400,1600]
}
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print(df)
Yields below output.
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2300
2 Hadoop 23000 35days 1000
3 Python 24000 40days 1200
4 Pandas 26000 60days 2500
5 Hadoop 25000 35days 1300
6 Spark 25000 55days 1400
7 Python 22000 50days 1600
2. Perform Group By & Aggregation
Use pandas DataFrame.groupby() to group the rows by column and the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating type data as well. The below example groups by the Courses and Duration columns and calculates how many times each combination is present.
# Convert groupby Series
# Using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
print(grouped_ser)
print(type(grouped_ser))
Yields below output. Note that the result of the above example is a pandas Series.
# Output
Courses Duration
Hadoop 35days 2
Pandas 60days 1
PySpark 50days 1
Python 40days 1
50days 1
Spark 30days 1
55days 1
Name: Fee, dtype: int64
Now we have a Series that contains the grouping results.
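Before converting, it can help to confirm what this Series looks like. The sketch below (using the grouped_ser created above) indexes the MultiIndex Series with a tuple of group keys and inspects its index names; the specific group ('Hadoop', '35days') is just an example from our data.
# Look up the count for a single (Courses, Duration) group
# from the MultiIndex Series returned by groupby().count()
print(grouped_ser.loc[('Hadoop', '35days')])   # 2
# The group keys form the index levels of the Series
print(grouped_ser.index.names)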
3. Convert the Groupby Result from Series to Pandas DataFrame
Now, let’s convert the group by aggregation result from Series to pandas DataFrame. To do so, all you need is to run reset_index() on the Series object; this converts the Series to a DataFrame, turns the group keys into regular columns, and sets a default integer index.
# Convert groupby object to DataFrame
grouped_df = grouped_ser.reset_index()
print(grouped_df)
print(type(grouped_df))
Yields below output.
# Output
Courses Duration Fee
0 Hadoop 35days 2
1 Pandas 60days 1
2 PySpark 50days 1
3 Python 40days 1
4 Python 50days 1
5 Spark 30days 1
6 Spark 55days 1
As we can see from the above, the Series has been converted to a pandas DataFrame.
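Note that the count column keeps the name Fee, because that is the column we counted. If you prefer a more descriptive name, reset_index() on a Series also accepts a name parameter; a minimal sketch, where the column name Count is just an example:
# Rename the aggregated column while converting the Series to a DataFrame
grouped_df = grouped_ser.reset_index(name='Count')
print(grouped_df.columns.tolist())
# ['Courses', 'Duration', 'Count']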
4. Use as_index with groupby() & Convert to DataFrame
Alternatively, pass the as_index=False param to the pandas groupby() function, which results in a DataFrame directly. By using this you can avoid running an additional statement to convert the groupby result from Series to DataFrame.
# Use the as_index param to
# get a groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index=False)['Fee'].count()
print(grouped_df)
print(type(grouped_df))
Yields below output.
Courses Duration Fee
0 Hadoop 35days 2
1 Pandas 60days 1
2 PySpark 50days 1
3 Python 40days 1
4 Python 50days 1
5 Spark 30days 1
6 Spark 55days 1
We have the grouped output directly as pandas DataFrame.
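As a related alternative, you can also get a DataFrame directly and name the aggregated column in one step using named aggregation with agg(); this is a minimal sketch, where Count is just an example column name:
# Named aggregation returns a DataFrame with a custom column name
grouped_df = df.groupby(['Courses', 'Duration'], as_index=False).agg(Count=('Fee', 'count'))
print(grouped_df)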
5. Use to_frame() to Convert Group Results to Pandas DataFrame
Use the to_frame() function to convert any pandas Series to a DataFrame object. Let’s use this on our grouped object.
# Use the to_frame method
grouped_df = grouped_ser.to_frame()
print(grouped_df)
print(type(grouped_df))
Yields below output.
# Output
Fee
Courses Duration
Hadoop 35days 2
Pandas 60days 1
PySpark 50days 1
Python 40days 1
50days 1
Spark 30days 1
55days 1
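Notice that to_frame() keeps the group keys (Courses and Duration) in the MultiIndex rather than turning them into columns. If you want flat columns as in the earlier examples, you can chain reset_index() after it; to_frame() also accepts an optional name argument to rename the resulting column. A minimal sketch, with Count as an example name:
# Rename the column and flatten the MultiIndex into regular columns
grouped_df = grouped_ser.to_frame(name='Count').reset_index()
print(grouped_df)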
6. Conclusion
In this article, I have explained multiple ways to convert a Pandas GroupBy output from Series to DataFrame with well-defined examples.
Happy learning !!