How do you convert a GroupBy output from a Series to a pandas DataFrame? Applying an aggregation function to a single column after groupby() returns a pandas Series, so it is sometimes necessary to convert the groupby result from a Series to a DataFrame.
In this article, I will explain how to convert a pandas GroupBy result from a Series to a DataFrame.
1. Quick Examples of Convert GroupBy Series to DataFrame
If you are in a hurry, below are some quick examples of converting the result of GroupBy from a Series to a pandas DataFrame.
# Quick examples of convert GroupBy series to DataFrame
# Example 1: Get the groupby result as a Series
# Using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
# Example 2: Convert groupby object to DataFrame
grouped_df = grouped_ser.reset_index()
# Example 3: Use the as_index parameter to get groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()
# Example 4: Use the to_frame method
grouped_df = grouped_ser.to_frame()
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create a pandas DataFrame.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000,22000],
'Duration':['30days','50days','35days','40days','60days','35days','55days','50days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400,1600]
})
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration','Discount'])
print("Create DataFrame\n",df)
This prints the DataFrame, which has eight rows and four columns.
2. Perform Group By & Aggregation
Use pandas DataFrame.groupby() to group the rows by one or more columns, and use the count() method to count the non-null values in each group (None and NaN are ignored). count() works with non-numeric data as well. The example below groups on the Courses and Duration columns and counts the non-null Fee values for each combination.
# Get the groupby result as a Series
# Using groupby() & count() on multiple columns
grouped_ser = df.groupby(['Courses', 'Duration'])['Fee'].count()
print("Convert groupby series:\n",grouped_ser)
print("Type:",type(grouped_ser))
The result of the above example is a pandas Series whose index is a MultiIndex built from Courses and Duration.
Now we have a Series that contains the grouping results.
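To illustrate the point that count() skips missing values, here is a minimal sketch that compares it with size(), which counts every row. The `sample` DataFrame is hypothetical and is not the article's data; one Fee value is deliberately left missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one Fee value is deliberately missing
sample = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Hadoop"],
    "Duration": ["30days", "30days", "35days"],
    "Fee": [22000, np.nan, 23000],
})

grouped = sample.groupby(["Courses", "Duration"])["Fee"]

non_null_counts = grouped.count()  # counts only non-null Fee values per group
row_counts = grouped.size()        # counts every row per group, NaN included

print(non_null_counts)
print(row_counts)
```

Both results are Series indexed by the group keys, so either one can be converted to a DataFrame in the same way.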
3. Convert the Groupby Result from Series to Pandas DataFrame
Now, let’s convert the aggregation result from a Series to a pandas DataFrame. All you need to do is call reset_index() on the Series object. This moves the group keys from the index into regular columns and gives the resulting DataFrame a default integer index.
# Convert groupby object to DataFrame
grouped_df = grouped_ser.reset_index()
print(grouped_df)
print(type(grouped_df))
Yields below output.
# Output
   Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1
As we can see from the above, the Series has been converted to a pandas DataFrame.
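After reset_index(), the value column keeps the name of the aggregated column (Fee), even though it now holds counts. One way to label it more clearly is the name argument of Series.reset_index(). The sketch below uses a shortened, hypothetical version of the data, and the column name Count is only an illustrative choice.

```python
import pandas as pd

# Shortened, hypothetical version of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Hadoop"],
    "Fee": [22000, 25000, 23000, 25000],
    "Duration": ["30days", "50days", "35days", "35days"],
})

grouped_ser = df.groupby(["Courses", "Duration"])["Fee"].count()

# name= sets the label of the column that holds the Series values
grouped_df = grouped_ser.reset_index(name="Count")
print(grouped_df)
print(type(grouped_df))
```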
4. Use as_index with groupby() to Get a DataFrame
Alternatively, pass as_index=False to the pandas groupby() function, which returns a DataFrame directly. This avoids running an additional statement to convert the groupby result from a Series to a DataFrame.
# Use the as_index parameter
# to get a groupby DataFrame
grouped_df = df.groupby(['Courses', 'Duration'], as_index = False)['Fee'].count()
print(grouped_df)
print(type(grouped_df))
Yields below output.
   Courses Duration  Fee
0   Hadoop   35days    2
1   Pandas   60days    1
2  PySpark   50days    1
3   Python   40days    1
4   Python   50days    1
5    Spark   30days    1
6    Spark   55days    1
We have the grouped output directly as pandas DataFrame.
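As a quick check, the as_index=False route and the reset_index() route should give the same table. The following is a minimal sketch on a shortened, hypothetical version of the data; the variable names are illustrative.

```python
import pandas as pd

# Shortened, hypothetical version of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Hadoop"],
    "Fee": [22000, 25000, 23000, 25000],
    "Duration": ["30days", "50days", "35days", "35days"],
})

# Route 1: ask groupby() not to use the keys as the index
via_as_index = df.groupby(["Courses", "Duration"], as_index=False)["Fee"].count()

# Route 2: aggregate to a Series, then move the keys back into columns
via_reset = df.groupby(["Courses", "Duration"])["Fee"].count().reset_index()

print(via_as_index)
print(via_reset)
print(via_as_index.equals(via_reset))
```

Which route to use is mostly a matter of taste: as_index=False is shorter, while reset_index() also works when the Series already exists.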
5. Use to_frame() to Convert Group Results to Pandas DataFrame
Use the to_frame() method to convert any pandas Series to a DataFrame object. Let’s use this on our grouped Series. Note that to_frame() keeps Courses and Duration as the index instead of turning them into columns.
# Use the to_frame method
grouped_df = grouped_ser.to_frame()
print(grouped_df)
print(type(grouped_df))
Yields below output.
# Output
                  Fee
Courses Duration
Hadoop  35days      2
Pandas  60days      1
PySpark 50days      1
Python  40days      1
        50days      1
Spark   30days      1
        55days      1
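Because to_frame() leaves the group keys in the index, a reset_index() call is still needed if you want them as ordinary columns. The sketch below chains the two on a shortened, hypothetical version of the data; the column name Count passed to to_frame() is an illustrative choice.

```python
import pandas as pd

# Shortened, hypothetical version of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Hadoop"],
    "Fee": [22000, 25000, 23000, 25000],
    "Duration": ["30days", "50days", "35days", "35days"],
})

grouped_ser = df.groupby(["Courses", "Duration"])["Fee"].count()

# to_frame() wraps the Series in a DataFrame; the keys stay in a MultiIndex
indexed_df = grouped_ser.to_frame(name="Count")

# reset_index() then turns the index levels into regular columns
flat_df = indexed_df.reset_index()

print(indexed_df)
print(flat_df)
```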
Frequently Asked Questions on Convert GroupBy Output from Series to DataFrame
The GroupBy operation in pandas often results in a Series object, which may not be as convenient for further analysis or visualization. Converting it to a DataFrame allows for a more structured and versatile representation of the grouped data.
To convert a GroupBy Series to a DataFrame in pandas, you can use the reset_index() method. This method is commonly used to convert the result of a groupby operation, which is often a Series, into a DataFrame with a default integer index.
An alternative way to convert a GroupBy Series to a DataFrame in pandas is to call the to_frame() method directly on the result of the groupby operation. For example, you can apply to_frame() to the grouped Series and then call reset_index() to get a DataFrame with a default integer index.
To convert the output of a groupby operation from a Series to a DataFrame in pandas, you can use the reset_index() method.
To customize the aggregation function when converting a GroupBy Series to a DataFrame in pandas, you can apply your desired aggregation function before using the reset_index() or to_frame() method.
You can apply various aggregation functions like sum, count, max, etc., directly to the GroupBy object before using reset_index().
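As a sketch of swapping in other aggregation functions, the example below first uses sum() followed by reset_index(), and then uses named aggregation with agg() to compute several statistics at once. The data is a shortened, hypothetical version of the article's, and the output column names (total_fee, max_fee, n_rows) are illustrative.

```python
import pandas as pd

# Shortened, hypothetical version of the article's data
df = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "Hadoop", "Spark"],
    "Fee": [22000, 23000, 25000, 25000],
})

# A single aggregation: sum() returns a Series, reset_index() makes it a DataFrame
total_fee = df.groupby("Courses")["Fee"].sum().reset_index()

# Several aggregations at once via named aggregation; with as_index=False
# the group key is kept as a regular column
summary = df.groupby("Courses", as_index=False).agg(
    total_fee=("Fee", "sum"),
    max_fee=("Fee", "max"),
    n_rows=("Fee", "count"),
)

print(total_fee)
print(summary)
```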
Conclusion
In this article, I have explained multiple ways to convert a Pandas GroupBy output from Series to DataFrame with well-defined examples.
Happy learning !!