You can calculate the percentage of the total within each group using DataFrame.groupby() along with DataFrame.agg(), DataFrame.transform(), or DataFrame.apply() with a lambda function. You can also calculate the percentage by combining the sum() and div() functions.
In this article, you will learn how to calculate the percentage of the total within each group of a pandas DataFrame, with examples.
Key Points –
- Calculating percentage totals with groupby is useful for understanding the relative distribution of values within each group in a DataFrame.
- Use aggregation methods (like sum(), count(), or mean()) to calculate the total values within each group.
- To calculate percentages, divide the aggregated value of each group by the total of the column or group, then multiply by 100.
- After grouping, you can calculate percentages for individual columns by dividing the group’s total by the overall total and multiplying by 100.
- The apply() function allows custom lambda functions for more complex percentage calculations after grouping.
- Using methods like div() or transform() is more efficient than using loops, especially for large datasets.
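The last point can be illustrated with a minimal sketch that computes the same per-group percentages twice, once with an explicit loop over the groups and once with transform(). The DataFrame `sales` and its columns `region` and `amount` are hypothetical names chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 300, 300, 100, 400],
})

# Loop-based version: visit each group and compute the share of each row.
loop_pct = pd.Series(index=sales.index, dtype="float64")
for _, group in sales.groupby("region"):
    loop_pct.loc[group.index] = 100 * group["amount"] / group["amount"].sum()

# Vectorized version: transform("sum") broadcasts each group's total
# back to the rows of that group, so one division handles every row.
group_totals = sales.groupby("region")["amount"].transform("sum")
vector_pct = 100 * sales["amount"] / group_totals

print(vector_pct.tolist())  # [12.5, 75.0, 37.5, 25.0, 50.0]
```

Both versions give the same numbers; the vectorized form avoids the Python-level loop, which is where the efficiency gain on large datasets usually comes from.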
Quick Examples of Pandas Percentage Total by Groupby
If you are in a hurry, below are some quick examples of calculating the percentage of the total by group in a pandas DataFrame.
# Quick examples of pandas percentage total by groupby
# Example 1: Using DataFrame.agg() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
# Example 2: Percentage by lambda and DataFrame.apply() method
# group_keys=False keeps the original (Courses, Fee) index in the result
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
# Example 3: Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
# Example 4: Using groupby with Series.rename() method
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("count")
# Example 5: Using transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
# Example 6: Alternative use of transform() with a lambda function (returns fractions)
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x / x.sum())
# Example 7: Calculate with groupby, Series.rename(), and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and check the results.
# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
This creates a DataFrame with five rows and three columns: Courses, Fee, and Duration.
Pandas Calculate Percentage with Groupby Using the agg() Method
You can calculate the percentage of the total within each group using the DataFrame.groupby() method along with the agg() function. First, use the groupby() method on the DataFrame df to group it by the columns 'Courses' and 'Fee'. Then, apply the agg() (aggregate) function to sum the Fee values in each group.
Let’s start by computing the total “Fee” for each 'Courses' and 'Fee' combination. The percentages within each course are derived from these sums in the next step.
# Using DataFrame.agg() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print("Get the sum of Fee for each Courses and Fee group:\n", df2)
This yields a DataFrame with a two-level index (Courses, Fee) and the summed Fee as its only column.
Next, use the groupby() method on df2 to group it by the first level of the index (level=0, which is Courses), and apply a lambda function with the apply() method. For each group, the lambda multiplies every value (x) by 100 and divides it by the sum of all values in that group, which gives each value as a percentage of its group total.
# Percentage by lambda and DataFrame.apply() method
# group_keys=False keeps the original (Courses, Fee) index in the result
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)
Yields below output.
# Output:
                      Fee
Courses Fee
PySpark 25000   49.019608
        26000   50.980392
Python  24000  100.000000
Spark   22000   48.888889
        23000   51.111111
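A quick way to validate a result like the one above is to check that the percentages in each group add up to 100. The following is a minimal sketch of that check. It rebuilds the sums as a Series rather than a DataFrame, and the variable names `fee_sums`, `pct`, and `group_totals` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Sum Fee for every (Courses, Fee) pair; the result has a two-level index.
fee_sums = df.groupby(["Courses", "Fee"])["Fee"].sum()

# Share of each entry within its Courses group (index level 0).
pct = 100 * fee_sums / fee_sums.groupby(level=0).transform("sum")

# Sanity check: the shares inside every course should add up to 100.
group_totals = pct.groupby(level=0).sum()
print(group_totals.round(6).tolist())  # [100.0, 100.0, 100.0]
```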
Another method is to calculate the percentage of the total for each group using the DataFrame.div() method. Here, level='Courses' tells pandas to align the two DataFrames on the Courses level of the index, so each summed Fee is divided by the total for its course.
# Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
print(df3)
Yields the same output as above.
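The role of the level argument may be easier to see on plain Series. This sketch, with variable names chosen for illustration, divides a Series indexed by (Courses, Fee) by a Series indexed by Courses only; level="Courses" broadcasts each course total to every matching row of the two-level index.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Numerator: one value per (Courses, Fee) pair.
pair_sums = df.groupby(["Courses", "Fee"])["Fee"].sum()

# Denominator: one total per course.
course_sums = df.groupby("Courses")["Fee"].sum()

# Match the denominator to the numerator on the Courses index level.
pct = pair_sums.div(course_sums, level="Courses") * 100
print(pct.round(2))
```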
Using groupby with DataFrame.transform() Method
You can also calculate the percentage of the total within each group using groupby() along with the transform() method. transform() applies a function to each group and returns a result with the same index as the original DataFrame, so each row receives the total of its own group. Dividing the Fee column by this result gives the percentages directly as a new column, without building an intermediate aggregated DataFrame.
# Using DataFrame.transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
Yields below output.
# Output:
   Courses    Fee Duration           %
0    Spark  22000   30days   48.888889
1  PySpark  25000   50days   49.019608
2    Spark  23000   30days   51.111111
3   Python  24000     None  100.000000
4  PySpark  26000      NaN   50.980392
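To see why the division lines up row by row, it can help to compare transform('sum') with a plain sum() on the same groups. This is a small sketch; the variable names `row_totals` and `group_totals` are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# transform("sum") returns one value per original row (same index as df).
row_totals = df.groupby("Courses")["Fee"].transform("sum")
print(row_totals.tolist())  # [45000, 51000, 45000, 24000, 51000]

# sum() returns one value per group instead.
group_totals = df.groupby("Courses")["Fee"].sum()
print(len(group_totals))  # 3
```

Because `row_totals` shares its index with `df`, it can be used directly as the denominator for the Fee column.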
Alternatively, you can use transform() with a lambda function. This also adds the result as a new column and leaves the rest of the DataFrame untouched. Note that x / x.sum() returns a fraction between 0 and 1; multiply it by 100 to get a percentage.
# Alternative method of DataFrame.transform() by lambda functions
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
This adds a Courses_Fee column that holds the same shares as fractions (for example, 0.488889 instead of 48.888889).
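If you prefer percentages over fractions, one option is to scale and round the transform() result before storing it. The sketch below assumes two decimal places are enough; the column name `Fee_pct` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Fraction of the course total contributed by each row.
fraction = df.groupby("Courses")["Fee"].transform(lambda x: x / x.sum())

# Convert to a percentage and round for display.
df["Fee_pct"] = (100 * fraction).round(2)
print(df)
```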
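Other Example
You can also calculate each value’s share of its group total by chaining groupby(), sum(), and rename() with a second groupby() and transform() with a lambda function. As before, the result is a fraction rather than a percentage.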
# Calculate with groupby, Series.rename(), and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)
Yields below output.
# Output:
Courses  Fee
PySpark  25000    0.490196
         26000    0.509804
Python   24000    1.000000
Spark    22000    0.488889
         23000    0.511111
Name: Courses_fee, dtype: float64
Complete Example to Calculate Percentage with Groupby
Below is a complete example of calculating percentages with groupby on a pandas DataFrame.
# Below are complete examples
# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Using DataFrame.agg() Method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)
# Percentage by lambda and DataFrame.apply() method
# group_keys=False keeps the original (Courses, Fee) index in the result
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)
# Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
# Assign the result; div() does not modify df2 in place
df3 = df2.div(Courses, level='Courses') * 100
print(df3)
# Using DataFrame.transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
# Alternative method of DataFrame.transform() by lambda functions
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
# Calculate with groupby, Series.rename(), and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)
FAQ on Pandas Percentage Total With Groupby
The groupby() function in pandas allows you to group rows of a DataFrame based on one or more columns. Once grouped, you can apply aggregation functions such as sum, mean, or count. To calculate percentages, you can use groupby() to get the total per group and then compute individual values as a percentage of the total.
To calculate the percentage of a column’s total for each group in a pandas DataFrame, you can use the groupby() function in combination with transform() to compute the percentage of the total within each group.
You can calculate the percentage of the total for all groups combined by using the total sum of the entire column (across all groups) and then calculating each group’s value as a percentage of this grand total.
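As a minimal sketch of this grand-total variant, using the article's Courses and Fee columns (the variable names are chosen for illustration), each course total is divided by the sum of the whole Fee column:

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Total Fee per course.
course_totals = df.groupby("Courses")["Fee"].sum()

# Each course total as a percentage of the grand total of the column.
pct_of_grand_total = 100 * course_totals / df["Fee"].sum()
print(pct_of_grand_total.to_dict())
# {'PySpark': 42.5, 'Python': 20.0, 'Spark': 37.5}
```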
You can compute percentages using the agg() function by first aggregating the data and then calculating the percentage from the aggregation results. The agg() function allows you to apply multiple aggregation functions at once, so you can, for example, compute the sum of a column per group and then derive each group's percentage from those sums.
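One way this could look is sketched below with pandas named aggregation. The output column names `total_fee`, `n_rows`, `fee_share`, and `row_share` are hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Apply two aggregations at once: the Fee sum and the row count per course.
summary = df.groupby("Courses").agg(
    total_fee=("Fee", "sum"),
    n_rows=("Fee", "count"),
)

# Derive percentages from the aggregated columns.
summary["fee_share"] = 100 * summary["total_fee"] / summary["total_fee"].sum()
summary["row_share"] = 100 * summary["n_rows"] / summary["n_rows"].sum()
print(summary)
```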
If you want to calculate the percentage within multiple columns, you can use groupby() along with agg() to perform aggregations on multiple columns and then compute percentages based on the sums or other aggregation results.
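The sketch below shows one possible approach for several columns at once. It uses transform() rather than agg() so that the results stay aligned with the original rows, and it assumes a second numeric column, `Discount`, which is hypothetical and not part of the article's DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Discount": [1000, 2000, 3000, 1500, 2000],  # hypothetical column
})

cols = ["Fee", "Discount"]

# Per-course totals of both columns, broadcast back to every row.
group_sums = df.groupby("Courses")[cols].transform("sum")

# Row-level percentage of the course total, for each column.
pct = 100 * df[cols] / group_sums

# Attach the results as Fee_pct and Discount_pct.
df = df.join(pct.add_suffix("_pct"))
print(df["Discount_pct"].tolist())  # [25.0, 50.0, 75.0, 100.0, 50.0]
```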
Conclusion
In this article, you have learned how to calculate the percentage of the total within each pandas group by using the DataFrame.groupby() function along with the DataFrame.agg(), DataFrame.transform(), and DataFrame.apply() methods with a lambda function.