You can calculate the percentage of the total within each group using DataFrame.groupby() along with agg(), transform(), or apply() with a lambda function. You can also calculate the percentage by computing group totals with sum() and dividing by them with div().
In this article, you will learn how to calculate the percentage of the total within each group of a pandas DataFrame, with examples.
1. Quick Examples of Pandas Percentage Total by Groupby
If you are in a hurry, below are some quick examples of calculating the percentage of the total within each group of a pandas DataFrame.
# Below are some quick examples.
# Example 1: Sum Fee for each (Courses, Fee) pair using agg().
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
# Example 2: Percentage by lambda and apply().
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
# Example 3: Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
# Example 4: Sum Fee per group and name the result with Series.rename().
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Fee_sum")
# Example 5: Using transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
# Example 6: transform() with a lambda function (returns a fraction, not a percentage).
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x / x.sum())
# Example 7: Calculate with groupby(), Series.rename(), and transform() with a lambda function.
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and validate the results.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
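Yields below output.
# Output:
Create DataFrame:
    Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000     None
4  PySpark  26000      NaN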
2. Calculate Percentage with Pandas groupby() and agg()
You can calculate the percentage of the total within each group using the DataFrame.groupby() method along with the agg() function. First, call groupby() on the DataFrame df to group it by the 'Courses' and 'Fee' columns, then use agg() (aggregate) to sum "Fee" for each group. These sums are the values that the next step turns into percentages of the total "Fee" within each "Courses" group.
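# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print("Total Fee for each Courses/Fee pair:\n", df2)
Yields below output.
# Output:
Total Fee for each Courses/Fee pair:
                  Fee
Courses Fee
PySpark 25000  25000
        26000  26000
Python  24000  24000
Spark   22000  22000
        23000  23000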
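Because every ('Courses', 'Fee') pair occurs only once in this sample data, the sums in df2 simply equal the original fees. The denominators for the percentages are the per-course totals. The small sketch below, reusing the same sample data, computes them directly so you can check the numbers in the following steps; the name course_totals is illustrative and not part of the original examples.

```python
import pandas as pd

# Same sample data as above (Duration omitted because it is not used here)
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Total Fee per course: the denominator behind every percentage that follows
course_totals = df.groupby('Courses')['Fee'].sum()
print(course_totals)
# PySpark 51000, Python 24000, Spark 45000
```

For example, PySpark's 25000 is 25000 / 51000 * 100, or about 49.02 percent of its course total.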
Next, use groupby() on df2 to group it by the first level of its index (level 0, that is, Courses), and then use apply() to run a lambda function on each group. The lambda takes the values in the group (x), multiplies them by 100, and divides them by the sum of all values in that group, which gives each row's percentage of its course total.
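# Percentage by lambda and DataFrame.apply() method.
# group_keys=False stops pandas 2.0 and later from adding Courses as an extra index level.
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)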
Yields below output.
# Output:
                      Fee
Courses Fee
PySpark 25000   49.019608
        26000   50.980392
Python  24000  100.000000
Spark   22000   48.888889
        23000   51.111111
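As a quick sanity check, the percentages within each course should add up to 100. The sketch below rebuilds df2 from the same sample data and recomputes the percentages with GroupBy.transform('sum') instead of apply(). This is a swapped-in alternative, not the method above: transform() keeps df2's own index, so no group_keys handling is needed. The name check is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Total Fee per (Courses, Fee) pair, as in the agg() step above
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})

# transform('sum') broadcasts each course total back onto df2's own index,
# so the division keeps the same two-level (Courses, Fee) index
df3 = 100 * df2 / df2.groupby(level=0).transform('sum')
print(df3)

# Sanity check: the percentages within each course add up to 100
check = df3.groupby(level=0)['Fee'].sum()
print(check)
```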
Another way to calculate the percentage of the total for each group is the DataFrame.div() method. Compute each course's total Fee, then divide df2 by it: level='Courses' tells pandas to align the divisor on the Courses level of df2's index, so every row is divided by the total of its own course.
# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3=df2.div(Courses, level='Courses') * 100
print(df3)
This yields the same output as above.
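To make the level-based alignment concrete, the sketch below does the same division on the Fee Series and then repeats it by hand, looking up each row's course total through the Courses index level. The names course_totals, pct_div, row_totals, and pct_manual are illustrative; the manual version is only there for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
# Series of course totals indexed by Courses (the divisor)
course_totals = df.groupby('Courses')['Fee'].sum()

# div(..., level='Courses') matches each row of df2 to its course total
pct_div = df2['Fee'].div(course_totals, level='Courses') * 100

# The same lookup done by hand: map each row's Courses label to its total
row_totals = df2.index.get_level_values('Courses').map(course_totals).to_numpy()
pct_manual = df2['Fee'] / row_totals * 100

print(pct_div)
```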
3. Using groupby with DataFrame.transform() Method
You can also calculate the percentage of the total within each group using groupby() along with the transform() method. Unlike agg(), which returns one row per group, transform() returns a result aligned with the original DataFrame's index: transform('sum') gives every row the total Fee of its course. Dividing each Fee by that total and multiplying by 100 yields the percentage, which you can store directly as a new column.
# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
Yields below output.
# Output:
   Courses    Fee Duration           %
0    Spark  22000   30days   48.888889
1  PySpark  25000   50days   49.019608
2    Spark  23000   30days   51.111111
3   Python  24000     None  100.000000
4  PySpark  26000      NaN   50.980392
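It can help to look at the intermediate Series that transform('sum') produces. The sketch below, using the same sample data, prints it: one course total per original row, in the original row order. The round(2) call is an addition for readability only and is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# transform('sum') returns one value per original row: that row's course total
totals = df.groupby('Courses')['Fee'].transform('sum')
print(totals.tolist())  # [45000, 51000, 45000, 24000, 51000]

# Dividing row by row gives the percentage; round(2) only tidies the display
df['%'] = (100 * df['Fee'] / totals).round(2)
print(df)
```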
Alternatively, you can use transform() with a lambda function to add each row's share of its group total as a new column, leaving the rest of the DataFrame untouched. Note that this lambda divides each value by its group sum without multiplying by 100, so Courses_Fee holds a fraction between 0 and 1 rather than a percentage.
# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
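Yields below output.
# Output:
   Courses    Fee Duration           %  Courses_Fee
0    Spark  22000   30days   48.888889     0.488889
1  PySpark  25000   50days   49.019608     0.490196
2    Spark  23000   30days   51.111111     0.511111
3   Python  24000     None  100.000000     1.000000
4  PySpark  26000      NaN   50.980392     0.509804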
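If you want to display these fractions as percentages without changing the stored numbers, Python's '%' format specification multiplies by 100 and appends a percent sign. The sketch below applies it with Series.map(); the name labels is illustrative, and the formatting step is an addition to the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Share of each course's total Fee, as a fraction between 0 and 1
df['Courses_Fee'] = df.groupby('Courses')['Fee'].transform(lambda x: x / x.sum())

# The '%' format spec multiplies by 100 and appends a percent sign
labels = df['Courses_Fee'].map('{:.2%}'.format)
print(labels.tolist())
# ['48.89%', '49.02%', '51.11%', '100.00%', '50.98%']
```

Keeping the numeric column and formatting only for display means later calculations still work on the exact fractions.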
4. Using groupby() with rename() and transform()
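You can also get each row's share of its group total in a single chain: sum Fee for each ('Courses', 'Fee') pair, name the resulting Series with rename(), then group it by the first index level and apply transform() with a lambda function. As in the previous example, the result is a fraction between 0 and 1; multiply by 100 for a percentage.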
# Calculate with groupby(), Series.rename(), and transform() with a lambda function.
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)
Yields below output.
# Output:
Courses Fee
PySpark 25000 0.490196
26000 0.509804
Python 24000 1.000000
Spark 22000 0.488889
23000 0.511111
Name: Courses_fee, dtype: float64
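The chained result is a Series indexed by (Courses, Fee). The sketch below scales it to percentages and turns the index back into ordinary columns with reset_index(). One practical benefit of the rename() step shows up here: a Series still named 'Fee' would clash with the 'Fee' index level when reset_index() tries to insert both as columns. The names share and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Chained steps from the example above: each Fee as a fraction of its course total
share = (
    df.groupby(['Courses', 'Fee'])['Fee'].sum()
    .rename("Courses_fee")
    .groupby(level=0)
    .transform(lambda x: x / x.sum())
)

# Scale to percentages and move the (Courses, Fee) index levels into columns
result = (share * 100).reset_index()
print(result)
```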
5. Complete Examples to Calculate Percentage with Groupby
Below is the complete example for calculating percentages with groupby on a pandas DataFrame.
# Below are complete examples.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)
# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)
# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
print(df3)
# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
# Calculate with groupby(), Series.rename(), and transform() with a lambda function.
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)
Conclusion
In this article, you have learned how to calculate the percentage of the total within each pandas group by using DataFrame.groupby() along with agg(), transform(), and apply() with lambda functions, as well as by dividing by group totals with div().