You can calculate the percentage of total with the groupby of pandas DataFrame by using DataFrame.groupby()
, DataFrame.agg()
, DataFrame.transform()
methods and DataFrame.apply()
with lambda
function. You can also calculate percentage by sum
and divide
functions.
In this article, You can find out how to calculate the percentage total of pandas DataFrame with some below examples.
1. Quick Examples of Pandas Percentage Total by Groupby
If, You are in hurry below are some quick examples to calculate percentage total of Pandas DataFrame.
# Below are some quick examples.
# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df2.div(Courses, level='Courses') * 100
# Using groupby with DataFrame.rename() Method.
df2= df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("count")
# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
Now, Let’s create a pandas DataFrame with a few rows and columns, execute these examples and validate results that calculate the percentage total of pandas DataFrame.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Python 24000 60days
4 PySpark 26000 35days
2. Pandas Calculate percentage with Groupby With .agg() Method
You can calculate the percentage by using DataFrame.groupby()
method. It is a process involving one or more of the following steps.
- Splitting the data into groups based on some criteria.
- Applying a function to each group independently.
- Combining the results into a data structure.
The DataFrame.agg()
method is used to pass a function or list of functions to be applied on a series or even each element of series separately. In the list of functions, multiple results are returned by DataFrame.agg()
method.
# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)
Yields below output.
Fee
Courses Fee
PySpark 25000 25000
26000 26000
Python 24000 24000
Spark 22000 22000
23000 23000
Now, you can calculate the percentage in a simpler way just groupby
the Courses
and divide Fee
column by its sum
by lambda
function and DataFrame.apply()
method. Here df2
is a Series of Multi Index with one column where values are all numeric. After you have done groupby
, each x
is a subset of that column and level=0
means you are grouping by the first level of the index
, rather than by one of the columns.
# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
print(df3)
Yields below output.
Courses Fee
PySpark 25000 49.019608
26000 50.980392
Python 24000 100.000000
Spark 22000 48.888889
23000 51.111111
Another method to calculate total percentage with groupby
by using DataFrame.div()
method. Here div
tells pandas to join the DataFrame based on the values in the Courses
level of the index
.
# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3=df2.div(Courses, level='Courses') * 100
print(df3)
Yields output same as above.
3. Using groupby with DataFrame.transform() Method
You can caluclate pandas percentage with total by groupby() and DataFrame.transform()
method. The transform()
method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
Yields below output.
Courses Fee Duration %
0 Spark 22000 30days 48.888889
1 PySpark 25000 50days 49.019608
2 Spark 23000 30days 51.111111
3 Python 24000 60days 100.000000
4 PySpark 26000 35days 50.980392
Alternatively, you can also caluclate percentage of total by using DataFrame.transform()
method with lambda
functions in which you can add the percentages as a new column, leaving the rest of the DataFrame untouched.
# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
Yields Output same as above.
4. Other Example-
You can calculate pandas percentage of total by using groupby using lambda
function.
# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)
Yields below output.
Courses Fee
PySpark 25000 0.490196
26000 0.509804
Python 24000 1.000000
Spark 22000 0.488889
23000 0.511111
Name: Courses_fee, dtype: float64
6. Complete Examples to Caluclate Percentage with Groupby
Below are Complete examples to caluclate percentage with groupby of pandas DataFrame.
# Below are complete examples.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)
# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
print(df3)
# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df2.div(Courses, level='Courses') * 100
print(df2)
# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)
# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)
# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)
Conclusion
In this article, You have learned how to calculate percentage with groupby of pandas DataFrame by using DataFrame.groupby()
, DataFrame.agg()
, DataFrame.transform()
and DataFrame.apply()
methods with lambda
function. You can also calculate by sum
and divide
functions with examples.
Related Articles
- Add Constant Column to Pandas DataFrame
- Rename Index Values of Pandas DataFrame
- How to Print Pandas DataFrame without Index
- Retrieve Number of Columns From Pandas DataFrame
- Pandas Find Unique Values From Columns
- How to Use NOT IN Filter in Pandas
- Pandas Merge Multiple DataFrames
- Pandas Left Join Explained By Examples
- Differences between Pandas Join vs Merge