  • Post category: Pandas
  • Post last modified: December 2, 2024
Pandas Percentage Total With Groupby

You can calculate the percentage of the total within each group using DataFrame.groupby() together with the GroupBy agg(), transform(), and apply() methods (apply() with a lambda function). You can also calculate the percentage by combining sum() with the div() method.


In this article, you will learn how to calculate the percentage of the total within each group of a pandas DataFrame, with examples.

Key Points –

  • Calculating percentage totals with groupby is useful for understanding the relative distribution of values within each group in a DataFrame.
  • Use aggregation methods (like sum(), count(), or mean()) to calculate the total values within each group.
  • To calculate percentages, divide the aggregated value of each group by the total of the column or group, then multiply by 100.
  • After grouping, you can calculate percentages for individual columns by dividing the group’s total by the overall total and multiplying by 100.
  • The apply() function allows custom lambda functions for more complex percentage calculations after grouping.
  • Using methods like div() or transform() is more efficient than using loops, especially for large datasets.
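The recipe in these key points (aggregate, divide, multiply by 100) can be sketched with plain sum() and div() calls. This is a minimal sketch using the same sample data created later in this article; the column name Fee_pct is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data used in this article
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
})

# Step 1: aggregate the total Fee per course (a Series indexed by Courses)
course_totals = df.groupby('Courses')['Fee'].sum()

# Step 2: look up each row's course total, divide, and scale to a percentage
row_totals = df['Courses'].map(course_totals)
df['Fee_pct'] = df['Fee'].div(row_totals).mul(100)  # Fee_pct is a hypothetical column name

print(df[['Courses', 'Fee', 'Fee_pct']])
```

Mapping the per-course totals back onto the rows is one way to line them up for division; the transform() approach shown later does the same broadcast in a single call.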

Quick Examples of Pandas Percentage Total by Groupby

If you are in a hurry, below are some quick examples of calculating the percentage total with groupby on a pandas DataFrame.


# Quick examples of pandas percentage total by Groupby

# Example 1: Using DataFrame.agg() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})

# Example 2: Percentage by lambda and DataFrame.apply() method
# group_keys=False keeps the (Courses, Fee) index instead of prepending Courses again
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())

# Example 3: Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100

# Example 4: Using groupby with Series.rename() (sums only, no percentage yet)
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("count")

# Example 5: Using DataFrame.transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')

# Example 6: Alternative with transform() and a lambda function (returns fractions between 0 and 1)
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x / x.sum())

# Example 7: Calculate with Series.rename() and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())

Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and validate the results that calculate the percentage total of the DataFrame.


# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

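Yields below output.


# Output:
Create DataFrame:
    Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000     None
4  PySpark  26000      NaN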

Pandas Calculate Percentage with Groupby Using the .agg() Method

You can calculate the percentage of the total within each group using the DataFrame.groupby() method along with the agg() function. First, group the DataFrame df by the columns 'Courses' and 'Fee', then apply agg() (aggregate) to sum the Fee values in each group.

This step only produces the Fee totals for each (Courses, Fee) pair; the percentages are computed from this aggregated result in the next step.


# Using DataFrame.agg() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print("Total Fee for each (Courses, Fee) group:\n", df2)

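This returns a DataFrame indexed by (Courses, Fee) with a single Fee column that holds the summed fee for each pair. Because each fee appears only once within its course in this data, every group has one row and the sums equal the original fees.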

Next, group df2 by the first level of its index (level 0, the Courses level) and use the apply() method to run a lambda function on each group.

Inside apply(), x is the sub-DataFrame for one course. The lambda multiplies every value in x by 100 and divides it by x.sum(), the course’s total Fee, so the values within each course add up to 100. Passing group_keys=False keeps the original (Courses, Fee) index; in pandas 2.x the default group_keys=True prepends the Courses key to the index a second time.


# Percentage by lambda and DataFrame.apply() method
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)

Yields below output.


# Output:
                      Fee
Courses Fee              
PySpark 25000   49.019608
        26000   50.980392
Python  24000  100.000000
Spark   22000   48.888889
        23000   51.111111

Another method is to calculate the percentage of the total for each group using the DataFrame.div() method. Here, level='Courses' tells div() to match the per-course totals in Courses against the Courses level of df2’s MultiIndex, so each (Courses, Fee) row is divided by its own course total.


# Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3=df2.div(Courses, level='Courses') * 100
print(df3)

This yields the same output as above.
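The same level-based alignment also works on a Series, which avoids building intermediate DataFrames. This is a minimal sketch assuming you only need the percentage values; pair_totals, course_totals, and pct are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Total Fee per (Courses, Fee) pair: a Series with a two-level index
pair_totals = df.groupby(['Courses', 'Fee'])['Fee'].sum()
# Total Fee per course: a Series indexed by Courses only
course_totals = df.groupby('Courses')['Fee'].sum()

# level='Courses' matches each course total to the Courses level of pair_totals
pct = pair_totals.div(course_totals, level='Courses') * 100
print(pct)
```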

Using groupby with DataFrame.transform() Method

You can also calculate the percentage of the total within each group using groupby() along with the transform() method. Unlike agg(), which returns one row per group, transform('sum') returns a result with the same index as the original DataFrame, so every row receives its course’s total. Dividing each row’s Fee by that total and multiplying by 100 gives its percentage, which can be assigned directly as a new column.


# Using DataFrame.transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

Yields below output.


# Output:
   Courses    Fee Duration           %
0    Spark  22000   30days   48.888889
1  PySpark  25000   50days   49.019608
2    Spark  23000   30days   51.111111
3   Python  24000     None  100.000000
4  PySpark  26000      NaN   50.980392
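To make the difference between agg() and transform() concrete, the following sketch compares their output lengths and checks that the percentages within each course add up to 100. It uses the same Courses and Fee data; grouped, per_course, per_row, and pct are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

grouped = df.groupby('Courses')['Fee']

# agg-style reduction: one value per course (3 rows, indexed by Courses)
per_course = grouped.sum()
# transform('sum'): one value per original row (5 rows, same index as df)
per_row = grouped.transform('sum')

print(len(per_course), len(per_row))  # 3 5
print(per_row.tolist())               # [45000, 51000, 45000, 24000, 51000]

pct = 100 * df['Fee'] / per_row
# Within each course the percentages add up to 100
print(pct.groupby(df['Courses']).sum())
```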

Alternatively, you can pass a lambda function to transform() and store the result as a new column, leaving the rest of the DataFrame untouched. Note that this lambda returns each fee as a fraction of its course total (between 0 and 1) rather than a percentage.


# Alternative method of DataFrame.transform() by lambda functions
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

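This adds a Courses_Fee column holding the same shares as fractions, for example 0.488889 for the first Spark row instead of 48.888889; multiply by 100 to get percentages.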
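If you prefer the lambda version but want percentages, scale the fractions by 100. The sketch below assumes the same sample data and a hypothetical Courses_Fee_pct column, and checks that both transform() variants agree. Passing the built-in 'sum' string is generally faster than a lambda, because pandas can use its optimized groupby sum instead of calling a Python function once per group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})
grouped = df.groupby('Courses')['Fee']

# Lambda version: fraction of the course total (0 to 1)
df['Courses_Fee'] = grouped.transform(lambda x: x / x.sum())
# Built-in version: percentage of the course total (0 to 100)
df['%'] = 100 * df['Fee'] / grouped.transform('sum')

# Hypothetical column: lambda result converted to a rounded percentage
df['Courses_Fee_pct'] = df['Courses_Fee'].mul(100).round(2)

print(df)
```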

Other Example

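You can also chain the steps into one expression: sum Fee for each (Courses, Fee) pair, rename the resulting Series to Courses_fee, regroup it by the Courses level (level=0), and use transform() with a lambda function to divide each value by its course total. As in the previous example, the result is a fraction rather than a percentage.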


# Calculate groupby with Series.rename() and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)

Yields below output.


# Output:
Courses  Fee  
PySpark  25000    0.490196
         26000    0.509804
Python   24000    1.000000
Spark    22000    0.488889
         23000    0.511111
Name: Courses_fee, dtype: float64
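Breaking the chained expression into named steps can make it easier to debug. The sketch below is a minimal illustration (pair_totals, shares, and shares_fast are hypothetical names); it also shows that the lambda can be replaced by dividing by transform('sum'), which gives the same fractions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Step 1: sum Fee for each (Courses, Fee) pair
pair_totals = df.groupby(['Courses', 'Fee'])['Fee'].sum()
# Step 2: rename the Series so the result is labeled Courses_fee
pair_totals = pair_totals.rename('Courses_fee')
# Step 3: regroup on the Courses level and divide each value by its course total
shares = pair_totals.groupby(level=0).transform(lambda x: x / x.sum())

# Same result without a lambda
shares_fast = pair_totals / pair_totals.groupby(level=0).transform('sum')

print(shares)
print(shares.groupby(level=0).sum())  # each course sums to 1 (up to float rounding)
```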

Complete Examples to Calculate Percentage with Groupby

Below is the complete example for calculating percentages with groupby on a pandas DataFrame.


# Below are complete examples

# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.agg() Method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)

# Percentage by lambda and DataFrame.apply() method
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)

# Using DataFrame.div() method
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
print(df3)

# Using DataFrame.transform() method
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

# Alternative method of DataFrame.transform() by lambda functions
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

# Calculate groupby with Series.rename() and transform() with a lambda function
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())
print(df2)

FAQ on Pandas Percentage Total With Groupby

What is the groupby function in Pandas, and how can it help in calculating percentages?

The groupby function in Pandas allows you to group rows of a DataFrame based on one or more columns. Once grouped, you can apply aggregation functions such as sum, mean, or count. To calculate percentages, you can use groupby to get the total per group and then compute individual values as a percentage of the total.

How do I calculate the percentage of a column’s total for each group?

To calculate the percentage of a column’s total for each group in a Pandas DataFrame, you can use the groupby function in combination with transform to compute the percentage of the total within each group.

Can I calculate the percentage of the total for all groups combined?

You can calculate the percentage of the total for all groups combined by using the total sum of the entire column (across all groups) and then calculating each group’s value as a percentage of this grand total.
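As a minimal sketch with the same sample data, divide each course’s summed Fee by the grand total of the column (course_totals, grand_total, and pct_of_grand_total are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

course_totals = df.groupby('Courses')['Fee'].sum()
grand_total = df['Fee'].sum()  # 120000 for this data

# Each course's share of the grand total, in percent
pct_of_grand_total = course_totals / grand_total * 100
print(pct_of_grand_total)
```

Unlike the transform() examples, these percentages add up to 100 across all courses rather than within each course.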

How can I compute percentages using agg() function?

You can compute percentages using the agg() function by first aggregating the data and then calculating the percentage within the aggregation results. The agg() function allows you to apply multiple aggregation functions at once and can be customized to calculate the sum of a column and then compute the percentage for each group.
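A sketch of that idea using named aggregation (available in pandas 0.25 and later): agg() computes the per-course sum and row count in one call, and the percentage column is derived from the aggregated totals afterwards. The output column names total_fee, n_rows, and pct_of_total are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Named aggregation: several aggregates at once (output names are hypothetical)
summary = df.groupby('Courses').agg(
    total_fee=('Fee', 'sum'),
    n_rows=('Fee', 'count'),
)
# Percentage column computed from the aggregated totals
summary['pct_of_total'] = 100 * summary['total_fee'] / summary['total_fee'].sum()
print(summary)
```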

What if I want to calculate the percentage within multiple columns?

If you want to calculate the percentage within multiple columns, you can use groupby along with agg() to perform aggregations on multiple columns and then compute percentages based on the sums or other aggregation results.
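One way to handle several numeric columns at once is to take group sums with transform() on a list of columns and divide element-wise. In this sketch the Discount column and its values are hypothetical, added only to have a second numeric column; the _pct suffix and the names cols, group_sums, pct, and result are likewise illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    # Hypothetical second numeric column, for illustration only
    'Discount': [1000, 2300, 1200, 2500, 1000],
})

cols = ['Fee', 'Discount']
# Per-row group sums for every selected column (same shape as df[cols])
group_sums = df.groupby('Courses')[cols].transform('sum')

# Element-wise division aligns on both index and column labels
pct = df[cols].div(group_sums).mul(100).add_suffix('_pct')
result = pd.concat([df, pct], axis=1)
print(result)
```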

Conclusion

In this article, you have learned how to calculate the percentage of the total within each pandas group by using DataFrame.groupby() together with agg(), div(), transform(), and apply() with a lambda function.
