Pandas Percentage Total With Groupby

Post last modified: March 27, 2024

You can calculate the percentage of the total within each group by combining DataFrame.groupby() with DataFrame.agg(), DataFrame.transform(), or DataFrame.apply() and a lambda function. You can also compute the group sums first and divide by them with DataFrame.div().

In this article, you will learn how to calculate the percentage of the total in a pandas DataFrame, with examples.

1. Quick Examples of Pandas Percentage Total by Groupby

If you are in a hurry, below are some quick examples of calculating the percentage of the total in a pandas DataFrame.


# Below are some quick examples.

# Example 1: Sum Fee for each (Courses, Fee) group using DataFrame.agg().
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})

# Example 2: Percentage by lambda and DataFrame.apply() method.
# group_keys=False keeps the original (Courses, Fee) index in pandas 2.x.
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())

# Example 3: Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100

# Example 4: Sum with groupby, then name the resulting Series with rename().
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("count")

# Example 5: Using transform() to get the percentage of each group total.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')

# Example 6: Fraction (0 to 1) of each group total using transform() with a lambda.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x / x.sum())

# Example 7: Calculate the fraction with groupby, rename(), and transform() with a lambda.
df2 = df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level=0).transform(lambda x: x / x.sum())

Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and validate the results.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

This creates a DataFrame with five rows and three columns: Courses, Fee, and Duration.

2. Calculate Percentage With Groupby and the agg() Method

To calculate the percentage of the total within each group, start by computing the group sums. Call groupby() on the DataFrame df to group it by the columns 'Courses' and 'Fee', then apply agg() (short for aggregate) to sum the 'Fee' values in each group.

Let’s first sum the “Fee” for each group; the percentage within each "Courses" group is computed in the next step.


# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print("Sum of Fee for each Courses and Fee group:\n", df2)

This yields a DataFrame indexed by Courses and Fee, with a single Fee column holding the sum for each group.

Next, call groupby() on df2 with level=0 to group by the first level of the index (Courses), then use apply() with a lambda function to calculate the percentage of the total for each group.

The lambda function receives each group as x, multiplies its values by 100, and divides by the sum of all values in the group. In pandas 2.x, pass group_keys=False so that the result keeps the original (Courses, Fee) index instead of gaining an extra Courses level.


# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)

This yields the output below.


# Output:
Courses Fee              
PySpark 25000   49.019608
        26000   50.980392
Python  24000  100.000000
Spark   22000   48.888889
        23000   51.111111
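
As a quick sanity check on the apply() approach, the percentages within each Courses group should add up to 100. The sketch below rebuilds the sample data without the Duration column and assumes pandas 2.x; the variable names sums, pct, and totals are illustrative only.

```python
import pandas as pd

# Sample data from this article (Duration omitted, it is not needed here).
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Step 1: sum Fee for each (Courses, Fee) pair.
sums = df.groupby(["Courses", "Fee"]).agg({"Fee": "sum"})

# Step 2: percentage of each value within its Courses group.
# group_keys=False keeps the original two-level index.
pct = sums.groupby(level=0, group_keys=False).apply(lambda g: 100 * g / g.sum())

# Step 3: the percentages of every course should total 100.
totals = pct.groupby(level="Courses")["Fee"].sum()
print(totals)
```

If any group total differs noticeably from 100, the grouping level used in step 2 probably does not match the level used to compute the sums.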

Another method is to divide the group sums by the per-course totals using the DataFrame.div() method. Here, level='Courses' tells pandas to align the two DataFrames on the Courses level of the index, so each row is divided by the total for its course.


# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3=df2.div(Courses, level='Courses') * 100
print(df3)

This yields the same output as above.
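
The same level-based alignment is available on a Series through Series.div(). The following is a minimal sketch of that variant, not the method used above; the names sums, course_totals, and pct are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Series with a two-level index (Courses, Fee).
sums = df.groupby(["Courses", "Fee"])["Fee"].sum()

# Series with a single-level index (Courses) holding each course total.
course_totals = df.groupby("Courses")["Fee"].sum()

# level="Courses" broadcasts each course total across the matching rows of sums.
pct = sums.div(course_totals, level="Courses") * 100
print(pct)
```

Working with Series avoids the single-column DataFrame produced by agg(), which can be convenient when only one value column is involved.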

3. Using groupby with DataFrame.transform() Method

You can also calculate the percentage of the total within each group using groupby() along with the transform() method. transform('sum') computes the sum of each group and returns it as a Series with the same index as the original DataFrame, so every row carries its own group's total. Dividing the Fee column by this Series gives each row's share of its group, which you can store directly as a new column.


# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

This yields the output below.


# Output:
   Courses    Fee Duration           %
0    Spark  22000   30days   48.888889
1  PySpark  25000   50days   49.019608
2    Spark  23000   30days   51.111111
3   Python  24000     None  100.000000
4  PySpark  26000      NaN   50.980392
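
To see why the division lines up row by row, it can help to look at what transform('sum') returns on its own. The sketch below rebuilds the sample data without the Duration column; group_totals and share are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# One total per row, in the original row order, not one per group.
group_totals = df.groupby("Courses")["Fee"].transform("sum")
print(group_totals.tolist())  # [45000, 51000, 45000, 24000, 51000]

# Because the indexes match, element-wise division needs no merge or join.
share = 100 * df["Fee"] / group_totals
print(share)
```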

Alternatively, you can calculate each row's share of its group total by passing a lambda function to transform() and adding the result as a new column, which leaves the rest of the DataFrame untouched.


# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

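This adds a Courses_Fee column that holds each fee as a fraction of its course total (for example, 0.488889 for the first row) rather than a percentage; multiply by 100 to get the percentages shown above.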
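
If a rounded percentage is preferred over a raw fraction, one option is to scale and round the transform() result. This is a minimal sketch; the column name Fee_pct and the choice of two decimal places are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Python", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Fraction of each group's total, one value per original row.
fraction = df.groupby("Courses")["Fee"].transform(lambda s: s / s.sum())

# Scale to a percentage and round to two decimals (hypothetical column name).
df["Fee_pct"] = (100 * fraction).round(2)
print(df["Fee_pct"].tolist())  # [48.89, 49.02, 51.11, 100.0, 50.98]
```

Rounding each row independently means the rounded values of a group may not add up to exactly 100.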

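4. Other Example

You can also chain these steps: sum with groupby(), name the resulting Series with rename(), then group by the first index level and use transform() with a lambda function to get each value's fraction of its group total.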


# Calculate the fraction with groupby, rename(), and transform() with a lambda.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)

This yields the output below.


# Output:
Courses  Fee  
PySpark  25000    0.490196
         26000    0.509804
Python   24000    1.000000
Spark    22000    0.488889
         23000    0.511111
Name: Courses_fee, dtype: float64

5. Complete Examples to Calculate Percentage with Groupby

Below are complete examples of calculating percentages with groupby on a pandas DataFrame.


# Below are complete examples.

# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)

# Percentage by lambda and DataFrame.apply() method.
# group_keys=False keeps the original (Courses, Fee) index in pandas 2.x.
df3 = df2.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
print(df3)

# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3 = df2.div(Courses, level='Courses') * 100
print(df3)

# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

# Calculate the fraction with groupby, rename(), and transform() with a lambda.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)

Conclusion

In this article, you have learned how to calculate the percentage of the total within each pandas group by using DataFrame.groupby() along with DataFrame.agg(), DataFrame.transform(), DataFrame.apply() with a lambda function, and DataFrame.div().
