Pandas Percentage Total With Groupby

Spread the love

You can calculate the percentage of total with the groupby of pandas DataFrame by using DataFrame.groupby(), DataFrame.agg(), DataFrame.transform() methods and DataFrame.apply() with lambda function. You can also calculate percentage by sum and divide functions.

In this article, You can find out how to calculate the percentage total of pandas DataFrame with some below examples.

1. Quick Examples of Pandas Percentage Total by Groupby

If, You are in hurry below are some quick examples to calculate percentage total of Pandas DataFrame.


# Below are some quick examples.
# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})

# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))

# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df2.div(Courses, level='Courses') * 100

# Using groupby with DataFrame.rename() Method.
df2= df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("count")

# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')

# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())

# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())

Now, Let’s create a pandas DataFrame with a few rows and columns, execute these examples and validate results that calculate the percentage total of pandas DataFrame.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000   60days
4  PySpark  26000   35days

2. Pandas Calculate percentage with Groupby With .agg() Method

You can calculate the percentage by using DataFrame.groupby() method. It is a process involving one or more of the following steps.

  • Splitting the data into groups based on some criteria.
  • Applying a function to each group independently.
  • Combining the results into a data structure.

The DataFrame.agg() method is used to pass a function or list of functions to be applied on a series or even each element of series separately. In the list of functions, multiple results are returned by DataFrame.agg() method.


# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)

Yields below output.


                 Fee
Courses Fee         
PySpark 25000  25000
        26000  26000
Python  24000  24000
Spark   22000  22000
        23000  23000

Now, you can calculate the percentage in a simpler way just groupby the Courses and divide Fee column by its sum by lambda function and DataFrame.apply() method. Here df2 is a Series of Multi Index with one column where values are all numeric. After you have done groupby, each x is a subset of that column and level=0 means you are grouping by the first level of the index, rather than by one of the columns.


# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
print(df3)

Yields below output.


Courses Fee              
PySpark 25000   49.019608
        26000   50.980392
Python  24000  100.000000
Spark   22000   48.888889
        23000   51.111111

Another method to calculate total percentage with groupby by using DataFrame.div() method. Here div tells pandas to join the DataFrame based on the values in the Courses level of the index.


# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df3=df2.div(Courses, level='Courses') * 100
print(df3)

Yields output same as above.

3. Using groupby with DataFrame.transform() Method

You can caluclate pandas percentage with total by groupby() and DataFrame.transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.


# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

Yields below output.


   Courses    Fee Duration           %
0    Spark  22000   30days   48.888889
1  PySpark  25000   50days   49.019608
2    Spark  23000   30days   51.111111
3   Python  24000   60days  100.000000
4  PySpark  26000   35days   50.980392

Alternatively, you can also caluclate percentage of total by using DataFrame.transform() method with lambda functions in which you can add the percentages as a new column, leaving the rest of the DataFrame untouched.


# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

Yields Output same as above.

4. Other Example-

You can calculate pandas percentage of total by using groupby using lambda function.


# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)

Yields below output.


Courses  Fee  
PySpark  25000    0.490196
         26000    0.509804
Python   24000    1.000000
Spark    22000    0.488889
         23000    0.511111
Name: Courses_fee, dtype: float64

6. Complete Examples to Caluclate Percentage with Groupby

Below are Complete examples to caluclate percentage with groupby of pandas DataFrame.


# Below are complete examples.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.agg() Method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
print(df2)

# Percentage by lambda and DataFrame.apply() method.
df3 = df2.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
print(df3)

# Using DataFrame.div() method.
df2 = df.groupby(['Courses', 'Fee']).agg({'Fee': 'sum'})
Courses = df.groupby(['Courses']).agg({'Fee': 'sum'})
df2.div(Courses, level='Courses') * 100
print(df2)

# Using DataFrame.transform() method.
df['%'] = 100 * df['Fee'] / df.groupby('Courses')['Fee'].transform('sum')
print(df)

# Alternative method of DataFrame.transform() by lambda functions.
df['Courses_Fee'] = df.groupby(['Courses'])['Fee'].transform(lambda x: x/x.sum())
print(df)

# Caluclate groupby with DataFrame.rename() and DataFrame.transform() with lambda functions.
df2=df.groupby(['Courses', 'Fee'])['Fee'].sum().rename("Courses_fee").groupby(level = 0).transform(lambda x: x/x.sum())
print(df2)

Conclusion

In this article, You have learned how to calculate percentage with groupby of pandas DataFrame by using DataFrame.groupby(), DataFrame.agg(), DataFrame.transform() and DataFrame.apply() methods with lambda function. You can also calculate by sum and divide functions with examples.

References

Leave a Reply

You are currently viewing Pandas Percentage Total With Groupby