You are currently viewing R Group by Sum With Examples

How to do group by sum in R? By using aggregate() from R base or group_by() function along with the summarise() from the dplyr package, you can do the group by on dataframe on a specific column and get the sum of a column for each group.

Advertisements

Using the groupby() function from the dplyr package is an efficient approach hence, I will cover this first and then use the aggregate() function from the R base to group by sum on single and multiple columns.

1. Quick Examples

Following are quick examples of how to perform group by sum.


# Create Data Frame
df = read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df

# Load dplyr
library(dplyr)

# Group by sum using dplyr
# returns tibble table
agg_tbl <- df %>% group_by(department) %>% 
  summarise(sum_salary=sum(salary),
            .groups = 'drop')

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()

# Group by sum of multiple columns
df2 <- df %>% group_by(department,state) %>% 
  summarise(sum_salary=sum(salary),
            sum_bonus= sum(bonus),
            .groups = 'drop') %>%
  as.data.frame()

# Group by sum of multiple columns
df2 <- df %>% group_by(department,state) %>% 
  summarise(across(c(salary, bonus),sum),
            .groups = 'drop') %>%
  as.data.frame()

# Sum on all columns
num_df<- df[,c("department","state","age","salary","bonus")]
df2 <- num_df %>% group_by(department, state) %>% 
  summarise(across(everything(), sum),
            .groups = 'drop')  %>%
  as.data.frame()

# Group by sum using R Base aggregate()
agg_df <- aggregate(df$salary, by=list(df$department), FUN=sum)

# R Base aggregate() on multiple columns
agg_df <- aggregate(df$salary, by=list(df$department,df$state), FUN=sum)

Let’s create a DataFrame by reading a CSV file.


# Read CSV file into DataFrame
df = read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df

Yields below output.

r group by sum

2. Group By Sum in R using dplyr

You can use group_by() function along with the summarise() from dplyr package to find the group by sum in R DataFrame, group_by() returns the grouped_df ( A grouped Data Frame) and use summarise() on grouped df results to get the group by sum.

To use these functions first, you have to install dplyr first using install.packages(‘dplyr’) and load it using library(dplyr). I will use dplyr infix operator %>% across all our examples as the result of group_by() function goes as input to summarise() function.


# Load dplyr
library(dplyr)

# Group by sum using dplyr
agg_tbl <- df %>% group_by(department) %>% 
  summarise(sum_salary = sum(salary),
            .groups = 'drop')
agg_tbl

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()
df2

Yields below output. The above example does the group by on department column using group_by() and gets the sum of salary for each department using summarise().

Note that group_by() and summarise() function returns tibble, if you want DataFrame you should convert tibble to dataframe by using as.data.frame().

r group by sum

3. Group By Sum of Multiple Columns in R

By using the dplyr group_by() perform group on department and state columns (multiple columns) and get the sum of salary and bonus for each department & state combination.


# Group by sum of multiple columns
df2 <- df %>% group_by(department,state) %>% 
  summarise(sum_salary=sum(salary),
            sum_bonus= sum(bonus),
            .groups = 'drop') %>%
  as.data.frame()
df2

Yields below output.

r group by sum multiple columns

You can also use across() with the vector of elements you wanted to apply summarise on.


# Group by sum of multiple columns using across()
df2 <- df %>% group_by(department,state) %>% 
  summarise(across(c(salary, bonus),sum),
            .groups = 'drop') %>%
  as.data.frame()
df2

4. Sum on All Columns Except Group by Columns

Finally, let’s see how to apply the aggregate function sum on all columns of the DataFrame except grouping columns. While doing this make sure your dataframe has only numeric columns plus grouping columns. Having non-numeric on summarise returns an error.

This example does the group by on department and state columns, summarises on all columns except grouping columns, and apply the sum function on all summarised columns.


# Sum on all columns except group by columns
num_df<- df[,c("department","state","age","salary","bonus")]
df2 <- num_df %>% group_by(department, state) %>% 
  summarise(across(everything(), sum),
            .groups = 'drop')  %>%
  as.data.frame()
df2

on all columns

4. Group By Sum using R base aggregate()

R base provides an aggregate() function to perform the grouping on the dataframe, let’s use this to perform a groupby on the department column and get the sum of salary for each department.


# Group by sum using R Base aggregate()
agg_df <- aggregate(df$salary, by=list(df$department), FUN=sum)
agg_df

Yields below output.

r aggregate sum

5. R Base aggregate() on Multiple Columns

The following example also uses the aggregate() function to group rows based on department and state columns and uses the sum function to get the sum of salary for each department & state combination.


# R Base aggregate() on multiple columns
agg_df <- aggregate(df$salary, by=list(df$department,df$state), FUN=sum)
agg_df

Yields below output.

multiple columns

Conclusion

In this article, I have explained how to do group by sum in R by using group_by() function from the dplyr package and aggregate function from the R base. Between these two, dplyr functions perform efficiently when you are dealing with larger datasets.

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium