How do you group by mean in R? Using aggregate() from base R, or group_by() together with summarise() from the dplyr package, you can group a data frame by a specific column and get the average (mean) of another column for each group. The mean is the sum of all values in a column divided by the number of values; it is also commonly called the average.
Using group_by() from the dplyr package is an efficient approach, so I will cover it first and then use aggregate() from base R to group by mean on single and multiple columns.
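To make the definition of the mean concrete, here is a minimal base R sketch. The salary values are hypothetical and used only for illustration; the point is that computing the sum divided by the count by hand matches what mean() returns.

```r
# Hypothetical salary values, used only for illustration
salary <- c(100, 200, 300, 400)

# Mean computed by hand: sum of the values divided by their count
manual_mean <- sum(salary) / length(salary)

# The built-in mean() should give the same result
builtin_mean <- mean(salary)

print(manual_mean)
print(isTRUE(all.equal(manual_mean, builtin_mean)))
```

Keep in mind that mean() returns NA when a value is missing unless you pass na.rm = TRUE.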
1. Quick Examples
Following are quick examples of how to perform group by mean/average.
# Load dplyr (needed for the group_by() examples)
library(dplyr)

# Group by mean using dplyr
agg_tbl <- df %>% group_by(department) %>%
  summarise(mean_salary = mean(salary),
            .groups = 'drop')

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()

# Group by mean of multiple columns
df2 <- df %>% group_by(department, state) %>%
  summarise(mean_salary = mean(salary),
            mean_bonus = mean(bonus),
            .groups = 'drop') %>%
  as.data.frame()

# Group by mean of multiple columns using across()
df2 <- df %>% group_by(department, state) %>%
  summarise(across(c(salary, bonus), mean),
            .groups = 'drop') %>%
  as.data.frame()

# Mean on all columns
num_df <- df[, c("department", "state", "age", "salary", "bonus")]
df2 <- num_df %>% group_by(department, state) %>%
  summarise(across(everything(), mean),
            .groups = 'drop') %>%
  as.data.frame()

# Group by mean using R Base aggregate()
agg_df <- aggregate(df$salary, by = list(df$department), FUN = mean)

# R Base aggregate() on multiple columns
agg_df <- aggregate(df$salary, by = list(df$department, df$state), FUN = mean)
Let's create a data frame by reading a CSV file. The examples in this article use its department, state, age, salary, and bonus columns.
# Read CSV file into a data frame
df <- read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df
Yields below output.
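The contents of emp.csv are not reproduced here. To run the examples without that file, one option is a small stand-in data frame. The rows below are invented purely for illustration; only the column names (department, state, age, salary, bonus) come from the examples in this article.

```r
# Hypothetical stand-in for emp.csv; all values are made up for illustration
df <- data.frame(
  department = c("Finance", "Finance", "Marketing", "Marketing", "Sales", "Sales"),
  state      = c("CA", "NY", "CA", "CA", "NY", "NY"),
  age        = c(30, 45, 28, 35, 50, 41),
  salary     = c(8000, 9000, 6000, 7000, 5000, 5500),
  bonus      = c(800, 1200, 500, 700, 400, 450)
)

# Inspect the structure and the first rows
str(df)
head(df)
```

With a stand-in like this, the group means will of course differ from those produced by the real file.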

2. Group By Mean in R using dplyr
You can use group_by() together with summarise() from the dplyr package to find the mean/average per group in an R data frame. group_by() returns a grouped_df (a grouped data frame), and calling summarise() on that grouped result computes the mean for each group. The mean is the average of the given sample or data set: the total of the observations in a column divided by the number of observations.
To use these functions, first install dplyr using install.packages('dplyr') and load it using library(dplyr). I will use the %>% pipe operator (from magrittr, re-exported by dplyr) in all our examples, since the result of group_by() goes as input to summarise().
# Load dplyr
library(dplyr)
# Group by mean using dplyr
agg_tbl <- df %>% group_by(department) %>%
  summarise(mean_salary = mean(salary),
            .groups = 'drop')
agg_tbl
# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()
df2
Yields below output. The above example groups by the department column using group_by() and gets the mean of salary for each department using summarise().
Note that group_by() returns a grouped tibble and summarise() returns a tibble. If you want a plain data frame, convert the tibble using as.data.frame().
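The difference between the tibble and the converted data frame can be checked with class(). The sketch below assumes dplyr is installed and uses a small hypothetical data frame named emp (not the article's emp.csv).

```r
library(dplyr)

# Hypothetical stand-in data; not the article's emp.csv
emp <- data.frame(
  department = c("Finance", "Finance", "Sales", "Sales"),
  salary     = c(100, 200, 300, 500)
)

agg_tbl <- emp %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = "drop")

# summarise() returns a tibble, which is itself a subclass of data.frame
print(class(agg_tbl))

# as.data.frame() drops the tibble classes and leaves a plain data frame
agg_df <- as.data.frame(agg_tbl)
print(class(agg_df))
```

Since a tibble already inherits from data.frame, the conversion is mainly useful when downstream code expects plain data frame printing or subsetting behavior.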

3. Group By Mean of Multiple Columns in R
Using dplyr's group_by(), group on the department and state columns (multiple columns) and get the mean of salary and bonus for each department and state combination.
# Group by mean of multiple columns
df2 <- df %>% group_by(department, state) %>%
  summarise(mean_salary = mean(salary),
            mean_bonus = mean(bonus),
            .groups = 'drop') %>%
  as.data.frame()
df2
Yields below output.

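You can also use across() with a vector of the columns you want to summarise. Note that the result columns then keep their original names (salary and bonus) rather than mean_salary and mean_bonus.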
# Group by mean of multiple columns using across()
df2 <- df %>% group_by(department, state) %>%
  summarise(across(c(salary, bonus), mean),
            .groups = 'drop') %>%
  as.data.frame()
df2
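If you prefer prefixed output names when using across(), its .names argument (available in dplyr 1.0.0 and later) accepts a naming template. The following is a minimal sketch under that assumption, using a small hypothetical data frame named emp rather than the article's CSV.

```r
library(dplyr)

# Hypothetical stand-in data; values are made up for illustration
emp <- data.frame(
  department = c("Finance", "Finance", "Sales", "Sales"),
  state      = c("CA", "CA", "NY", "NY"),
  salary     = c(100, 200, 300, 500),
  bonus      = c(10, 20, 30, 50)
)

# "{.col}" is replaced by each input column name, giving mean_salary and mean_bonus
df2 <- emp %>%
  group_by(department, state) %>%
  summarise(across(c(salary, bonus), mean, .names = "mean_{.col}"),
            .groups = "drop") %>%
  as.data.frame()

print(names(df2))
df2
```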
4. Mean on All Columns Except Group by Columns
Finally, let's see how to group by and apply the mean function to all columns of the data frame except the grouping columns. When doing this, make sure your data frame has only numeric columns plus the grouping columns. Calling mean() on a non-numeric column does not give a useful result: it returns NA with a warning.
This example groups by the department and state columns, summarises all columns except the grouping columns, and applies the mean function to each of them.
# Mean on all columns
num_df <- df[, c("department", "state", "age", "salary", "bonus")]
df2 <- num_df %>% group_by(department, state) %>%
  summarise(across(everything(), mean),
            .groups = 'drop') %>%
  as.data.frame()
df2
Yields below output.
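Instead of subsetting the numeric columns by hand, one alternative is to let across() pick them with where(is.numeric). The sketch below is an assumption-laden illustration: the emp data frame and its name column are hypothetical, and na.rm = TRUE is added to show how missing values can be skipped.

```r
library(dplyr)

# Hypothetical data with a non-numeric column (name) and a missing salary
emp <- data.frame(
  name       = c("Ann", "Bob", "Cid", "Dee"),
  department = c("Finance", "Finance", "Sales", "Sales"),
  state      = c("CA", "CA", "NY", "NY"),
  salary     = c(100, NA, 300, 500),
  bonus      = c(10, 20, 30, 50)
)

# where(is.numeric) selects only numeric, non-grouping columns;
# the lambda passes na.rm = TRUE so missing values are ignored
df2 <- emp %>%
  group_by(department, state) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  as.data.frame()

df2
```

The name column is left out of the result because it is neither numeric nor a grouping column.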

5. Group By Mean using R base aggregate()
Base R provides an aggregate() function to perform grouping on a data frame. Let's use it to group by the department column and get the mean of salary for each department. Because the column and the grouping list are passed without names, the result columns are named Group.1 and x.
# Group by mean using R Base aggregate()
agg_df <- aggregate(df$salary, by=list(df$department), FUN=mean)
agg_df
Yields below output.

6. R Base aggregate() on Multiple Columns
The following example also uses aggregate(), this time grouping rows by the department and state columns, and uses the mean function to get the average salary for each department and state combination.
# R Base aggregate() on multiple columns
agg_df <- aggregate(df$salary, by=list(df$department,df$state), FUN=mean)
agg_df
Yields below output.
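The aggregate() calls above produce generic column names such as Group.1 and x. As a sketch of two alternatives that keep readable names, base R's aggregate() also has a formula interface, and the by list can be given names. The emp data frame here is hypothetical and only for illustration.

```r
# Hypothetical stand-in data; values are made up for illustration
emp <- data.frame(
  department = c("Finance", "Finance", "Sales", "Sales"),
  state      = c("CA", "CA", "NY", "NY"),
  salary     = c(100, 200, 300, 500),
  bonus      = c(10, 20, 30, 50)
)

# Formula interface: mean salary per department, columns keep their names
by_dept <- aggregate(salary ~ department, data = emp, FUN = mean)

# cbind() on the left-hand side aggregates several columns at once
by_dept_state <- aggregate(cbind(salary, bonus) ~ department + state,
                           data = emp, FUN = mean)

# Named by list: same idea with the by= interface used in this article
by_named <- aggregate(emp["salary"],
                      by = list(department = emp$department),
                      FUN = mean)

by_dept
by_dept_state
by_named
```

Note that the formula interface drops rows with missing values by default (na.action = na.omit), which may or may not be what you want.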

Conclusion
In this article, I have explained how to group by mean or average in R using group_by() from the dplyr package and aggregate() from base R. Of the two, the dplyr functions are generally the more efficient choice when you are dealing with larger datasets.