To group by mean in R, you can use either the `aggregate()` function from base R or the `group_by()` and `summarise()` functions from the dplyr package. Both approaches let you group the rows of a data frame by one or more columns and then compute the mean of another column for each group. The mean, also known as the average, is the sum of all values in a column divided by the number of values.
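As a quick check of that definition, here is a tiny base R sketch. The salary values are made up for illustration only; it simply confirms that `mean()` returns the same value as dividing the sum by the count.

``````r
# Hypothetical salary values, used only to illustrate the definition of the mean
salary <- c(50000, 60000, 70000)

# Mean computed by hand: sum of the values divided by the number of values
manual_mean <- sum(salary) / length(salary)

# Base R mean() gives the same result
builtin_mean <- mean(salary)

manual_mean   # 60000
builtin_mean  # 60000
``````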

The `group_by()` function from the dplyr package is a highly efficient method for grouping data, so I will explain it first. Then, I will move on to using the `aggregate()` function from base R to demonstrate how to group by mean on both single and multiple columns.

## 1. Quick Examples

The following are quick examples of how to calculate the mean/average by group.

``````
library(dplyr)

# Group by mean using dplyr
agg_tbl <- df %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = 'drop')

# Convert tibble to data frame
df2 <- agg_tbl %>% as.data.frame()

# Group by mean of multiple columns
df2 <- df %>%
  group_by(department, state) %>%
  summarise(mean_salary = mean(salary),
            mean_bonus = mean(bonus),
            .groups = 'drop') %>%
  as.data.frame()

# Group by mean of multiple columns using across()
df2 <- df %>%
  group_by(department, state) %>%
  summarise(across(c(salary, bonus), mean),
            .groups = 'drop') %>%
  as.data.frame()

# Mean on all non-grouping columns
num_df <- df[, c("department", "state", "age", "salary", "bonus")]
df2 <- num_df %>%
  group_by(department, state) %>%
  summarise(across(everything(), mean),
            .groups = 'drop') %>%
  as.data.frame()

# Group by mean using base R aggregate()
agg_df <- aggregate(df$salary, by = list(df$department), FUN = mean)

# Base R aggregate() on multiple columns
agg_df <- aggregate(df$salary, by = list(df$department, df$state), FUN = mean)
``````
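
The examples use a data frame named `df` with the columns `department`, `state`, `age`, `salary`, and `bonus` (loaded from a CSV file in the original setup). Print it to inspect the data:

``````
# Print the data frame used in the examples
df
``````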

Yields below output.
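The original data set is not reproduced here. If you want to run the examples without it, you can build a small stand-in data frame. This is a sketch under stated assumptions: only the column names `department`, `state`, `age`, `salary`, and `bonus` come from the examples, while every value and the extra `name` column are hypothetical, made up purely for illustration. The `name` column stands in for the kind of non-numeric column that Section 4 filters out before summarising.

``````r
# Hypothetical sample data: names and values are made up for illustration;
# only the column names department, state, age, salary, bonus match the examples
df <- data.frame(
  name       = c("Ann", "Bob", "Cara", "Dan", "Eve", "Finn"),  # hypothetical non-numeric column
  department = c("IT", "IT", "HR", "HR", "Sales", "Sales"),
  state      = c("NY", "CA", "NY", "NY", "CA", "CA"),
  age        = c(30, 41, 28, 35, 45, 38),
  salary     = c(80000, 90000, 60000, 64000, 70000, 74000),
  bonus      = c(5000, 7000, 3000, 3500, 6000, 6500)
)
df
``````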

## 2. Perform Group By Mean on a Single Column in R

To calculate the mean or average by group in an R data frame, use the `group_by()` function together with `summarise()` from the `dplyr` package. `group_by()` creates a grouped data frame based on one or more specified columns, and `summarise()` then calculates the mean for each group. The mean is the sum of a column's observations divided by the number of observations.

Before using these functions, install the `dplyr` package with `install.packages("dplyr")` and load it into your R session with `library(dplyr)`. All examples use the pipe operator `%>%` (from the magrittr package, re-exported by dplyr) to pass the result of `group_by()` to `summarise()`.
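The pipe only changes how the call is written: `x %>% f(y)` is evaluated as `f(x, y)`. The sketch below, using a hypothetical three-row data frame, compares a piped call with the equivalent nested call and checks that both return the same result.

``````r
# Load dplyr (run install.packages("dplyr") first if it is not installed)
library(dplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(department = c("IT", "IT", "HR"),
                 salary     = c(80000, 90000, 60000))

# Piped version: each result is passed as the first argument of the next call
piped <- df %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = "drop")

# Equivalent nested version without the pipe
nested <- summarise(group_by(df, department),
                    mean_salary = mean(salary), .groups = "drop")

identical(as.data.frame(piped), as.data.frame(nested))  # TRUE
``````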

``````
library(dplyr)

# Group by mean using dplyr
agg_tbl <- df %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = 'drop')
agg_tbl

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()
df2
``````

Yields below output. It groups the data by the `department` column using `group_by()`, then calculates the average salary for each department using `summarise()`.

Keep in mind that `summarise()` returns a tibble rather than a plain data frame. If you need a data frame, convert the result with `as.data.frame()`.
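To see the difference, you can inspect the class before and after the conversion. This sketch uses a hypothetical toy data frame; the class vector of the `summarise()` result shows the tibble classes layered on top of `data.frame`.

``````r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(department = c("IT", "IT", "HR"),
                 salary     = c(80000, 90000, 60000))

agg_tbl <- df %>%
  group_by(department) %>%
  summarise(mean_salary = mean(salary), .groups = "drop")

class(agg_tbl)  # "tbl_df" "tbl" "data.frame"

# Drop the tibble classes to get a plain data frame
df2 <- as.data.frame(agg_tbl)
class(df2)      # "data.frame"
``````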

## 3. Perform Group By Mean on Multiple Columns in R

You can also calculate the mean by group over multiple grouping columns. Pass several columns to `group_by()` to get an object grouped by every unique combination of their values; `summarise()` then returns the mean for each combination.

``````
# Group by mean of multiple columns
df2 <- df %>%
  group_by(department, state) %>%
  summarise(mean_salary = mean(salary),
            mean_bonus = mean(bonus),
            .groups = 'drop') %>%
  as.data.frame()
df2
``````

Yields below output.
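The examples pass `.groups = 'drop'` to `summarise()`. With more than one grouping column, this argument controls whether the result stays grouped. The sketch below, on hypothetical toy data, contrasts `"drop_last"` (the default when every group produces a single row, which dplyr announces with a message) with `"drop"`, using `group_vars()` to show which grouping remains.

``````r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(department = c("IT", "IT", "HR", "HR"),
                 state      = c("NY", "CA", "NY", "NY"),
                 salary     = c(80000, 90000, 60000, 64000))

# "drop_last" removes only the last grouping variable (state),
# so the result is still grouped by department
kept <- df %>%
  group_by(department, state) %>%
  summarise(mean_salary = mean(salary), .groups = "drop_last")
group_vars(kept)     # "department"

# "drop" removes all grouping and returns an ungrouped tibble
dropped <- df %>%
  group_by(department, state) %>%
  summarise(mean_salary = mean(salary), .groups = "drop")
group_vars(dropped)  # character(0)
``````

Dropping all groups avoids surprises when the result is used in later `mutate()` or `summarise()` calls, which would otherwise still operate per department.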

You can also use `across()` inside `summarise()` to apply `mean()` to several columns at once instead of writing one expression per column.

``````
# Group by mean of multiple columns using across()
df2 <- df %>%
  group_by(department, state) %>%
  summarise(across(c(salary, bonus), mean),
            .groups = 'drop') %>%
  as.data.frame()
df2
``````
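`across()` also accepts an anonymous function and a `.names` template, which is handy when you want prefixed result columns or need to skip missing values. This is a sketch on hypothetical data that deliberately contains an `NA`; the `na.rm = TRUE` handling and the `mean_{.col}` naming are choices made for illustration, not part of the examples above.

``````r
library(dplyr)

# Hypothetical toy data with one missing salary, for illustration only
df <- data.frame(department = c("IT", "IT", "HR"),
                 state      = c("NY", "NY", "NY"),
                 salary     = c(80000, NA, 60000),
                 bonus      = c(5000, 7000, 3000))

df2 <- df %>%
  group_by(department, state) %>%
  summarise(across(c(salary, bonus),
                   ~ mean(.x, na.rm = TRUE),   # ignore missing values
                   .names = "mean_{.col}"),    # name results mean_salary, mean_bonus
            .groups = "drop") %>%
  as.data.frame()
df2
``````

Without `na.rm = TRUE`, the IT group's mean salary would be `NA` because one of its values is missing.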

## 4. Perform Mean on Non-Grouping Columns

Let's explore how to use `group_by()` and `summarise()` to get the mean of every column in a data frame except the grouping columns. Make sure the data frame contains only numeric columns besides the grouping columns: `mean()` on a non-numeric (for example, character) column returns `NA` with a warning instead of a meaningful result.

``````
# Keep only the grouping columns and the numeric columns
num_df <- df[, c("department", "state", "age", "salary", "bonus")]

# Mean on all non-grouping columns
df2 <- num_df %>%
  group_by(department, state) %>%
  summarise(across(everything(), mean),
            .groups = 'drop') %>%
  as.data.frame()
df2
``````

In the above code, the data frame is grouped by the `department` and `state` columns, and `across(everything(), mean)` applies `mean()` to every remaining column. Grouping columns are excluded from `across()` automatically, so they are not summarised.

Yields below output.
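Instead of selecting the numeric columns by hand, you can let `across()` pick them with the `where(is.numeric)` selection helper (available in dplyr 1.0.0 and later). The sketch below uses hypothetical data that includes a character `name` column; it first shows that `mean()` on that column yields `NA`, then summarises only the numeric columns.

``````r
library(dplyr)

# Hypothetical toy data with a non-numeric column, for illustration only
df <- data.frame(name       = c("Ann", "Bob", "Cara"),
                 department = c("IT", "IT", "HR"),
                 state      = c("NY", "NY", "NY"),
                 age        = c(30, 40, 28),
                 salary     = c(80000, 90000, 60000))

# mean() on a character vector returns NA (with a warning, suppressed here)
suppressWarnings(mean(df$name))  # NA

# Summarise only the numeric columns; grouping columns are excluded automatically
df2 <- df %>%
  group_by(department, state) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop") %>%
  as.data.frame()
df2
``````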

## 5. Group By Mean using R base aggregate()

So far, we have calculated the mean of grouped data using dplyr functions. Now let's see how to do the same with the base R `aggregate()` function, which groups a data frame by one or more columns (passed as a list to the `by` argument) and applies a function such as `mean()` to another column.

``````
# Group by mean using R Base aggregate()
agg_df <- aggregate(df$salary, by = list(df$department), FUN = mean)
agg_df
``````

Yields below output.
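With the `by = list(...)` form, `aggregate()` names the result columns `Group.1` and `x`. If you prefer the original column names, base R's formula interface of `aggregate()` does that for you. A minimal sketch on hypothetical toy data:

``````r
# Hypothetical toy data, for illustration only
df <- data.frame(department = c("IT", "IT", "HR"),
                 salary     = c(80000, 90000, 60000))

# Default interface: result columns are named Group.1 and x
agg_default <- aggregate(df$salary, by = list(df$department), FUN = mean)
names(agg_default)  # "Group.1" "x"

# Formula interface: keeps the original column names
agg_formula <- aggregate(salary ~ department, data = df, FUN = mean)
names(agg_formula)  # "department" "salary"
agg_formula
``````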

## 6. R Base aggregate() on Multiple Columns

You can also pass multiple grouping columns to `aggregate()` through the `by` list; the mean is then calculated for every unique combination of those columns.

``````
# R Base aggregate() on multiple columns
agg_df <- aggregate(df$salary, by = list(df$department, df$state), FUN = mean)
agg_df
``````

Yields below output.
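The formula interface of `aggregate()` can also average several value columns at once: wrap them in `cbind()` on the left-hand side and list the grouping columns on the right. This sketch uses hypothetical toy data and mirrors the dplyr multi-column example above.

``````r
# Hypothetical toy data, for illustration only
df <- data.frame(department = c("IT", "IT", "HR", "HR"),
                 state      = c("NY", "NY", "NY", "CA"),
                 salary     = c(80000, 90000, 60000, 64000),
                 bonus      = c(5000, 7000, 3000, 3500))

# Mean of salary and bonus for every department/state combination
agg_df <- aggregate(cbind(salary, bonus) ~ department + state,
                    data = df, FUN = mean)
agg_df
``````

Note that the formula interface drops rows containing `NA` in any of the listed columns by default (`na.action = na.omit`).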

## Conclusion

In this article, I have explained how to calculate the group by mean or average for single or multiple columns in a data frame in R, using the `group_by()` function from the `dplyr` package and the `aggregate()` function from base R. When working with larger datasets, the `dplyr` approach tends to be more efficient than base R.