R Group by Count With Examples

To perform a group-by operation that counts occurrences in R, you can use either the aggregate() function from base R or a combination of group_by() and summarise() from the dplyr package. Both approaches group the rows of a data frame by one or more columns and then count the number of rows in each group.


First, I will cover the group_by() function from the dplyr package, which is an efficient approach. Then, I will demonstrate the aggregate() function from base R.

Quick Examples

Here are some simple examples demonstrating how to perform a group-by-count.


# Load dplyr
library(dplyr)

# Group by count using dplyr
agg_tbl <- df %>% group_by(department) %>% 
  summarise(total_count=n(),
            .groups = 'drop')

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()

# Group by count of multiple columns
df2 <- df %>% group_by(department,state) %>% 
  summarise(total_count=n(),.groups = 'drop') %>% 
  as.data.frame()

# Group by count using R Base aggregate()
agg_df <- aggregate(df$state, by=list(df$department), FUN=length)

# R Base aggregate() on multiple columns
agg_df <- aggregate(df$state, by=list(df$department,df$state), FUN=length)

Let’s build a data frame by loading a CSV file.


# Read CSV file into a data frame
df <- read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df

This yields the output below.

[Image: r group by count]
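If the CSV file is not at hand, a small stand-in data frame is enough to follow along. The sketch below is an assumption, not the contents of emp.csv: only the department and state column names come from the examples in this article, while employee_id and all of the values are made up for illustration.

```r
# Hypothetical stand-in for emp.csv: only the column names
# `department` and `state` are taken from the article's examples;
# `employee_id` and every value below are invented.
df <- data.frame(
  employee_id = 1:6,
  department  = c("Finance", "Finance", "IT", "IT", "IT", "Sales"),
  state       = c("CA", "NY", "CA", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# Inspect the structure and the first rows
str(df)
head(df)
```

Any data frame with department and state columns can be used with the remaining examples in the same way.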

Grouping and Counting in R with dplyr

To perform group-by operations in R data frames, you can use group_by() from the dplyr package, followed by summarise() to get counts for each group. The group_by() function returns grouped data, and then you can apply summarise() on this grouped data to compute the count.

Before using these functions, install dplyr with install.packages('dplyr') and load it with library(dplyr). The examples use the %>% pipe operator, which dplyr re-exports from the magrittr package, to chain functions so that the output of group_by() is passed as the input to summarise().


# Load dplyr
library(dplyr)

# Group by count using dplyr
agg_tbl <- df %>% group_by(department) %>% 
  summarise(total_count=n(),
            .groups = 'drop')
agg_tbl

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()
df2

The code snippet above groups the data by the department column and counts the number of rows for each department.

Keep in mind that summarise() applied to grouped data returns a tibble. If you prefer a plain data frame, convert the result with as.data.frame().

[Image: r group by dataframe count]
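As an aside, dplyr also provides count(), which combines the grouping and counting steps into one call. The sketch below is one possible shorthand, not the method used in this article, and it runs on a made-up data frame whose values are assumptions.

```r
library(dplyr)

# Hypothetical sample data; values are invented for illustration
df <- data.frame(
  department = c("Finance", "Finance", "IT", "IT", "IT", "Sales"),
  state      = c("CA", "NY", "CA", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# count() groups by `department` and counts rows in a single step;
# `name` sets the name of the count column (the default is "n")
counts <- df %>% count(department, name = "total_count")
counts
```

The explicit group_by() and summarise() form remains the better fit when you want other summaries, such as a mean or sum, alongside the count.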

Counting Rows Based on Grouped Columns in R

This example groups the data by the department and state columns, then finds the count of occurrences for each unique combination of department and state.


# Group by count of multiple columns
df2 <- df %>% group_by(department,state) %>% 
  summarise(total_count=n(),.groups = 'drop') %>%
  as.data.frame()
df2 

This yields the output below.

[Image: r group by count multiple columns]
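The .groups argument matters most when grouping by more than one column. The sketch below compares .groups = 'drop_last' with .groups = 'drop' on a made-up data frame (the values are assumptions) and inspects the remaining grouping with dplyr's group_vars().

```r
library(dplyr)

# Hypothetical sample data; values are invented for illustration
df <- data.frame(
  department = c("Finance", "Finance", "IT", "IT", "IT", "Sales"),
  state      = c("CA", "NY", "CA", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# 'drop_last' removes only the last grouping column (state),
# so the result is still grouped by department
kept <- df %>%
  group_by(department, state) %>%
  summarise(total_count = n(), .groups = "drop_last")

# 'drop' removes all grouping, so the result is ungrouped
dropped <- df %>%
  group_by(department, state) %>%
  summarise(total_count = n(), .groups = "drop")

group_vars(kept)     # "department"
group_vars(dropped)  # character(0)
```

Dropping all groups avoids surprises in later steps, because subsequent verbs then operate on the whole table rather than per department.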

Grouping and Counting using R base aggregate()

Base R provides the aggregate() function (from the stats package, which is loaded by default) to group the rows of a data frame. Let's use it to group by the department column and calculate the count for each unique value of that column.


# Group by count using R Base aggregate()
agg_df <- aggregate(df$state, by=list(df$department), FUN=length)
agg_df

This yields the output below.

[Image: r group by count aggregate]
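When aggregate() is called with unnamed inputs as above, the result columns get the generic names Group.1 and x. One way to get descriptive names is to pass named inputs, as in the sketch below. The data frame is made up, and the names department and total_count are chosen here for illustration.

```r
# Hypothetical sample data; values are invented for illustration
df <- data.frame(
  department = c("Finance", "Finance", "IT", "IT", "IT", "Sales"),
  state      = c("CA", "NY", "CA", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# Unnamed inputs: result columns are called Group.1 and x
agg_default <- aggregate(df$state, by = list(df$department), FUN = length)
names(agg_default)

# Named inputs: the names carry over to the result columns
agg_named <- aggregate(
  x   = data.frame(total_count = df$state),
  by  = list(department = df$department),
  FUN = length
)
agg_named
```

Naming the columns up front saves a separate names() or colnames() assignment afterwards.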

Applying aggregate() to Multiple Columns

You can also use the aggregate() function to group the data by multiple columns, and then apply the length() function to the grouped data to get the count for each unique combination of those columns.


# R Base aggregate() on multiple columns 
agg_df <- aggregate(df$state, by=list(df$department,df$state), FUN=length)
agg_df 

The above code groups rows by the department and state columns, then uses the length() function to count the number of occurrences for each unique combination of department and state.

This yields the output below.

[Image: r aggregate multiple columns]
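Note that aggregate() only returns combinations that actually occur in the data. If you also need the combinations with a count of zero, base R's table() is one alternative. The sketch below contrasts the two on a made-up data frame. The values and the total_count column name are assumptions.

```r
# Hypothetical sample data; there is no (Sales, NY) row on purpose
df <- data.frame(
  department = c("Finance", "Finance", "IT", "IT", "IT", "Sales"),
  state      = c("CA", "NY", "CA", "CA", "NY", "CA"),
  stringsAsFactors = FALSE
)

# aggregate(): one row per combination that is present in the data
agg_df <- aggregate(
  x   = data.frame(total_count = df$state),
  by  = list(department = df$department, state = df$state),
  FUN = length
)
nrow(agg_df)  # 5

# table(): one row per possible combination, including zero counts
tab_df <- as.data.frame(
  table(department = df$department, state = df$state),
  responseName = "total_count"
)
nrow(tab_df)  # 6
```

Which behaviour is preferable depends on whether missing combinations are meaningful for your report.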

Conclusion

In this article, I have discussed how to perform a group-by count in R using group_by() and summarise() from the dplyr package and the aggregate() function from base R. When working with larger datasets, the dplyr functions are generally more efficient.
