To perform a group-by operation to count occurrences in R, you can use either the `aggregate()` function from base R or a combination of `group_by()` and `summarise()` from the `dplyr` package. This allows the grouping of rows in a data frame based on a specific column and then counts the number of rows in each group.

First, I will cover the `group_by()` function from the dplyr package, which is generally the more efficient of the two approaches. Then, I will demonstrate the `aggregate()` function from base R.

## Quick Examples

Here are some simple examples demonstrating how to perform a group-by-count.

```
# Load dplyr
library(dplyr)

# Group by count using dplyr
agg_tbl <- df %>%
  group_by(department) %>%
  summarise(total_count = n(), .groups = 'drop')

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()

# Group by count of multiple columns
df2 <- df %>%
  group_by(department, state) %>%
  summarise(total_count = n(), .groups = 'drop') %>%
  as.data.frame()

# Group by count using base R aggregate()
agg_df <- aggregate(df$state, by = list(df$department), FUN = length)

# Base R aggregate() on multiple columns
agg_df <- aggregate(df$state, by = list(df$department, df$state), FUN = length)
```

Let’s build a data frame by loading a CSV file.

```
# Read CSV file into a data frame
df <- read.csv('/Users/admin/apps/github/r-examples/resources/emp.csv')
df
```

Printing `df` displays the loaded rows. The examples below rely on its `department` and `state` columns.
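If you do not have the CSV file at hand, a small stand-in data frame is enough to follow along. The sketch below is hypothetical: only the `department` and `state` column names come from the examples in this article, while the `employee_name` column and all of the values are made up for illustration.

```r
# Hypothetical sample data standing in for emp.csv.
# Only the department and state column names come from the article;
# employee_name and every value here are invented.
df <- data.frame(
  employee_name = c("James", "Maria", "Robert", "Jen", "Kumar", "Anna"),
  department    = c("Sales", "Sales", "Finance", "Finance", "Finance", "Marketing"),
  state         = c("NY", "CA", "NY", "NY", "CA", "CA"),
  stringsAsFactors = FALSE
)

# Inspect the structure of the data frame
str(df)
```

Any data frame with these two columns should work with the remaining examples.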

## Grouping and Counting in R with dplyr

To perform group-by operations in R data frames, you can use `group_by()` from the `dplyr` package, followed by `summarise()` to get counts for each group. The `group_by()` function returns grouped data, and then you can apply `summarise()` on this grouped data to compute the count.

Before using these functions, install `dplyr` with `install.packages("dplyr")` and load it with `library(dplyr)`. In the examples, I will use the pipe operator `%>%`, which dplyr makes available, to chain functions so that the output of `group_by()` is passed as the input to `summarise()`.

```
# Load dplyr
library(dplyr)

# Group by count using dplyr
agg_tbl <- df %>%
  group_by(department) %>%
  summarise(total_count = n(), .groups = 'drop')
agg_tbl

# Convert tibble to df
df2 <- agg_tbl %>% as.data.frame()
df2
```

The code snippet above groups the data by the `department` column and counts the number of rows for each `department`.

Keep in mind that `group_by()` and `summarise()` return a tibble. If you prefer a plain data frame, you can convert the tibble with `as.data.frame()`.
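To see the tibble-versus-data-frame distinction concretely, the following sketch checks the class of the result before and after conversion. The sample data is hypothetical (made-up values), and the last line uses `count()`, a dplyr shorthand that is not part of the original examples but performs the same group-and-count step.

```r
library(dplyr)

# Hypothetical sample data (values are made up for illustration)
df <- data.frame(
  department = c("Sales", "Sales", "Finance", "Finance", "Finance", "Marketing"),
  state      = c("NY", "CA", "NY", "NY", "CA", "CA")
)

# group_by() followed by summarise() returns a tibble
agg_tbl <- df %>%
  group_by(department) %>%
  summarise(total_count = n(), .groups = "drop")
class(agg_tbl)

# as.data.frame() converts the tibble to a plain data frame
df2 <- as.data.frame(agg_tbl)
class(df2)

# count() is a dplyr shorthand for the same group-and-count step
counted <- df %>% count(department, name = "total_count")
counted
```

The explicit `group_by()` and `summarise()` form is more flexible when you want several summaries at once, while `count()` is convenient when the row count is all you need.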

## Counting Rows Based on Grouped Columns in R

This example groups the data by the `department` and `state` columns, then finds the count of occurrences for each unique `department` and `state` combination.

```
# Group by count of multiple columns
df2 <- df %>%
  group_by(department, state) %>%
  summarise(total_count = n(), .groups = 'drop') %>%
  as.data.frame()
df2
```

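The result contains one row for each `department` and `state` combination, with the number of matching rows in the `total_count` column.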

## Grouping and Counting Using Base R aggregate()

Base R provides the `aggregate()` function, which groups the rows of a data frame by a specific column and applies a function to each group. Let’s use it to group by the `department` column and, with `FUN=length`, get the count for each unique department.

```
# Group by count using base R aggregate()
agg_df <- aggregate(df$state, by = list(df$department), FUN = length)
agg_df
```

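The result contains one row per department. By default, `aggregate()` names the grouping column `Group.1` and the count column `x`.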
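If you want more descriptive column names than the defaults, one option is the formula interface of `aggregate()`. The sketch below uses hypothetical sample data with made-up values, and the `total_count` name is only chosen to match the dplyr examples. Note that the formula interface drops rows with missing values by default, so its counts can differ from the `by=` form when the data contains `NA`.

```r
# Hypothetical sample data (values are made up for illustration)
df <- data.frame(
  department = c("Sales", "Sales", "Finance", "Finance", "Finance", "Marketing"),
  state      = c("NY", "CA", "NY", "NY", "CA", "CA")
)

# Default interface: the result columns are named Group.1 and x
agg_default <- aggregate(df$state, by = list(df$department), FUN = length)
names(agg_default)

# Formula interface: count state values within each department.
# The grouping column keeps its name; the count column is named state.
agg_formula <- aggregate(state ~ department, data = df, FUN = length)

# Rename the count column for readability
names(agg_formula)[names(agg_formula) == "state"] <- "total_count"
agg_formula
```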

## Applying aggregate() to Multiple Columns

Alternatively, you can use the `aggregate()` function to group the data according to multiple columns. Then apply the `length()` function on grouped data to get the count for each unique combination of those columns.

```
# Base R aggregate() on multiple columns
agg_df <- aggregate(df$state, by = list(df$department, df$state), FUN = length)
agg_df
```

The above code groups rows according to the `department` and `state` columns, then uses the `length()` function to count the number of occurrences for each unique combination of `department` and `state`.

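The result contains one row for each combination, with the grouping columns named `Group.1` and `Group.2` and the count in column `x`.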
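Passing a named list to `by` is one way to carry the original column names through to the result. This is a minimal sketch on hypothetical sample data (made-up values). The `total_count` name is an assumption chosen to match the dplyr examples.

```r
# Hypothetical sample data (values are made up for illustration)
df <- data.frame(
  department = c("Sales", "Sales", "Finance", "Finance", "Finance", "Marketing"),
  state      = c("NY", "CA", "NY", "NY", "CA", "CA")
)

# Name the list elements so the grouping columns keep readable names
agg_df <- aggregate(
  df$state,
  by = list(department = df$department, state = df$state),
  FUN = length
)

# The count column is still called x; rename it for readability
names(agg_df)[names(agg_df) == "x"] <- "total_count"
agg_df
```

Because every row falls into exactly one group, the counts should add up to the number of rows in the data frame, which makes `sum(agg_df$total_count) == nrow(df)` a quick sanity check.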

## Conclusion

In this article, I have discussed how to perform a group-by count in R using the `group_by()` and `summarise()` functions from the dplyr package and the base R `aggregate()` function. On larger datasets, the dplyr approach is often faster than `aggregate()`, though the difference depends on the data and is worth measuring for your own workload.

