How to do an inner join on data frames in R? To perform inner join use either merge()
function, dplyr inner_join()
function, or use reduce()
from tidyverse. Using the dplyr function is the best approach as it runs faster than the R base approach. dplyr package provides several functions to join data frames in R.
In R, Inner join or natural join is the default join and it’s mostly used joining data frames, it is used to join data.frames on a specified single or multiple columns, and where column values don’t match the rows get dropped from both data.frames (emp
& dept
).
1. Quick Examples of Inner Join
Following are quick examples of performing inner join on data frames.
# Quick Examples
# Inner join
df2 <- merge(x=emp_df,y=dept_df,
by="dept_id")
# Inner join on multiple columns
df2 <- merge(x=emp_df,y=dept_df,
by=c("dept_id","dept_branch_id"))
# Inner join on different columns
df2 <- merge(x=emp_df,y=dept_df,
by.x=c("dept_id","dept_branch_id"),
by.y=c("dept_id","dept_branch_id"))
# Load dplyr package
library(dplyr)
# Using dplyr - inner join multiple columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id','dept_branch_id'))
# Using dplyr - inner join on different columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id'='dept_id',
'dept_branch_id'='dept_branch_id'))
# Load tidyverse package
library(tidyverse)
# Inner Join data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')
df2
Let’s create two Data Frames, in the below example dept_id
and dept_branch_id
columns exists on both emp_df
and dept_df
data frames.
# Create emp Data Frame
emp_df=data.frame(
emp_id=c(1,2,3,4,5,6),
name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
superior_emp_id=c(-1,1,1,2,2,2),
dept_id=c(10,20,10,10,40,50),
dept_branch_id= c(101,102,101,101,104,105)
)
# Create dept Data Frame
dept_df=data.frame(
dept_id=c(10,20,30,40),
dept_name=c("Finance","Marketing","Sales","IT"),
dept_branch_id= c(101,102,103,104)
)
emp_df
dept_df
Yields below output.

2. Inner Join using merge()
R base provides a merge()
function that is used to perform an inner join on two, three or more data frames. This function takes x and y data frames as left and right respectively and finally specify the by param with the column name you wanted to join.
The following example performs an inner join on the column dept_id
on emp_df
and dept_df
column. Here by default, it uses all=FALSE
. This join is similar to a set intersection operation.
# inner join
df2 <- merge(x=emp_df,y=dept_df,
by="dept_id")
df2
Yields below output. if you have the same column names that are not used in the join condition, it suffixes the x and y to the columns on the result. In the below example check dept_branch_id
.

2.1 Inner Join on Multiple Columns
To perform an inner join on multiple columns with the same names on both data frames, use all the column names as a list to by
param. I have also created a dedicated article where I have explained how to perform join on multiple columns using several ways.
# Using merge with same column names
df2 <- merge(x=emp_df,y=dept_df,
by=c("dept_id","dept_branch_id"))
df2
Yields below output.

2.2 Inner Join Different Column Names
Sometimes you will have data frames with different column names and you wanted to perform an inner join on these columns, to do so specify the column names from both data frames with params by.x
and by.y
.
# R join multiple columns
df2 <- merge(x=emp_df,y=dept_df,
by.x=c("dept_id","dept_branch_id"),
by.y=c("dept_id","dept_branch_id"))
df2
Since our data frame has the same column names, it results in the same output as above, I have created another article where I have explained how to perform join on different column names.
3. Using dplyr to Perform Inner Join in R
Using the inner_join() function from the dplyr package is the best approach to performing the inner join on two data frames. In order to use dplyr, you have to install it first using install.packages(‘dplyr’) and load it using library(dplyr)
.
All functions in dplyr package take data.frame
as a first argument. When we use dplyr
package, we mostly use the infix operator %>%
from magrittr
, it passes the left-hand side of the operator to the first argument of the right-hand side of the operator. For example, x %>% f(y)
converted into f(x, y)
so the result from the left-hand side is then “piped” into the right-hand side.
3.1 Multiple Columns with dplyr package
# Load dplyr package
library(dplyr)
# Using dplyr - inner join multiple columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id','dept_branch_id'))
df2
Yields below output.

3.2 Using Different Column Names with dplyr package
# Load dplyr package
library(dplyr)
# join on multiple columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id'='dept_id',
'dept_branch_id'='dept_branch_id'))df2
Yields the same output as above.
4. Using tidyverse Package
By using reduce()
function from tidyverse package you can perform join on multiple data frames, to perform inner join use inner_join
keyword.
# Load tidyverse package
library(tidyverse)
# Inner Join data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')
df2
5. Conclusion
In this article, you have learned how to perform an inner join on two data frame using the R base merge() function, inner_join() functions from the dplyr package, and reduce() from the tidyverse package. Using dplyr approach is the best to use when you are joining on larger datasets as it performs efficiently over the R base.