By using the merge() function we can perform join on data frames in R programming. This merge() function supports all basic SQL Join Types like inner join, left or left outer join, right or right outer join, full outer join and cross join. If there are more than two data frames to be joined, then you can use reduce()
method available in tidyverse
library, this package also supports all other joins in R programming.
- R Join on Different Column Names
- R Join (Merge) on Multiple Columns
- R Join Multiple (three or more) Data Frames
To perform Anti and Semi joins use dplyr package functions anti_join()
and semi_join()
. In this tutorial, you will learn different ways and methods to join Data Frames using R examples. Please access Join on Multiple DataFrames in case you wanted to join more than two DataFrames.
- Join Syntax & Types
- Inner Join
- Full Outer Join
- Left Outer Join
- Right Outer Join
- Cross Join
- Left Anti Join
- Left Semi Join
- Self Join
R base uses merge()
join to perform most of the join by changing the values to parameters all
and by
.
Join | R Base | dplyr Package |
---|---|---|
Inner Join | all=FALSE | inner_join() |
Left or Left Outer Join | all.x=TRUE | left_join() |
Right or Right Outer Join | all.y=TRUE | right_join() |
Full Outer Join | all=TRUE | full_join() |
Cross Join | by=NULL ,all=FALSE | Not Available |
Left Anti Join | Not Available | anti_join() |
Left Semi Join | Not Available | semi_join() |
1. merge() Syntax
Below is the merge() syntax that is used to join data.frames.
# merge() Syntax
merge(x, y, …)
# S3 method for default
merge(x, y, …)
# S3 method for data.frame
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
incomparables = NULL, …)
The rest of the tutorial explains Join Types using the above syntaxes
Before we jump into R Join examples, first, let’s create an emp
and dept
DataFrame’s. here, column emp_id
is unique on emp and dept_id
is unique on the dept dataset and dept_id from emp has a reference to dept_id on dept.
# Create Emp Data Frame
emp_df=data.frame(
emp_id=c(1,2,3,4,5,6),
name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
superior_emp_id=c(-1,1,1,2,2,2),
dept_id=c(10,20,10,10,40,50)
)
# Create dept Data Frame
dept_df=data.frame(
dept_id=c(10,20,30,40),
dept_name=c("Finance","Marketing","Sales","IT")
)
df
dept_df
This print emp
and dept
DataFrame to the console.

2. Inner Join
In R, Inner join or natural join is the default join and it’s mostly used joining data frames, it is used to join data.frames on a specified column, and where column values don’t match the rows get dropped from both data.frames (emp
& dept
). Here by default, it uses all=FALSE
. This join is similar to a set intersection.
# R Inner Join
df2 <- merge(x = emp_df, y = dept_df, by = "dept_id")
print(df2)
When we apply Inner join on our datasets, it drops dept_id
50 from emp
and dept_id
30 from dept
datasets. Below is the result of the above Join expression.

Similarly, use inner_join() from dplyr package to get the same result.
# Import dplyr
library(dplyr)
# inner join using dplyr package
df2 = emp_df %>% inner_join(dept_df, by = "dept_id")
3. Left Outer Join
R left join returns all rows from the left data.frame regardless of the match found on the right data.frame when join expression doesn’t match, it assigns NA for that record and drops records from right where match not found.
Use all.x=TRUE to perform left outer join in R.
# R left outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all.x=TRUE)
df2
From our dataset, dept_id
50 doesn’t have a record on dept
dataset hence, this record contains NA on dept
columns (dept_name & dept_id). and dept_id
30 from dept
dataset dropped from the results. Below is the result of the above Join expression.

Similarly, use left_join() from dplyr package to get the same result.
# Import dplyr
library(dplyr)
# left join using dplyr package
df2 = emp_df %>% left_join(dept_df, by = "dept_id")
4. Right Outer Join
Use all.y=TRUE
to perform right outer join, this is the opposite of left
join, here it returns all rows from the right data.frame regardless of math found on the left dataset, when join expression doesn’t match, it assigns NA for that record and drops records from left where match not found.
Use all.y=TRUE to perform right outer join
# R right outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all.y=TRUE)
df2
From our example, the right dataset dept_id
30 doesn’t have it on the left dataset emp
hence, this record contains NA on emp
columns. and dept_id
50 dropped as a match not found on left. Below is the result of the above Join expression.

Similarly, use right_join() from dplyr package to get the same result.
# Import dplyr
library(dplyr)
# right join using dplyr package
df2 = emp_df %>% right_join(dept_df, by = "dept_id")
5. Full Outer Join
Use all=TRUE
to perform full outer join, it returns all rows from both DataFrames, where join expression doesn’t match it returns NA on respective record columns.
# R full outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all=TRUE)
df2
From our emp
dataset’s dept_id
with value 50 doesn’t have a record on dept
hence dept columns have NA and dept_id
30 doesn’t have a record in emp
hence you see NA on emp columns. Below is the result of the above Join expression.

Similarly, use full_join() from dplyr package to get the same result.
# Import dplyr
library(dplyr)
# full join using dplyr package
df2 = emp_df %>% full_join(dept_df, by = "dept_id")
6. Cross Join
Use by=NULL to perform cross join. A Cross Join also known as cartesian join which performs every row of one dataframe is being joined with every other row of another dataframe.
# R Cross join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = NULL)
df2
7. Left Semi Join
R semi join is similar to inner
join difference being left semi
join returns all columns from the left DataFrame and ignores all columns from the right dataset. In other words, this join returns columns from the only left dataset for the records match in the right dataset on join expression, records not matched on join expression are ignored from both left and right datasets.
The same result can be achieved using select on the result of the inner join however, using this join would be efficient. To perform left semi join in R use the semi_join()
function from the dplyr
package.
# Import dplyr
library(dplyr)
# left semi join
df2 = emp_df %>% semi_join(dept_df, by = "dept_id")
df2
Below is the result of the above join expression.

8. Left Anti Join
R anti join does the exact opposite of the left semi join, left anti
join returns only columns from the left DataFrame for non-matched records. To perform left anti join in R use the anti_join()
function from the dplyr
package.
In other words, it selects all rows from the left data frame that are not present in the right data frame (similar to left df – right df).
# Import dplyr package
library(dplyr)
# left anti join data.frames
df2 = emp_df %>% anti_join(dept_df, by = "dept_id")
df2
Yields below output

9. Source Code | R Example
# Create Emp Data Frame
emp_df=data.frame(
emp_id=c(1,2,3,4,5,6),
name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
superior_emp_id=c(-1,1,1,2,2,2),
dept_id=c(10,20,10,10,40,50)
)
# Create dept Data Frame
dept_df=data.frame(
dept_id=c(10,20,30,40),
dept_name=c("Finance","Marketing","Sales","IT")
)
# inner join data.frames
df2 <- merge(x = emp_df, y = dept_df, by = "dept_id")
# left outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all.x=TRUE)
# right outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all.y=TRUE)
# full outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = "dept_id", all=TRUE)
# cross outer join data.frames
df2 <- merge(x = emp_df, y = dept_df,
by = NULL)
# Import dplyr package
library(dplyr)
# left semi join data.frames
df2 = emp_df %>% semi_join(dept_df, by = "dept_id")
df2
# left anti join data.frames
df2 = emp_df %>% anti_join(dept_df, by = "dept_id")
df2
The examples explained here are available at the GitHub project for reference.
Conclusion
In this tutorial, you have learned how to perform R join DataFrames using merge() function and dplyr package. Also, learned how to perform INNER
, LEFT OUTER
, RIGHT OUTER
, LEFT ANTI
, LEFT SEMI
, and CROSS
joins with examples.
Related Articles
- R Join on Different Column Names
- R Join (Merge) on Multiple Columns
- R Join Multiple Data Frames
- R Anti Join
- R Outer Join
- R Right Join
- R Left Join
- R Inner Join
- R Semi Join
References:
Happy Learning !!