How to do anti join or left anti join on data frames in R? To perform anti join use either dplyr anti_join()
function, or use reduce()
from tidyverse. dplyr package provides several functions to join data frames in R.
R Anti join does the exact opposite of the semi-join, anti join returns only columns from the left Data Frame for non-matched records, in other words, it selects all rows from the left data frame that are not present in the right data frame (similar to left df – right df).
1. Quick Examples of Anti Join
Following are quick examples of performing the anti join on data frames.
# Quick Examples
# Load dplyr package
library(dplyr)
# Using dplyr - anti join multiple columns
df2 <- emp_df %>% anti_join( dept_df,
by=c('dept_id','dept_branch_id'))
# Using dplyr - anti join on different columns
df2 <- emp_df %>% anti_join( dept_df,
by=c('dept_id'='dept_id', 'dept_branch_id'='dept_branch_id'))
# Load tidyverse package
library(tidyverse)
# anti Join data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(anti_join, by='dept_id')
df2
Let’s create two Data Frames, in the below example dept_id
and dept_branch_id
columns exist on both emp_df
and dept_df
data frames.
# Create emp Data Frame
emp_df=data.frame(
emp_id=c(1,2,3,4,5,6),
name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
superior_emp_id=c(-1,1,1,2,2,2),
dept_id=c(10,20,10,10,40,50),
dept_branch_id= c(101,102,101,101,104,105)
)
# Create dept Data Frame
dept_df=data.frame(
dept_id=c(10,20,30,40),
dept_name=c("Finance","Marketing","Sales","IT"),
dept_branch_id= c(101,102,103,104)
)
emp_df
dept_df
Yields below output.

2. Using dplyr anti_join() in R
Using the anti_join() function from the R dplyr package is the best approach to performing the anti join on two data frames. In order to use dplyr, you have to install it first using install.packages(‘dplyr’) and load it using library(dplyr)
.
All functions in dplyr package take data.frame
as a first argument. When we use dplyr
package, we mostly use the infix operator %>%
from magrittr
, it passes the left-hand side of the operator to the first argument of the right-hand side of the operator. For example, x %>% f(y)
converted into f(x, y)
so the result from the left-hand side is then “piped” into the right-hand side.
2.1 Anti Join on Multiple Columns
Left anti join or anti join selects all rows from the left data frame that are not present in the right data frame (similar to left df – right df). To perform an anti join on multiple columns with the same names on both R data frames, use all the column names as a list to by
param. I have also created a dedicated article where I have explained how to perform join on multiple columns using several ways.
# Load dplyr package
library(dplyr)
# Using dplyr - anti join multiple columns
df2 <- emp_df %>% anti_join( dept_df,
by=c('dept_id','dept_branch_id'))
df2
Yields below output. left anti
join returns only columns from the left DataFrame for non-matched records.

2.2 Anti Join on Different Column Names
Sometimes you will have data frames with different column names and you wanted to perform an outer join on these columns, to do so specify the join conditions using by
param.
# Load dplyr package
library(dplyr)
# Join on different column names
df2 <- emp_df %>% anti_join( dept_df,
by=c('dept_id'='dept_id',
'dept_branch_id'='dept_branch_id'))
df2
Yields the same output as above.
3. Using tidyverse Package
By using reduce()
function from tidyverse package you can perform join on multiple data frames, to perform anti join use anti_join
keyword in R. If you wanted to do anti-join on multiple data frames, pass all data frames as a list to reduce() function.
# Load tidyverse package
library(tidyverse)
# Anti Join data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(anti_join, by='dept_id')
df2
4. Conclusion
In this article, you have learned how to perform an anti join on two data frames using anti_join() functions from the R dplyr package, and reduce() from the tidyverse package.