You are currently viewing How to do Right Join in R?

How to perform the right join on data frames in R? To perform right join or right outer join you can use the R base merge() function, right_join() from dplyr, or reduce() from the tidyverse package. Among these options, the right_join() function of the dplyr package is recommended because it is more efficient compared to the base R merge() function. The dplyr package offers various functions for joining data frames in R.

Advertisements

In R, a right join is a type of data merging or joining operation that includes all rows from the right (second) data frame and only matching rows from the left (first) data frame. If there is no match, it fills in with NA values for the left data frame columns.

Key Points-

  • A right join, also known as a right outer join, includes all rows from the second (right) data frame and only matching rows from the first (left) data frame. Unmatched rows in the left data frame are filled with NA.
  • To perform a right join using base R, the merge() function can be used with all.y = TRUE. This method allows for joining on single or multiple columns.
  • The right_join() function from the dplyr package is typically recommended for a right join in R. It’s efficient and uses a pipe syntax for a cleaner workflow.
  • If you need to join multiple data frames, you can use reduce() from the tidyverse package. This function allows chaining multiple right joins.
  • To perform a right join on multiple columns, you can specify the vector of column names with the by parameter for both merge() and right_join().
  • If the data frames have different column names, you can use the by.x and by.y parameters for merge() or specify column mapping in right_join().
  • If any unmatched rows are in the right data frame those will be handled in the resulting data frame, with NA values for the columns from the left data frame.

Quick Examples of Performing Right Join Operations on Data Frame

Below are the quick examples of right join on operations.


# Quick Examples

# Right join
df2 <- merge(x=emp_df,y=dept_df, 
             by="dept_id", all.y=TRUE)

# Right join on multiple columns
df2 <- merge(x=emp_df,y=dept_df, 
        by=c("dept_id","dept_branch_id"), all.y=TRUE)

# Right join on different columns
df2 <- merge(x=emp_df,y=dept_df, 
      by.x=c("dept_id","dept_branch_id"), 
      by.y=c("dept_id","dept_branch_id"),
      all.y=TRUE)

# Load dplyr package
library(dplyr)

# Using dplyr - right join multiple columns
df2 <- emp_df %>% right_join( dept_df, 
           by=c('dept_id','dept_branch_id'))

# Using dplyr - right join on different columns
df2 <- emp_df %>% right_join( dept_df, 
        by=c('dept_id'='dept_id', 
             'dept_branch_id'='dept_branch_id'))

# Load tidyverse package
library(tidyverse)

# Right outer Join  data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(right_join, by='dept_id')
df2

Create two data frames named emp_df and dept_df having common columns named dept_id and dept_branch_id.


# Create emp Data Frame
emp_df=data.frame(
  emp_id=c(1,2,3,4,5,6),
  name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
  superior_emp_id=c(-1,1,1,2,2,2),
  dept_id=c(10,20,10,10,40,50),
  dept_branch_id= c(101,102,101,101,104,105)
)

# Create dept Data Frame
dept_df=data.frame(
  dept_id=c(10,20,30,40),
  dept_name=c("Finance","Marketing","Sales","IT"),
  dept_branch_id= c(101,102,103,104)
)

emp_df
dept_df

Yields below output.

r right join

Right Join using merge()

The R base merge() function allows you to perform a right join operation on two or more data frames. The function uses the x parameter to represent the left data frame and the y parameter for the right data frame. To specify the column(s) on which the data frames should be joined, you use the by parameter, and you set all.y = TRUE it to indicate that you want a right join.

When you perform a right join on two data frames with a common column, the merge() function will return a new data frame that includes all the rows from the right data frame, along with any matching rows from the left data frame. If there are rows in the right data frame that don’t have a corresponding match in the left data frame, those unmatched rows will appear in the resulting data frame, with NA values for the columns from the left data frame.

If there are common columns in data frames that don’t participate in a join operation, they will appear in the resulting data frame with the suffixes x and y respectively.


# Perform Right join on single common column
df2 <- merge(x=emp_df,y=dept_df, 
          by="dept_id", all.y=TRUE)
df2

Yields below output.

Right Join on Multiple Columns

Alternatively, you can use the R base merge() function to perform a right join operation on multiple common columns of two data frames. Here, you can specify the vector of common columns using the by parameter. It will return the new data frame containing all rows from the right data frame and matching rows from the left data frame. If nonmatching rows are present in the right data frame will appear in corresponding rows of the left data frame as NA values in the resulting data frame.

Related: You can refer to the dedicated article to perform join operations on multiple columns in several ways.


# Using merge with multiple column names
df2 <- merge(x=emp_df,y=dept_df, 
             by=c("dept_id","dept_branch_id"),
             all.y=TRUE)
             
df2

Yields below output.

r right join multiple columns

Right Join on Different Column Names

If you have data frames with varying column names and you need to perform a right join based on specific columns, you can use the by.x and by.y parameters to represent which columns to match from each data frame.


# R right join multiple columns
df2 <- merge(x=emp_df,y=dept_df, 
                by.x=c("dept_id","dept_branch_id"), 
                by.y=c("dept_id","dept_branch_id"),
                all.y=TRUE)
df2

Yields the same output as above.

Use dplyr to Perform Right Join in R

The right_join() function from the dplyr package is the most efficient way to perform a right join between two data frames. To use dplyr, start by installing it with install.packages(‘dplyr’), then load it with library(dplyr).

In the dplyr package, all functions take an data.frame as their first argument. When using a dplyr package it is common to use the %>% infix operator from magrittr. This operator pipes the output from the left side into the first argument of the function on the right side. For instance, x %>% f(y) is equivalent to f(x, y), so that the left side’s output becomes the first input to the function on the right side.

Multiple Columns with dplyr package

The right_join() function of the dplyr package allows you to perform a right join on two data frames based on a common set of columns. This function takes two data frames and a by parameter which specifies the columns to use for joining.

A right join returns all rows from the right data frame and any matching rows from the left data frame. If there are rows in the right data frame without corresponding matches in the left data frame, these will still be included, but the corresponding columns from the left data frame will contain NA (not available) values.


# Load dplyr package
library(dplyr)

# Using dplyr - right join multiple columns
df2 <- emp_df %>% right_join( dept_df, 
            by=c('dept_id','dept_branch_id'))
df2

Yields below output.

r right join different columns

3.2 Using Different Column Names with dplyr package

You can also use the right_join() function to perform the right join operation on two data frames based on different column names.


# Load dplyr package
library(dplyr)

# join on multiple columns
df2 <- emp_df %>% right_join( dept_df, 
        by=c('dept_id'='dept_id', 
             'dept_branch_id'='dept_branch_id'))
df2

Yields the same output as above.

Using tidyverse Package

Finally, you can use the reduce() function from the tidyverse package to join multiple data frames based on single/multiple common column names. To implement this function we have to install the tidyverse package. To install the tidyverse package in R, you can use the install.packages() function. Once installed, you can load the tidyverse with the library() function .

Let’s apply this function to given two data frames based on a single common column to get the combined data frame.


# Load tidyverse package
library(tidyverse)

# Right Join  data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(right_join, by='dept_id')
df2

Yields below output.

5. Conclusion

In this article, I have explained how to implement a right join operation on two data frames in R using the base merge() function, the right_join() function from the dplyr package, and the reduce() function from the tidyverse package. The dplyr method is recommended for larger datasets, as it is more efficient compared to the base R approach.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium