Site icon Spark By {Examples}

How to do Right Join in R?

r right join

How to perform the right join on data frames in R? To perform right join or right outer join you can use the R base merge() function, right_join() from dplyr, or reduce() from the tidyverse package. Among these options, the right_join() function of the dplyr package is recommended because it is more efficient compared to the base R merge() function. The dplyr package offers various functions for joining data frames in R.

In R, a right join is a type of data merging or joining operation that includes all rows from the right (second) data frame and only matching rows from the left (first) data frame. If there is no match, it fills in with NA values for the left data frame columns.

Key Points-

Quick Examples of Performing Right Join Operations on Data Frame

Below are the quick examples of right join on operations.


# Quick Examples

# Right join
df2 <- merge(x=emp_df,y=dept_df, 
             by="dept_id", all.y=TRUE)

# Right join on multiple columns
df2 <- merge(x=emp_df,y=dept_df, 
        by=c("dept_id","dept_branch_id"), all.y=TRUE)

# Right join on different columns
df2 <- merge(x=emp_df,y=dept_df, 
      by.x=c("dept_id","dept_branch_id"), 
      by.y=c("dept_id","dept_branch_id"),
      all.y=TRUE)

# Load dplyr package
library(dplyr)

# Using dplyr - right join multiple columns
df2 <- emp_df %>% right_join( dept_df, 
           by=c('dept_id','dept_branch_id'))

# Using dplyr - right join on different columns
df2 <- emp_df %>% right_join( dept_df, 
        by=c('dept_id'='dept_id', 
             'dept_branch_id'='dept_branch_id'))

# Load tidyverse package
library(tidyverse)

# Right outer Join  data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(right_join, by='dept_id')
df2

Create two data frames named emp_df and dept_df having common columns named dept_id and dept_branch_id.


# Create emp Data Frame
emp_df=data.frame(
  emp_id=c(1,2,3,4,5,6),
  name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
  superior_emp_id=c(-1,1,1,2,2,2),
  dept_id=c(10,20,10,10,40,50),
  dept_branch_id= c(101,102,101,101,104,105)
)

# Create dept Data Frame
dept_df=data.frame(
  dept_id=c(10,20,30,40),
  dept_name=c("Finance","Marketing","Sales","IT"),
  dept_branch_id= c(101,102,103,104)
)

emp_df
dept_df

Yields below output.

r right join

Right Join using merge()

The R base merge() function allows you to perform a right join operation on two or more data frames. The function uses the x parameter to represent the left data frame and the y parameter for the right data frame. To specify the column(s) on which the data frames should be joined, you use the by parameter, and you set all.y = TRUE it to indicate that you want a right join.

When you perform a right join on two data frames with a common column, the merge() function will return a new data frame that includes all the rows from the right data frame, along with any matching rows from the left data frame. If there are rows in the right data frame that don’t have a corresponding match in the left data frame, those unmatched rows will appear in the resulting data frame, with NA values for the columns from the left data frame.

If there are common columns in data frames that don’t participate in a join operation, they will appear in the resulting data frame with the suffixes x and y respectively.


# Perform Right join on single common column
df2 <- merge(x=emp_df,y=dept_df, 
          by="dept_id", all.y=TRUE)
df2

Yields below output.

Right Join on Multiple Columns

Alternatively, you can use the R base merge() function to perform a right join operation on multiple common columns of two data frames. Here, you can specify the vector of common columns using the by parameter. It will return the new data frame containing all rows from the right data frame and matching rows from the left data frame. If nonmatching rows are present in the right data frame will appear in corresponding rows of the left data frame as NA values in the resulting data frame.

Related: You can refer to the dedicated article to perform join operations on multiple columns in several ways.


# Using merge with multiple column names
df2 <- merge(x=emp_df,y=dept_df, 
             by=c("dept_id","dept_branch_id"),
             all.y=TRUE)
             
df2

Yields below output.

r right join multiple columns

Right Join on Different Column Names

If you have data frames with varying column names and you need to perform a right join based on specific columns, you can use the by.x and by.y parameters to represent which columns to match from each data frame.


# R right join multiple columns
df2 <- merge(x=emp_df,y=dept_df, 
                by.x=c("dept_id","dept_branch_id"), 
                by.y=c("dept_id","dept_branch_id"),
                all.y=TRUE)
df2

Yields the same output as above.

Use dplyr to Perform Right Join in R

The right_join() function from the dplyr package is the most efficient way to perform a right join between two data frames. To use dplyr, start by installing it with install.packages(‘dplyr’), then load it with library(dplyr).

In the dplyr package, all functions take an data.frame as their first argument. When using a dplyr package it is common to use the %>% infix operator from magrittr. This operator pipes the output from the left side into the first argument of the function on the right side. For instance, x %>% f(y) is equivalent to f(x, y), so that the left side’s output becomes the first input to the function on the right side.

Multiple Columns with dplyr package

The right_join() function of the dplyr package allows you to perform a right join on two data frames based on a common set of columns. This function takes two data frames and a by parameter which specifies the columns to use for joining.

A right join returns all rows from the right data frame and any matching rows from the left data frame. If there are rows in the right data frame without corresponding matches in the left data frame, these will still be included, but the corresponding columns from the left data frame will contain NA (not available) values.


# Load dplyr package
library(dplyr)

# Using dplyr - right join multiple columns
df2 <- emp_df %>% right_join( dept_df, 
            by=c('dept_id','dept_branch_id'))
df2

Yields below output.

r right join different columns

3.2 Using Different Column Names with dplyr package

You can also use the right_join() function to perform the right join operation on two data frames based on different column names.


# Load dplyr package
library(dplyr)

# join on multiple columns
df2 <- emp_df %>% right_join( dept_df, 
        by=c('dept_id'='dept_id', 
             'dept_branch_id'='dept_branch_id'))
df2

Yields the same output as above.

Using tidyverse Package

Finally, you can use the reduce() function from the tidyverse package to join multiple data frames based on single/multiple common column names. To implement this function we have to install the tidyverse package. To install the tidyverse package in R, you can use the install.packages() function. Once installed, you can load the tidyverse with the library() function .

Let’s apply this function to given two data frames based on a single common column to get the combined data frame.


# Load tidyverse package
library(tidyverse)

# Right Join  data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(right_join, by='dept_id')
df2

Yields below output.

5. Conclusion

In this article, I have explained how to implement a right join operation on two data frames in R using the base merge() function, the right_join() function from the dplyr package, and the reduce() function from the tidyverse package. The dplyr method is recommended for larger datasets, as it is more efficient compared to the base R approach.

Exit mobile version