You are currently viewing R Join Multiple Data Frames

In this article, I will explain how to perform join or merge on multiple data frames in R (more than two data frames). You can use either the R base function or the reduce() function from the tidyverse package. Using the tidyverse function is the best approach as it runs faster than the R-based approach.

1. Complete Example of Join Multiple Data Frames

Following are quick examples of joining multiple data frames.


# Join all data frames in list
list_df = list(emp_df,dept_df,add_df)
Reduce(function(x, y) merge(x, y, all=FALSE), list_df)

# Load tidyverse package
library(tidyverse)

# Join multiple data.frames
list_df = list(emp_df,dept_df,add_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')

Let’s create an R Data Frame.


# Create Emp Data Frame
emp_df=data.frame(
  emp_id=c(1,2,3,4,5,6),
  name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
  superior_emp_id=c(-1,1,1,2,2,2),
  dept_id=c(10,20,10,10,40,50)
)

# Create dept Data Frame
dept_df=data.frame(
  dept_id=c(10,20,30,40),
  dept_name=c("Finance","Marketing","Sales","IT")
)

# Create address table
add_df=data.frame(
  dept_id=c(10,20,30,40),
  city=c("Newark","Willmington","Los Angeles","Sandiago"),
  state=c("NY","DE","CA","CA")
)

emp_df
dept_df
add_df

Yields below output.

r join multiple data frames
Multiple Data Frames

2. Join Multiple R Data Frames – Use reduce() from tidyverse

To join more than two (multiple) R data frames use the reduce() function from tidyverse package. This function takes all the data frames as a list and joins the data frames based on the specified column.

tidyverse is a third-party library hence, to use tidyverse library, you need to first install it by using install.packages(‘tidyverse’). Once installation is completed, load the tidyverse library using library("tidyverse").

When you are performing join operation on large data frames, prefer to use this approach over the R base.

2.1 Syntax of reduce()

Following is the syntax of the reduce() function.


# Syntax of Join multiple dataframes
df_list = list(my_dataframe1,my_dataframe2,............) 
df_list %>% reduce(keyword, by='column')
  • keyword – Type of join you want to perform.
  • by – Column that exists on all columns and you want to join by.

2.2. Join Multiple Data Frames Example

To join multiple data frames using the reduce() function first, create all data frames you want to join and pass them into a list and use this as an argument to the reduce() function along with the join keyword and by column. In the below example, we are performing inner-join on dept_id column from all data frames.


# Join multiple data.frames
list_df = list(emp_df,dept_df,add_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')
df2

Yields below output.

r join  multiple dataframes

3. Join Multiple Data Frame using merge()

Though it is not straightforward to join multiple data frames in R using the merge() function still you can use it along with the reduce() function. To use this approach first create the data frame you want to use into a list and use this as an argument to the reduce() function along with merge().

Using the reduce() function from the tidyverse is the best approach to use as it runs fast and is the most efficient.


# Join all data frames in list
list_df = list(emp_df,dept_df,add_df)
Reduce(function(x, y) merge(x, y, all=FALSE), list_df)

Yields the same output as above. If you are trying with large datasets, you will see this approach is running slower than the above approach.

4. Conclusion

In this article, you have learned how to perform join on multiple data frames using the R base approach and the reduce() function from the tidyverse package. You can find the complete example in the Github project for reference.

Related Articles

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium