In this article, I will explain how to perform join or merge on multiple data frames in R (more than two data frames). You can use either the R base function or
reduce() function from the tidyverse package. Using the tidyverse function is the best approach as it runs faster than the R base approach.
1. Complete Example of Join Multiple Data Frames
Following are quick examples of joining multiple data frames.
# Join all data frames in list list_df = list(emp_df,dept_df,add_df) Reduce(function(x, y) merge(x, y, all=FALSE), list_df) # Load tidyverse package library(tidyverse) # Join multiple data.frames list_df = list(emp_df,dept_df,add_df) df2 <- list_df %>% reduce(inner_join, by='dept_id')
Let’s create an R Data Frame.
# Create Emp Data Frame emp_df=data.frame( emp_id=c(1,2,3,4,5,6), name=c("Smith","Rose","Williams","Jones","Brown","Brown"), superior_emp_id=c(-1,1,1,2,2,2), dept_id=c(10,20,10,10,40,50) ) # Create dept Data Frame dept_df=data.frame( dept_id=c(10,20,30,40), dept_name=c("Finance","Marketing","Sales","IT") ) # Create address table add_df=data.frame( dept_id=c(10,20,30,40), city=c("Newark","Willmington","Los Angeles","Sandiago"), state=c("NY","DE","CA","CA") ) emp_df dept_df add_df
Yields below output.
2. Join Multiple R Data Frames – Use reduce() from tidyverse
To join more than two (multiple) R data frames use the
reduce() function from
tidyverse package. This function takes all the data frames as a list and joins the data frames based on the specified column.
tidyverse is a third-party library hence, in order to use
tidyverse library, you need to first install it by using
install.packages('tidyverse'). Once installation completes, load the
tidyverse library using
When you are performing join on large data frames, prefer to use this approach over R base.
2.1 Syntax of reduce()
Following is the syntax of reduce() function.
# Syntax of Join multiple dataframes df_list = list(my_dataframe1,my_dataframe2,............) df_list %>% reduce(keyword, by='column')
keyword– Type of join you wanted to perform.
by– Column that exists on all columns and you wanted to join by.
2.2. Join Multiple Data Frames Example
To join multiple data frames using reduce() function first create the all data frames you wanted into a list and use this as an argument to reduce() function along with the join keyword and by column. In this below example, we are performing inner-join on dept_id column from all data frames.
# Join multiple data.frames list_df = list(emp_df,dept_df,add_df) df2 <- list_df %>% reduce(inner_join, by='dept_id') df2
Yields below output.
3. Join Multiple Data Frame using merge()
Though it is not straight-forward to join multiple data frames in R using merge() function still you can use it along with the reduce() function. In order to use this approach first create the data frame you wanted to use into list and use this as an argument to reduce() function along with merge().
Using reduce() function from the tidyverse is the best approach to use as it runs fast and is most efficient.
# Join all data frames in list list_df = list(emp_df,dept_df,add_df) Reduce(function(x, y) merge(x, y, all=FALSE), list_df)
Yields the same output as above. If you are trying with large datasets, you will see this approach is running slower than the above approach.
In this article, you have learned how to perform join on multiple data frames using R base approach and reduce() function from tidyverse package. You can find the complete example at Github project fro reference.