R Join Multiple Data Frames

Spread the love

In this article, I will explain how to perform join or merge on multiple data frames in R (more than two data frames). You can use either the R base function or reduce() function from the tidyverse package. Using the tidyverse function is the best approach as it runs faster than the R base approach.

1. Complete Example of Join Multiple Data Frames

Following are quick examples of joining multiple data frames.


# Join all data frames in list
list_df = list(emp_df,dept_df,add_df)
Reduce(function(x, y) merge(x, y, all=FALSE), list_df)

# Load tidyverse package
library(tidyverse)

# Join multiple data.frames
list_df = list(emp_df,dept_df,add_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')

Let’s create an R Data Frame.


# Create Emp Data Frame
emp_df=data.frame(
  emp_id=c(1,2,3,4,5,6),
  name=c("Smith","Rose","Williams","Jones","Brown","Brown"),
  superior_emp_id=c(-1,1,1,2,2,2),
  dept_id=c(10,20,10,10,40,50)
)

# Create dept Data Frame
dept_df=data.frame(
  dept_id=c(10,20,30,40),
  dept_name=c("Finance","Marketing","Sales","IT")
)

# Create address table
add_df=data.frame(
  dept_id=c(10,20,30,40),
  city=c("Newark","Willmington","Los Angeles","Sandiago"),
  state=c("NY","DE","CA","CA")
)

emp_df
dept_df
add_df

Yields below output.

r join multiple data frames
Multiple Data Frames

2. Join Multiple R Data Frames – Use reduce() from tidyverse

To join more than two (multiple) R data frames use the reduce() function from tidyverse package. This function takes all the data frames as a list and joins the data frames based on the specified column.

tidyverse is a third-party library hence, in order to use tidyverse library, you need to first install it by using install.packages('tidyverse'). Once installation completes, load the tidyverse library using library("tidyverse").

When you are performing join on large data frames, prefer to use this approach over R base.

2.1 Syntax of reduce()

Following is the syntax of reduce() function.


# Syntax of Join multiple dataframes
df_list = list(my_dataframe1,my_dataframe2,............) 
df_list %>% reduce(keyword, by='column')
  • keyword – Type of join you wanted to perform.
  • by – Column that exists on all columns and you wanted to join by.

2.2. Join Multiple Data Frames Example

To join multiple data frames using reduce() function first create the all data frames you wanted into a list and use this as an argument to reduce() function along with the join keyword and by column. In this below example, we are performing inner-join on dept_id column from all data frames.


# Join multiple data.frames
list_df = list(emp_df,dept_df,add_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')
df2

Yields below output.

r join  multiple dataframes

3. Join Multiple Data Frame using merge()

Though it is not straight-forward to join multiple data frames in R using merge() function still you can use it along with the reduce() function. In order to use this approach first create the data frame you wanted to use into list and use this as an argument to reduce() function along with merge().

Using reduce() function from the tidyverse is the best approach to use as it runs fast and is most efficient.


# Join all data frames in list
list_df = list(emp_df,dept_df,add_df)
Reduce(function(x, y) merge(x, y, all=FALSE), list_df)

Yields the same output as above. If you are trying with large datasets, you will see this approach is running slower than the above approach.

4. Conclusion

In this article, you have learned how to perform join on multiple data frames using R base approach and reduce() function from tidyverse package. You can find the complete example at Github project fro reference.

Related Articles

Naveen (NNK)

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing R Join Multiple Data Frames