We will see how to perform the join operation on two or multiple DataFrames in R using merge()
function. There are different join types including inner join, left join, right join, and outer join. If there are more than two dataframes to be joined, then you can use reduce()
method available in tidyverse
library. It will also support the above four types of joins. In this article, we will see how to join two or multiple dataframes in R with examples.
1. Quick Examples
If you are in a hurry, below are quick examples of how to join two or multiple R dataframes.
#Perform inner join on my_dataframe1 and my_dataframe2 based on id column
print(merge(x= my_dataframe1,y= my_dataframe2, by = "id"))
#Perform left join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all.x = TRUE))
#Perform right join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all.y = TRUE))
#Perform outer join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all = TRUE))
#Perform cross join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = NULL))
#Load the tidyverse - package
library(tidyverse)
#Join three dataframes based on id column by performing inner join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(inner_join, by='id'))
#Join three dataframes based on id column by performing left join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(left_join, by='id'))
#Join three dataframes based on id column by performing right join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(right_join, by='id'))
#Join three dataframes based on id column by performing outer join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(full_join, by='id'))
Let’s create three dataframes that are required to explain the different join types.
#create dataframe with 3 rows and 2 columns
my_dataframe1=data.frame(id=c(2,1,3),gender=c('m','m','f'))
#display dataframe
print(my_dataframe1)
#Create dataframe with 3 rows and 2 columns
my_dataframe2=data.frame(id=c(4,3,1),name=c('shyam','sravani','vamsi'))
#Display dataframe
print(my_dataframe2)
#Create dataframe with 3 rows and 2 columns
my_dataframe3=data.frame(id=c(4,3,2),marks=c(79,80,97))
#Display dataframe
print(my_dataframe3)
Data in three dataframes:
# Output
# Dataframe 1
id gender
1 2 m
2 1 m
3 3 f
# Dataframe 2
id name
1 4 shyam
2 3 sravani
3 1 vamsi
# Dataframe 3
id marks
1 4 79
2 3 80
3 2 97
So, there are 2 columns in each dataframe, and the id
column is common in all three dataframes. Now, we will see different types of joins that are performed on the R dataframes based on the id
column.
2. Join Two R DataFrames
merge() in R is used to Join two dataframes and perform different kinds of joins. Let’s see them one by one.
2.1. Inner Join
Inner Join is also known as Natural Join used to join two dataframes. It will join only the matched rows from both the dataframes based on the column specified.
Syntax:
#Syntax for merge()
merge(x= my_dataframe1,y= my_dataframe2, by = "column")
where,
my_dataframe1
is the first dataframe specified by x.my_dataframe2
is the second dataframe specified by y.- the
column
is the name of the column such that both the dataframes are joined based on this column specified using by.
Example:
Perform Inner/Natural Join on the first two dataframes.
#Perform inner join on my_dataframe1 and my_dataframe2 based on id column
print(merge(x= my_dataframe1,y= my_dataframe2, by = "id"))
Output:
# Output
id gender name
1 1 m vamsi
2 3 f sravani
After performing Join, we can see that there are only two ids 1 and 3 that are common in the first two dataframes. So Inner join returned the rows related to id’s 1 and 3.
2.2. Left Join
Left Join is also known as Left Outer Join used to join two R dataframes. It will return all rows from the first dataframe and only matched rows from the second dataframe with respect to the first dataframe. It will return <NA>
for unmatched values in the second dataframe.
Syntax:
# Syntax
merge(x= my_dataframe1,y= my_dataframe2, by = "column",all.x = TRUE)
where,
my_dataframe1
is the first dataframe specified by x.my_dataframe2
is the second dataframe specified by y.- the
column
is the name of the column such that both the dataframes are joined based on this column specified using by. all.x
specifies the first dataframe – my_dataframe1 which will take boolean value TRUE to return all rows from my_dataframe1.
Example:
Perform Left Join on the first two dataframes.
#perform left join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all.x = TRUE))
Output:
# Output
id gender name
1 1 m vamsi
2 2 m <NA>
3 3 f sravani
After performing Left Join, we can see that id-2 is not there in the second dataframe. hence <NA> is replaced under the second dataframe name column.
2.3. Right Join
Right Join is also known as Right Outer Join used to join two dataframes. It will return all rows from the second dataframe and only matched rows from the first dataframe with respect to the second dataframe. It will return <NA> for unmatched values in the first dataframe.
Syntax:
# Syntax
merge(x= my_dataframe1,y= my_dataframe2, by = "column",all.y = TRUE)
where,
my_dataframe1
is the first dataframe specified by x.my_dataframe2
is the second dataframe specified by y.- the column is the name of the column such that both the dataframes are joined based on this column specified using by.
all.y
specifies the second dataframe –my_dataframe2
which will take boolean valueTRUE
to return all rows frommy_dataframe2
.
Example:
Perform Right Join on the first two dataframes.
#perform right join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all.y = TRUE))
Output:
# Output
id gender name
1 1 m vamsi
2 3 f sravani
3 4 <NA> shyam
After performing Right Join, we can see that id-4 is not there in the first dataframe. Hence <NA> is replaced under the first dataframe gender column.
2.4. Outer Join
Outer Join is also known as Full Join used to join two dataframes. It will return all matched and unmatched rows from both the dataframes. It will return <NA> for unmatched values in both the dataframes.
Syntax:
# Syntax
merge(x= my_dataframe1,y= my_dataframe2, by = "column",all = TRUE)
where,
my_dataframe1
is the first dataframe specified by x.my_dataframe2
is the second dataframe specified by y.- the
column
is the name of the column such that both the dataframes are joined based on this column specified using by. all
specifies two dataframes that will take boolean valueTRUE
to return all rows frommy_dataframe1
andmy_dataframe2
.
Example:
Perform Outer Join on the first two dataframes.
#perform outer join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = "id",all = TRUE))
Output:
# Output
id gender name
1 1 m vamsi
2 2 m <NA>
3 3 f sravani
4 4 <NA> shyam
After performing Outer Join, we can see that all rows from both the dataframes were returned.
2.5. Cross Join
Cross Join is used to join two R dataframes. It will perform join in each row of the first dataframe with all rows in the second dataframe.
Syntax:
# Syntax
merge(x= my_dataframe1,y= my_dataframe2, by = NULL)
where,
my_dataframe1
is the first dataframe specified by x.my_dataframe2
is the second dataframe specified by y.- by parameter takes
NULL
Example:
Perform Cross Join on the first two dataframes.
#perform cross join on my_dataframe1 and my_dataframe2 based on id column
print(merge( x=my_dataframe1, y=my_dataframe2, by = NULL))
Output:
# Output
id.x gender id.y name
1 2 m 4 shyam
2 1 m 4 shyam
3 3 f 4 shyam
4 2 m 3 sravani
5 1 m 3 sravani
6 3 f 3 sravani
7 2 m 1 vamsi
8 1 m 1 vamsi
9 3 f 1 vamsi
After performing Cross Join, we can see that each row in the first dataframe is joined with all rows in the second dataframe.
3. Join Multiple R DataFrames
To join more than two (multiple) R dataframes, then reduce()
is used. It is available in the tidyverse
package which will convert all the dataframes to a list and join the dataframes based on the column. It is similar to the above joins.
tidyverse
is a third-party library hence, in order to use tidyverse
library, you need to first install it by using install.packages('tidyverse')
. Once installation completes, load the tidyverse
library in order to use this reduce()
method. To load a library in R use library("tidyverse")
.
Syntax:
# Syntax
list(my_dataframe1,my_dataframe2,............) %>% reduce(keyword, by='column')
reduce()
takes two parameters.
keyword
– It is the join type keyword that specifies the type of joincolumn
– Column name in which dataframes are joined based on this column.
Use below values for keyword (join types).
- To perform an inner join – the
inner_join
keyword is used. - To perform left join – the
left_join
keyword is used. - To perform right join – the
right_join
keyword is used. - To perform an outer join – the
full_join
keyword is used.
Example:
In this example, we will perform all types of joins on three dataframes such that dataframes are joined based on the id column.
#load the tidyverse - package
library(tidyverse)
#Join three dataframes based on id column by performing inner join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(inner_join, by='id'))
#Join three dataframes based on id column by performing left join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(left_join, by='id'))
#Join three dataframes based on id column by performing right join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %>% reduce(right_join, by='id'))
#Join three dataframes based on id column by performing outer join
print(list(my_dataframe1,my_dataframe2,my_dataframe3) %%>% reduce(full_join, by='id'))
Output:
# Output
# Inner Join
id gender name marks
1 3 f sravani 80
# Left Join
id gender name marks
1 2 m <NA> 97
2 1 m vamsi NA
3 3 f sravani 80
# Right join
id gender name marks
1 3 f sravani 80
2 4 <NA> shyam 79
3 2 <NA> <NA> 97
# Full Outer Join
id gender name marks
1 2 m <NA> 97
2 1 m vamsi NA
3 3 f sravani 80
4 4 <NA> shyam 79
4. Conclusion
In this article, you have learned how to perform different types of join on two or multiple R dataframes. If there are two dataframes, then merge()
is used, otherwise, you can go with reduce()
method.
Related Articles
- How to rename column in R DataFrame
- How to Read Multiple CSV Files in R
- R if else Multiple Conditions
- R Group by Multiple Columns or Variables
- R Join (Merge) on Multiple Columns
- R Join Multiple Data Frames
- R Write Multiple Lines to Text File