Pandas merge() function is used to merge multiple Dataframes. We can use either pandas.merge()
or DataFrame.merge()
to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner
, left
, right
, outer
, cross
.
In this article, we will learn how to merge multiple (three or more) Dataframes with examples.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
technologies3 = {
'Instructor': ["Jerry", "Tom", "Smith", "John"],
'Courses': ["Spark", "Java", "Python", "Scala"]
}
index_labels3 = ['r1','r6','r3','r7']
df3 = pd.DataFrame(technologies3, index=index_labels3)
print(df1)
print(df2)
print(df3)
Yields below the output of three DataFrames.
Courses Fee Duration
r1 Spark 20000 30days
r2 PySpark 25000 40days
r3 Python 22000 35days
r4 pandas 30000 50days
Courses Discount
r1 Spark 2000
r6 Java 2300
r3 Python 1200
r5 Go 2000
Instructor Courses
r1 Jerry Spark
r6 Tom Java
r3 Smith Python
r7 John Scala
1. pandas Merge on Multiple DataFrames Example
Method pandas.merge() and DataFrame.merge() are used to merge two or multiple DataFrames. Both these methods work exactly the same and they also take a similar number of params. Merging DataFrames is nothing but joining DataFrames similar to Database join.
Similar to the Database join, merge() method also supports several join types like left
, right
, inner
, outer
and cross
. Use how
param to specify the join type.
By default merge() uses inner join on columns that are present on both DataFrames. You can also explicitly specify the columns you wanted to join on and join by row index
df4 = pd.merge(pd.merge(df1,df2,on='Courses'),df3,on='Courses')
print(df4)
Yields below output. You can use the same approach to merge more than three DataFrames.
Courses Fee Duration Discount Instructor
0 Spark 20000 30days 2000 Jerry
1 Python 22000 35days 1200 Smith
Alternatively, you can also use DataFrame.merge() to join multiple pandas DataFrames.
# By using DataFrame.merge()
df5 = df1.merge(df2,on='Courses').merge(df3,on='Courses')
print(df5)
Yields same output as above.
2.Join Multiple DataFrames Using Left Join
In the above example merge of three Dataframes is done on the “Courses ” column. If we don’t specify also the merge will be done on the “Courses” column, the default behavior(join on inner) because the only common column on three Dataframes is “Courses”.
Below is the example of merge() using left join
# -*- coding: utf-8 -*-
"""
author SparkByExamples.com
"""
#DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
import pandas as pd
technologies = {
'Courses': ["Spark", "PySpark", "Python", "pandas"],
'Fee': [20000, 25000, 22000, 30000],
'Duration': ['30days', '40days', '35days', '50days'],
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df1 = pd.DataFrame(technologies, index=index_labels)
technologies2 = {
'Courses': ["Spark", "Java", "Python", "Go"],
'Discount': [2000, 2300, 1200, 2000]
}
index_labels2 = ['r1', 'r6', 'r3', 'r5']
df2 = pd.DataFrame(technologies2, index=index_labels2)
technologies3 = {
'Instructor': ["Jerry", "Tom", "Smith", "John"],
'Courses': ["Spark", "Java", "Python", "Scala"]
}
index_labels3 = ['r1','r6','r3','r7']
df3 = pd.DataFrame(technologies3, index=index_labels3)
print(df1)
print(df2)
print(df3)
#df4 = pd.merge(pd.merge(df1,df2,on='Courses'),df3,on='Courses')
#print(df4)
#df5 = df1.merge(df2,on='Courses').merge(df3,on='Courses')
#print(df5)
df6 = df1.merge(df2,how ='left').merge(df3,how ='left')
print(df6)
The above example gives the following output
Courses Fee Duration
r1 Spark 20000 30days
r2 PySpark 25000 40days
r3 Python 22000 35days
r4 pandas 30000 50days
Courses Discount
r1 Spark 2000
r6 Java 2300
r3 Python 1200
r5 Go 2000
Instructor Courses
r1 Jerry Spark
r6 Tom Java
r3 Smith Python
r7 John Scala
Courses Fee Duration Discount Instructor
0 Spark 20000 30days 2000.0 Jerry
1 PySpark 25000 40days NaN NaN
2 Python 22000 35days 1200.0 Smith
3 pandas 30000 50days NaN NaN
Conclusion
So far we have learned how to merge pandas multiple Dataframes using pd.merge() and df. merge(). Merging is a join operation that combines the columns from multiple DataFrames based on conditions specified in the ‘on’ or ‘how’ clause. Merge also supports joining on columns, index, and combination of both.
Happy Learning !!
Related Articles
- Pandas Drop Rows With NaN
- How to Drop Rows From Pandas DataFrame Examples
- Drop Single & Multiple Columns From Pandas DataFrame
- Drop N Rows From DataFrame
- Get the Row Count From Pandas DataFrame
- Change Column Data Type On Pandas DataFrame
- Pandas apply() Function to Single & Multiple Column(s)
- Pandas Merge DataFrames on Index
- Pandas Merge DataFrames on Multiple Columns
- Pandas Merge Two DataFrames