• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:8 mins read
You are currently viewing Pandas Merge Multiple DataFrames

Pandas merge() function is used to merge multiple Dataframes. We can use either pandas.merge() or DataFrame.merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join innerleftrightoutercross.

In this article, we will learn how to merge multiple (three or more) Dataframes with examples.


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
technologies3 = {
    'Instructor': ["Jerry", "Tom", "Smith", "John"],
    'Courses': ["Spark", "Java", "Python", "Scala"]
}
index_labels3 = ['r1','r6','r3','r7']
df3 = pd.DataFrame(technologies3, index=index_labels3)

print("First DataFrame:\n", df1)
print("Second DataFrame:\n", df2)
print("Third DataFrame:\n", df3)

Yields below the output of three DataFrames.

Pandas Merge Multiple DataFrames

1. Pandas Merge on Multiple DataFrames Example

pandas.merge() and DataFrame.merge() are used to merge two DataFrames or multiple DataFrames. Both these methods work exactly the same and they also take a similar number of params. Merging DataFrames is nothing but joining DataFrames similar to Database join.

Similar to the Database join, the merge() method also supports several join types like leftrightinnerouter and cross. Use how param to specify the join type.

By default merge() uses inner join on columns that are present on both DataFrames. You can also explicitly specify the columns you want to join and join by row index.


# Pandas Merge on Multiple DataFrames Example
df4 = pd.merge(pd.merge(df1,df2,on='Courses'),df3,on='Courses')
print("Afetr merging the multiple DataFrames:\n", df4)

Yields below output. You can use the same approach to merge more than three DataFrames.

Pandas Merge Multiple DataFrames

Alternatively, you can also use DataFrame.merge() to join multiple pandas DataFrames.


# By using DataFrame.merge()
df5 = df1.merge(df2,on='Courses').merge(df3,on='Courses')
print("Afetr merging the multiple DataFrames:\n", df5)

Yields the same output as above.

2. Join Multiple DataFrames Using Left Join

In the above example, a merge of three Dataframes is done in the “Courses ” column. If we don’t specify also the merge will be done on the “Courses” column, the default behavior(join on inner) because the only common column on three Dataframes is “Courses”.

Below is the example of merge() using left join


# Merge multiple DataFrames using left join
df6 = df1.merge(df2,how ='left').merge(df3,how ='left')
print(df6)

# Output:
#    Courses    Fee Duration  Discount Instructor
# 0    Spark  20000   30days    2000.0      Jerry
# 1  PySpark  25000   40days       NaN        NaN
# 2   Python  22000   35days    1200.0      Smith
# 3   pandas  30000   50days       NaN        NaN

3. Quick Examples of Merging Multiple DataFrames


# Below are quick examples of merging multiple DataFrames

# Create DataFrame
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df1 = pd.DataFrame(technologies, index=index_labels)

technologies2 = {
    'Courses': ["Spark", "Java", "Python", "Go"],
    'Discount': [2000, 2300, 1200, 2000]
}
index_labels2 = ['r1', 'r6', 'r3', 'r5']
df2 = pd.DataFrame(technologies2, index=index_labels2)


technologies3 = {
    'Instructor': ["Jerry", "Tom", "Smith", "John"],
    'Courses': ["Spark", "Java", "Python", "Scala"]
}
index_labels3 = ['r1','r6','r3','r7']
df3 = pd.DataFrame(technologies3, index=index_labels3)

# Pandas Merge on Multiple DataFrames using pd.merge()
df4 = pd.merge(pd.merge(df1,df2,on='Courses'),df3,on='Courses')
Print(df4)

# By using DataFrame.merge()
df5 = df1.merge(df2,on='Courses').merge(df3,on='Courses')
Print(df5)

# Merge multiple DataFrames using left join
df6 = df1.merge(df2,how ='left').merge(df3,how ='left')
print(df6)

Conclusion

So far, we have learned how to merge Pandas multiple Dataframes using pd.merge() and df.merge(). Merging is a join operation that combines the columns from multiple DataFrames based on conditions specified in the ‘on’ or ‘how’ clause. Merge also supports joining on columns, index, and a combination of both.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium