• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:14 mins read
You are currently viewing Pandas Merge Two DataFrames

Pandas support several methods to merge two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge(), join(), and concat() methods. Each of these methods provides different ways to merge DataFrames. merge() is the most used approach to join two DataFrames by index and by columns. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.

Advertisements

Related: In Pandas, you can also merge multiple DataFrames.

1. Quick Examples of Pandas Merging Two DataFrames

Below are some quick examples of pandas merging two DataFrames.


# Below are the examples of merging two DataFrames

# Example 1: Pandas Merge two DataFrames using join()
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right")

# Example 2: Using pd.merge()
df3 = pd.merge(df1,df2)

# Example 3: Using DataFrame.merge() 
df3 = df1.merge(df2)

# Example 4: Merge DataFrames by Columns
df3 = pd.merge(df1,df2, on='Courses')

# Example 5: Merge DataFrames by different Columns
df3 = pd.merge(df1,df2, left_on='Courses', right_on='Courses')

# Example 6: Merge DataFrames on index
df3 = pd.merge(df1, df2, left_index=True, right_index=True)

# Example 7:  By using concat()
df3 = pd.concat([df1,df2],axis=1,join='inner')

First, let’s create a DataFrame that I can use to demonstrate with the examples.


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print("First DataFrame:\n", df1)
print("Second DataFrame:\n", df2)

Yields below output.

pandas merge two DataFrames

2. Pandas Merge Two DataFrames using join()

pd.df.join() method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left, inner, right, and outer join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples of the join() method.

By default, it uses the left join on the row index. This is unlike merge() where it does inner join on common columns.


# Pandas merge two DataFrames
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right")
print("After merging the two DataFrames:\n", df3)

Yields below output.

pandas merge two DataFrames

3. Pandas Merge Two DataFrames using merge()

In this section, I will explain how to merge pandas DataFrames using the merge() method. This method is the most efficient way to merge DataFrames on columns. It also supports merging on the index but an efficient way would be to use join().

Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it merges on all common columns that exist on both DataFrames and performs an inner join.


# Pandas.merge()
df3 = pd.merge(df1,df2)
print("After merging the two DataFrames:\n", df3)

# DataFrame.merge()
df3 = df1.merge(df2)
print("After merging the two DataFrames:\n", df3)

Yields below output.


# Output:
After merging the two DataFrames
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

4. Pandas Merge Two DataFrames Based on Columns

To merge two DataFrames based on column you can explicitly specify the column name you want to use for joining. To use column names use on param of the merge() method. This also takes a list of names when you want to merge on multiple columns.


# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
print(df3)

In case, if you want to combine column names that are different on two pandas DataFrames.


# Merge DataFrames by different columns
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print("After merging the DataFrames:\n", df3)

merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.

5. Pandas Merge Two DataFrames on Index

You can use pandas.merge() function to merge two DataFrames by matching their index. When merging two DataFrames on the index, the value of left_index and right_index parameters of merge() function should be True. By default, the pd.merge() is a column-wise inner join. Let’s see with an example.


# merge two DataFrames on Index
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print("After merging the DataFrames:"df3)

Yields below output.


# Output:
After merging the DataFrames:
    Courses_x    Fee Duration Courses_y  Discount
r1     Spark  20000   30days     Spark      2000
r3    Python  22000   35days    Python      1200

6. pandas concat() Two DataFrames

pandas concat() method is the least used to merge two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to merge pandas DataFrames.


# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

5. Complete Examples


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

# Pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
What is the purpose of merging two DataFrames in Pandas?

Merging two DataFrames in Pandas is used to combine data from multiple sources into a single DataFrame based on common columns or indexes. It allows you to perform operations like joining, merging, and concatenating data, which is essential for data analysis and manipulation.

How can I merge two DataFrames in Pandas?

You can merge two DataFrames in Pandas using the merge() function. This function allows you to specify the DataFrames to merge, the columns or keys on which to merge, and the type of merge operation (e.g., inner, outer, left, right).

Can I merge DataFrames on multiple columns in Pandas?

Yes, you can merge DataFrames on multiple columns by passing a list of column names to the on parameter of the merge() function.

What is the difference between the merge() function and the concat() function in Pandas?

The merge() function in Pandas is used to combine DataFrames based on common columns or keys, while the concat() function is used to concatenate DataFrames along a particular axis (either rows or columns). Merging involves combining data with a shared key, while concatenation simply stacks data together.

Conclusion

In this article, you have learned to merge two DataFrames using join(), merge(), and concat() methods with explanations and examples.

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium