Pandas Join Two DataFrames

  • Post author:
  • Post category:Pandas
  • Post last modified:February 19, 2024
  • Reading time:17 mins read

Pandas support several methods to join two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge(), join(), and concat() methods. Each of these methods provides different ways to join DataFrames. merge() is the most used approach to join two DataFrames by columns and index. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.

Key Points –

  • Use the pd.merge() function in Pandas to join two DataFrames based on one or more common columns.
  • Specify the columns to join on using the on parameter, or use left_on and right_on parameters if the column names are different in the two DataFrames.
  • Control the type of join (inner, outer, left, or right) using the how parameter.
  • Choose the appropriate join type (e.g., inner, outer, left, right) based on the desired outcome and the nature of the data being merged.
  • Handle duplicate column names resulting from the join using the suffixes parameter to distinguish them.

Quick Examples of Pandas Join Two DataFrames

Below are some quick examples of pandas joining two DataFrames.


# Below are the quick examples

# Pandas join two DataFrames by index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")

# Pandas.merge() by column
df3=pd.merge(df1,df2)

# DataFrame.merge() by column
df3=df1.merge(df2)

# Merge DataFrames by column
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')

First, let’s create a DataFrame that I can use to demonstrate with examples.


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

Yields below output

pandas join two DataFrames

Pandas join() Two DataFrames

Panads.DataFrame.join() method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left, inner, right, and outer join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples.

By default, it uses left join on the row index. This is unlike merge() where it does inner join on common columns.


# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

Yields below output.


# Output:
   Courses_left    Fee Duration Courses_right  Discount
r1        Spark  20000   30days         Spark    2000.0
r2      PySpark  25000   40days           NaN       NaN
r3       Python  22000   35days        Python    1200.0
r4       pandas  30000   50days           NaN       NaN

Pandas merge() Two DataFrames

In this section, I will explain the usage of pandas DataFrames using merge() method. This method is the most efficient way to join DataFrames on columns. It also supports joining on the index but an efficient way would be to use join().

Using merge() you can merge by columns, by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join.


# Pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

You can also specify the column names explicitly.


# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

In case if you wanted to combine column names that are different on two pandas DataFrames.


# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.

pandas concat() Two DataFrames

pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames.


# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

Complete Examples of Pandas Join two DataFrames


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

# Pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Frequently Asked Questions on Join Pandas Two DataFrames

How can I join two DataFrames?

You can use pd.merge(), df.join(), or df.concat() functions in Pandas to join two DataFrames. The choice of method depends upon the type of join you want to perform.

What are the common types of joins?

The common types of joins are:
Inner Join: It returns only the matching rows from both DataFrames.
Left Join: It returns all rows from the left DataFrame and matching rows from the right DataFrame.
Right Join: It returns all rows from the right DataFrame and matching rows from the left DataFrame.
Outer Join: It returns all rows from both DataFrames, filling in missing values with NaN.

How do I perform an inner join in Pandas?

To perform an inner join use the pd.merge() method, specifying the two DataFrames and the common column(s) on which to join.

What is the difference between merging and joining in Pandas?

In Pandas, merging and joining essentially perform the same operation of combining two DataFrames based on common columns. However, “merge” is the preferred term and function (pd.merge()), while “join” typically refers to specific types of merges, such as SQL-style joins (df.join()).

How can I merge DataFrames based on multiple columns?

You can merge DataFrames based on multiple columns by passing a list of column names to the on parameter in the merge() function. Alternatively, you can use the left_on and right_on parameters for DataFrames with different column names.

How do I merge two DataFrames with different column names?

If the DataFrames have different column names but represent the same data, you can use the left_on right_on parameters in the merge() function to specify the columns to join on explicitly.

Conclusion

In this article, you have learned joining two DataFrames using join(), merge(), and concat() methods with explanations and examples.

References

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ @ LinkedIn

Leave a Reply