• Post author:
  • Post category:Pandas
  • Post last modified:December 5, 2024
  • Reading time:13 mins read
You are currently viewing Pandas Join Two DataFrames

To join two DataFrames in pandas, you can use several methods depending on how you want to combine them. The most common methods are merge(), join(), and concatenation with concat().

Advertisements

In this article, I will explain how to join two DataFrames using merge(), join(), and concat() methods. Each of these methods provides different ways to join DataFrames. merge() is the most used approach to join two DataFrames by columns and index. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.

Key Points –

  • Use the pd.merge() function in Pandas to join two DataFrames based on one or more common columns.
  • Specify the columns to join on using the on parameter, or use left_on and right_on parameters if the column names are different in the two DataFrames.
  • Control the type of join (inner, outer, left, or right) using the how parameter.
  • Choose the appropriate join type (e.g., inner, outer, left, right) based on the desired outcome and the nature of the data being merged.
  • Handle duplicate column names resulting from the join using the suffixes parameter to distinguish them.

Quick Examples of Join Two DataFrames

Following are quick examples of pandas joining two DataFrames.


# Quick examples of join two dataframes

# Pandas join two DataFrames by index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")

# Pandas.merge() by column
df3=pd.merge(df1,df2)

# DataFrame.merge() by column
df3=df1.merge(df2)

# Merge DataFrames by column
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')

Now, let’s create a pandas DataFrame.


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

Yields below output

pandas join two DataFrames

Pandas join() Two DataFrames

The Pandas DataFrame join() method can be used to combine two DataFrames on row indices. By default, it performs a left join, offering options to specify various join types like left, inner, right, and outer joins. Additionally, the method supports diverse parameters. For further syntax, usage, and additional examples, consult the documentation on pandas join().

By default, join() uses a left join on the row index. This contrasts with merge(), which performs an inner join on common columns by default.


# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

Yields below output.


# Output:
   Courses_left    Fee Duration Courses_right  Discount
r1        Spark  20000   30days         Spark    2000.0
r2      PySpark  25000   40days           NaN       NaN
r3       Python  22000   35days        Python    1200.0
r4       pandas  30000   50days           NaN       NaN

Pandas merge() Two DataFrames

In this section, I will explain the usage of pandas DataFrames using merge() method. This method is the most efficient way to join DataFrames on columns. It also supports joining on the index but an efficient way would be to use join().

Using merge() you can merge by columns, by index, you can perform merges based on indices, multiple columns, and various types of joins. By default, it merges on all shared columns between the DataFrames, conducting an inner join.


# Pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

You can alternatively explicitly specify the column names.


# Merge dataframes by columns
df3=pd.merge(df1,df2, on='Courses')

If you wanted to merge columns with different names across two pandas DataFrames.


# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

merge() function also accommodates various parameters; consult the pandas merge() for syntax, usage, and examples.

Pandas concat() Two DataFrames

pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames.


# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

Complete Examples of Pandas Join two DataFrames


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

# Pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

FAQs on Join Pandas Two DataFrames

How can I join two DataFrames?

You can use pd.merge(), df.join(), or df.concat() functions in Pandas to join two DataFrames. The choice of method depends upon the type of join you want to perform.

What is the difference between merging and joining in Pandas?

In Pandas, merging and joining essentially perform the same operation of combining two DataFrames based on common columns. However, “merge” is the preferred term and function (pd.merge()), while “join” typically refers to specific types of merges, such as SQL-style joins (df.join()).

How do I merge two DataFrames with different column names?

If the DataFrames have different column names but represent the same data, you can use the left_on right_on parameters in the merge() function to specify the columns to join on explicitly.

Conclusion

In conclusion, this article has provided insights into merging two pandas DataFrames through join(), merge(), and concat() methods, accompanied by practical examples.

References