Pandas Join Two DataFrames

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 27, 2022

pandas support several methods to join two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge(), join(), and concat() methods. Each of these methods provides different ways to join DataFrames. merge() is the most used approach to join two DataFrames by columns and index. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.

1. Quick Examples of Pandas Join Two DataFrames

Below are some quick examples of pandas joining two DataFrames.


# pandas join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")

# pandas.merge() by Column
df3=pd.merge(df1,df2)

# DataFrame.merge() by Column
df3=df1.merge(df2)

# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')

First, let’s create a DataFrame that I can use to demonstrate with examples


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

Yields below output

pandas join two DataFrames

2. Pandas join() Two DataFrames

panads.DataFrame.join() method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left, inner, right, and outer join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples.

By default, it uses left join on the row index. This is unlike merge() where it does inner join on common columns.


# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

Yields below output.


   Courses_left    Fee Duration Courses_right  Discount
r1        Spark  20000   30days         Spark    2000.0
r2      PySpark  25000   40days           NaN       NaN
r3       Python  22000   35days        Python    1200.0
r4       pandas  30000   50days           NaN       NaN

3. Pandas merge() Two DataFrames

In this section, I will explain the usage of pandas DataFrames using merge() method. This method is the most efficient way to join DataFrames on columns. It also supports joining on the index but an efficient way would be to use join().

Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join.


# pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

Yields below output.


  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

You can also specify the column names explicitly.


# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

In case if you wanted to combine column names that are different on two pandas DataFrames.


# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.

4. pandas concat() Two DataFrames

pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames.


# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Yields below output.

5

  Courses    Fee Duration  Discount
0   Spark  20000   30days      2000
1  Python  22000   35days      1200

5. Complete Examples of Pandas Joins Two DataFrames


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

# pandas.merge()
df3=pd.merge(df1,df2)

# DataFrame.merge()
df3=df1.merge(df2)

# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)

Conclusion

In this article, you have learned joining two DataFrames using join(), merge(), and concat() methods with explanation and examples.

References

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing Pandas Join Two DataFrames