pandas support several methods to join two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge()
, join()
, and concat()
methods. Each of these methods provides different ways to join DataFrames. merge() is the most used approach to join two DataFrames by columns and index. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.
1. Quick Examples of Pandas Join Two DataFrames
Below are some quick examples of pandas joining two DataFrames.
# Below are quick examples
# Pandas join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
# Pandas.merge() by Column
df3=pd.merge(df1,df2)
# DataFrame.merge() by Column
df3=df1.merge(df2)
# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
First, let’s create a DataFrame that I can use to demonstrate with examples
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
Yields below output

2. Pandas join() Two DataFrames
panads.DataFrame.join() method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left
, inner
, right
, and outer
join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples.
By default, it uses left join on the row index. This is unlike merge() where it does inner join on common columns.
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
Yields below output.
# Output:
Courses_left Fee Duration Courses_right Discount
r1 Spark 20000 30days Spark 2000.0
r2 PySpark 25000 40days NaN NaN
r3 Python 22000 35days Python 1200.0
r4 pandas 30000 50days NaN NaN
3. Pandas merge() Two DataFrames
In this section, I will explain the usage of pandas DataFrames using merge() method. This method is the most efficient way to join DataFrames on columns. It also supports joining on the index but an efficient way would be to use join().
Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join.
# Pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
You can also specify the column names explicitly.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
In case if you wanted to combine column names that are different on two pandas DataFrames.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.
4. pandas concat() Two DataFrames
pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Yields below output.
5
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
5. Complete Examples of Pandas Joins Two DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
# Pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Conclusion
In this article, you have learned joining two DataFrames using join(), merge(), and concat() methods with explanation and examples.
Related Articles
- Differences between Pandas Join vs Merge
- Pandas Join DataFrames on Columns
- Pandas Join Explained With Examples
- Pandas Outer Join Explained By Examples
- Pandas Left Join Explained By Examples
- Pandas Percentage Total With Groupby
- Pandas Merge Multiple DataFrames
- Pandas Merge Two DataFrames