In this article you will learn the difference between pandas join() vs merge() methods on pandas DataFrames with examples and use cases of each. pandas provide these two methods to combine DataFrames on columns and indexes however there are some differences between these two.
Pandas Join vs Merge Key Points
- join() method is used to perform join on row indices and doesn’t support joining on columns unless setting column as index.
- join() by default performs left join.
- merge() method is used to perform join on indices, columns and combination of these two.
- merge() by default performs inner join.
- Both these methods support inner, left, right, outer join types. merge additionally supports the cross join.
Pandas Join vs Merge Differences
Both methods join() and merge() are used to perform joining pandas DataFrames on columns meaning, it combines all columns from two or more DataFrames into a single DataFrame. The main difference between join vs merge would be; join() is used to combine two DataFrames on the index but not on columns whereas merge() is primarily used to specify the columns you wanted to join on, this also supports joining on indexes and combination of index and columns.
Both these methods support left on the column and right on the index however, merge additionally supports left on the index and right on the column.
There is another difference, join() by default performs left join whereas merge() default uses inner join.
Below are some similarities and differences between pandas join() vs merge() methods.
Join Feature | join() | merge() |
---|---|---|
inner | Yes | Yes |
left | Yes | Yes |
right | Yes | Yes |
outer | Yes | Yes |
cross | X | Yes |
Join on Indices | Yes | Yes |
Join on Columns | X | Yes |
Left on column, right on index | Yes | Yes |
Left on index, right on column | X | Yes |
2. Pandas Join vs Merge Examples
Now, let’s explore the differences between join vs merge with examples. First, let’s create two DataFrames with one common column on both DataFrames.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
Yields below DataFrames


3. Join on Index vs Merge on Index
As I said above, join is used to perform combining DataFrames on row indexes, it is the most efficient way to use for indices. let’s see with an example.
# pandas left join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='left')
Yields below output. Notice that it performed left join on two DataFrames as left join is the default behavior of the join method.

In order to do the same operation with merge(), you need explicitly specify the left
join using how
param.
# Merge two DataFrames by index using pandas.merge()
df3 = pd.merge(df1, df2, left_index=True, right_index=True, how='left')
print(df3)
Yields below output.

4. Join on Column vs Merge on Column
merge() allows us to use columns in order to combine DataFrames and by default, it uses inner join. Below example by default join on Courses
column as this is the only common column in both DataFrames.
# pandas merge - inner join by Column
df3=pd.merge(df1,df2)
Yields below output.

You can also explicitly specify the column you wanted to join on
# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses')
if you have column names different on left and right DataFrames, you can specify the column names
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
To perform a similar operation with the join() method, you need to set the column to index usingset_index().
# Use join on column
df3=df1.set_index('Courses').join(df2.set_index('Courses'), how='inner')
print(df3)
5. Left on Column, Right on Index
6. Left on Index, Right on Column
7. Complete Example
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# pandas left join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='left')
# Merge two DataFrames by index using pandas.merge()
df3 = pd.merge(df1, df2, left_index=True, right_index=True, how='left')
print(df3)
# pandas merge - inner join by Column
df3=pd.merge(df1,df2)
print(df3)
# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses')
print(df3)
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# Use join on column
df3=df1.set_index('Courses').join(df2.set_index('Courses'), how='inner')
print(df3)
8. Conclusion
In this article, you have learned key differences between pandas join and merge. join() method is used to perform join on row indices and doesn’t support joining on columns unless setting column as an index. join() by default performs left join. merge() method is used to perform join on indices, columns and combination of these two. merge() by default performs inner join. Both these methods support inner, left, right, outer join types. merge additionally supports cross join.
Related Articles
- Pandas Merge Two DataFrames
- Pandas Join Two DataFrames
- Pandas Left Join Explained By Examples
- Pandas Merge DataFrames Explained Examples
- Pandas Concat Two DataFrames Explained
- Pandas Merge DataFrames on Index
- Pandas Add Header Row to DataFrame
- Pandas Add Column Names to DataFrame
- Pandas Outer Join Explained By Examples