pandas join() is similar to SQL join where it combines columns from multiple DataFrames based on row indices. In pandas join can be done only on indexes but not on columns. By default, it uses left join on the row index. If you wanted to join on columns you should use pandas.merge() method as this by default performs on columns
Pandas Join Key Points
- You can join pandas Dataframes similar to joining tables in SQL.
- pandas.DataFrame.join() method is used to join DataFrames.
- By default it uses left join on row index. This is unlike merge() where it does inner join on common columns.
- pandas also supports other methods like concat() and merge() to join DataFrames.
1. Quick Examples of Pandas Join
Below are some quick examples of pandas join() method
# Below are quick examples on join() method
# Pandas default join
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
# Pandas Inner join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='inner')
# Pandas Right join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='right')
# Pandas outer join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
# Pandas join on columns
df3=df1.set_index('Courses').join(df2.set_index('Courses'), how='inner')
Now, Let’s learn the syntax, usage, and execute these examples of pandas join.
2. Pandas Join Syntax & Usage
Below is the syntax of the pandas join() method.
# Syntax of pandas.DataFrame.join() method
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
other
– Pass right DataFrame object or list of DataFrame objects.on
– Specify which index you wante to join on when you have multiple indexes.how
– Use to specify the join type. Accepts inner, left, right, outer.lsuffix
– Specify the left suffix string to column namesrsuffix
– Specify the right suffix string to column namessort
– To specify the results to be sorted.
3. Pandas Join DataFrames Example
pandas join() method by default performs left join on row index. Let’s create two DataFrames and run the above examples to understand pandas join.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
Yields below output.

Now, perform the default join operation. Since we have a common column Courses on both DataFrames we have to specify the lsuffix
(left suffix) or rsuffix
(right suffix) param, not specifying these you will get an error.
# Pandas join
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
Yields below output. As specified by default it uses left join and performs join on row index.
# Output:
Courses_left Fee Duration Courses_right Discount
r1 Spark 20000 30days Spark 2000.0
r2 PySpark 25000 40days NaN NaN
r3 Python 22000 35days Python 1200.0
r4 pandas 30000 50days NaN NaN
4. Inner Join DataFrames
pandas inner
join is mostly used join, It is used to join two DataFrames on indexes. When indexes don’t match the rows get dropped from both DataFrames.
# Pandas Inner join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='inner')
print(df3)
# Outputs:
# Courses_left Fee Duration Courses_right Discount
# r1 Spark 20000 30days Spark 2000
# r3 Python 22000 35days Python 1200
When we apply Inner join on our DataFrames, It drops DataFrame rows with indexes r2 and r4 from left DataFrame and drops indexes r5 and r6 from right DataFrame as these don’t have matches.
5. Right Join DataFrames
Also called Right Outer Join
– This join is the opposite of left
join, here it returns all rows from the right DataFrame regardless of math found on the left. When the join expression doesn’t match, it assigns null for that record and drops records from left where match not found.
# Pandas Right join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='right')
print(df3)
# Outputs:
# Courses_left Fee Duration Courses_right Discount
# r1 Spark 20000.0 30days Spark 2000
# r6 NaN NaN NaN Java 2300
# r3 Python 22000.0 35days Python 1200
# r5 NaN NaN NaN Go 2000
When we apply right join on our DataFrames, It gets all rows from right DataFrame but it assigns NaN to left DataFrame columns for rows that don’t have the match on right DataFrame.
6. Outer Join DataFrames
Also called Full Outer Join
– Returns all rows from both DataFrames. Where join expression doesn’t match it returns null on respective cells.
# Pandas outer join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)
# Outputs:
# Courses_left Fee Duration Courses_right Discount
# r1 Spark 20000.0 30days Spark 2000.0
# r2 PySpark 25000.0 40days NaN NaN
# r3 Python 22000.0 35days Python 1200.0
# r4 pandas 30000.0 50days NaN NaN
# r5 NaN NaN NaN Go 2000.0
# r6 NaN NaN NaN Java 2300.0
When we apply outer join on our DataFrames, It gets all rows from left and right but it assigns NaN to left and right DataFrame columns for rows that don’t have a match.
7. Pandas Join on Column
In case if you wanted to join on columns, use pandas.merge() method or set the column you wanted to join on to Index and use it. The below example demonstrates how to set the column to Index in pandas and use it for joining. df1.set_index(‘Courses’) is used to convert column to index.
# Pandas join on columns
df3=df1.set_index('Courses').join(df2.set_index('Courses'), how='inner')
print(df3)
# Outputs:
# Fee Duration Discount
# Courses
# Spark 20000 30days 2000
# Python 22000 35days 1200
You can also try the below example where it persists the original index
# Pandas join
df3=df1.join(df2.set_index('Courses'), how='inner', on='Courses')
print(df3)
8. Complete Example
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
# Pandas Inner join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='inner')
print(df3)
# Pandas Right join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='right')
print(df3)
# Pandas outer join DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)
# Pandas join on column
df3=df1.set_index('Courses').join(df2.set_index('Courses'), how='inner')
print(df3)
# Pandas join
df3=df1.join(df2.set_index('Courses'), how='inner', on='Courses')
print(df3)
Conclusion
In this article, you have learned to join DataFrames using join() method. By default join() method performs join on row indexes by using left join.