In this pandas outer join article, I will explain the significance of outer join and how to do it with pandas DataFrames on multiple columns. Since pandas support several methods to combine DataFrames, I will cover all these with examples.
Outer join is also called Full Outer Join
that returns all rows from both pandas DataFrames. Where join expression doesn’t match it returns null on respective cells.
1. Quick Examples of Pandas Outer Join DataFrames
Below are some quick examples of pandas Full Outer Join DataFrames.
# Below are some quick examples.
# Pandas outer join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
# Pandas.merge() by Column
df3=pd.merge(df1,df2, how='outer')
# DataFrame.merge() by Column
df3=df1.merge(df2, how='outer')
# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses', how='outer')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
First, let’s create a DataFrames with one column in common on both DataFrames. I will use these to demonstrate Outer Join with examples
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
Yields below output

2. Pandas Outer Join using join()
panads.DataFrame.join() method supports how param to join DataFrames by outer join. It also supports different params, refer to pandas join() for syntax, usage, and more examples.
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)
Yields below output.
# Output:
Courses_left Fee Duration Courses_right Discount
r1 Spark 20000.0 30days Spark 2000.0
r2 PySpark 25000.0 40days NaN NaN
r3 Python 22000.0 35days Python 1200.0
r4 pandas 30000.0 50days NaN NaN
r5 NaN NaN NaN Go 2000.0
r6 NaN NaN NaN Java 2300.0
3. Outer Join Using merge()
Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join, to do an outer join use how
param with outer
value.
# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')
# DataFrame.merge()
df3=df1.merge(df2, how='outer')
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
You can also specify the column names explicitly.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')
In case if you wanted to combine column names that are different on two pandas DataFrames.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print(df3)
merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.
4. By using pandas concat()
pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames and specify outer
for join
param to perform the outer join.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print(df3)
Yields below output.
# Output:
Courses Fee Duration Courses Discount
r1 Spark 20000.0 30days Spark 2000.0
r2 PySpark 25000.0 40days NaN NaN
r3 Python 22000.0 35days Python 1200.0
r4 pandas 30000.0 50days NaN NaN
r6 NaN NaN NaN Java 2300.0
r5 NaN NaN NaN Go 2000.0
5. Complete Examples of Outer Join of DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)
# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')
# DataFrame.merge()
df3=df1.merge(df2, how='outer')
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print(df3)
Conclusion
In this article, you have learned how to perform an outer join on DataFrams by using join(), merge() and concat() methods with explanations and examples. An outer join is also called Full Outer Join
that returns all rows from both DataFrames. Where join expression doesn’t match it returns null on respective cells.
Related Articles
- Pandas Merge DataFrames Explained Examples
- Pandas Merge Multiple DataFrames
- Pandas Add Column Names to DataFrame
- Differences between Pandas Join vs Merge
- Pandas Left Join Explained By Examples
- Pandas Merge Two DataFrames
- Pandas Join Two DataFrames
- Pandas Join Explained With Examples