In this Pandas outer join article, I will explain the significance of outer join and how to do it with Pandas DataFrames on single/multiple columns. Since Pandas supports several methods to combine DataFrames, I will cover all of these with examples.
Outer join also called Full Outer Join combines rows from two or more DataFrames based on a common column and includes all the rows from both DataFrames, filling in missing values with NaN where there is no match.
1. Quick Examples of Pandas Outer Join DataFrames
Below are some quick examples of the Pandas full Outer Join DataFrames.
# Below are some quick examples.
# Example 1: Pandas outer join two DataFrames by Index
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
# Example 2: Pandas.merge() by Column
df3 = pd.merge(df1,df2, how='outer')
# Example 3: DataFrame.merge() by Column
df3 = df1.merge(df2, how='outer')
# Example 4: Merge DataFrames by Column
df3 = pd.merge(df1,df2, on='Courses', how='outer')
# Example 5: When column names are different
df3 = pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
# Example 6: By using concat()
df3 = pd.concat([df1,df2],axis=1,join='outer')
First, let’s create a DataFrames with one column in common on both DataFrames. I will use these to demonstrate Outer Join with examples.
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print("Create DataFrame1:\n:", df1)
print("Create DataFrame2:\n", df2)
Yields below output.
2. Pandas Outer Join using join()
panads.DataFrame.join() method supports how
parameter to join DataFrames by outer join. It also supports different parameters, refer to pandas join() for syntax, usage, and more examples.
# Pandas join two DataFrames
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print("After joining DataFrames by using outer join:\n", df3)
Yields below output.
3. Outer Join Using merge()
Using the merge() function you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join, to do an outer join use how
param with outer
value.
# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')
print("After joining DataFrames by using outer join:\n", df3)
# DataFrame.merge()
df3=df1.merge(df2, how='outer')
print("After joining DataFrames by using outer join:\n", df3)
Yields below output.
# Output:
After joining DataFrames by using outer join:
Courses Fee Duration Discount
0 Spark 20000.0 30days 2000.0
1 PySpark 25000.0 40days NaN
2 Python 22000.0 35days 1200.0
3 pandas 30000.0 50days NaN
4 Java NaN NaN 2300.0
5 Go NaN NaN 2000.0
You can also specify the column names explicitly.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')
print("After joining DataFrames by using outer join:\n", df3)
Yields the same output as above.
In case, if you want to combine column names that are different on two pandas DataFrames.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print("After joining DataFrames by using outer join:\n", df3)
merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.
4. By using pandas concat()
pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames and specify outer
for join
param to perform the outer join.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print("After joining DataFrames by using outer join:\n", df3)
Yields below output.
# Output:
# After joining DataFrames by using outer join:
Courses Fee Duration Courses Discount
r1 Spark 20000.0 30days Spark 2000.0
r2 PySpark 25000.0 40days NaN NaN
r3 Python 22000.0 35days Python 1200.0
r4 pandas 30000.0 50days NaN NaN
r6 NaN NaN NaN Java 2300.0
r5 NaN NaN NaN Go 2000.0
5. Complete Examples of Outer Join of DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)
# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')
# DataFrame.merge()
df3=df1.merge(df2, how='outer')
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print(df3)
Frequently Asked Questions of Outer Join of DataFrames
An outer join combines rows from two DataFrames based on a common column, including all rows from both DataFrames. It includes matching rows as well as non-matching rows.
You can use the merge()
function to a DataFrame with the parameter how='outer'
to perform an outer join. For example, merged_df = df1.merge(df2, how='outer')
An inner join includes only the rows that have matching columns in both DataFrames, while an outer join includes all rows from both DataFrames, filling in NaN (or null) values for non-matching rows.
Non-matching rows in an outer join will have NaN (or null) values in the columns from the DataFrame that don’t have a match for a particular column.
You can perform an outer join on multiple keys by passing a list of key columns to the on
parameter in the merge()
function. For example, merged_df = df1.merge(df2, how='outer', on=['key1', 'key2'])
Conclusion
In this article, you have learned how to perform an outer join on DataFrams by using join(), merge() and concat() methods with explanations and examples. An outer join is also called Full Outer Join
that returns all rows from both DataFrames. Where the join expression doesn’t match it returns null on respective cells.
Related Articles
- Pandas Merge Multiple DataFrames
- Pandas Add Column Names to DataFrame
- Differences between Pandas Join vs Merge
- Pandas Left Join Explained By Examples
- Pandas Merge Two DataFrames