Pandas Outer Join Explained By Examples

  • Post author:
  • Post category:Pandas
  • Post last modified:December 8, 2023
  • Reading time:13 mins read

In this Pandas outer join article, I will explain the significance of outer join and how to do it with Pandas DataFrames on single/multiple columns. Since Pandas supports several methods to combine DataFrames, I will cover all of these with examples.

Outer join also called Full Outer Join combines rows from two or more DataFrames based on a common column and includes all the rows from both DataFrames, filling in missing values with NaN where there is no match.

1. Quick Examples of Pandas Outer Join DataFrames

Below are some quick examples of the Pandas full Outer Join DataFrames.


# Below are some quick examples.

# Example 1: Pandas outer join two DataFrames by Index
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')

# Example 2: Pandas.merge() by Column
df3 = pd.merge(df1,df2, how='outer')

# Example 3: DataFrame.merge() by Column
df3 = df1.merge(df2, how='outer')

# Example 4: Merge DataFrames by Column
df3 = pd.merge(df1,df2, on='Courses', how='outer')

# Example 5: When column names are different
df3 = pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')

# Example 6: By using concat()
df3 = pd.concat([df1,df2],axis=1,join='outer')

First, let’s create a DataFrames with one column in common on both DataFrames. I will use these to demonstrate Outer Join with examples.


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print("Create DataFrame1:\n:", df1)
print("Create DataFrame2:\n", df2)

Yields below output.

pandas outer join dataframes

2. Pandas Outer Join using join()

panads.DataFrame.join() method supports how parameter to join DataFrames by outer join. It also supports different parameters, refer to pandas join() for syntax, usage, and more examples.


# Pandas join two DataFrames
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print("After joining DataFrames by using outer join:\n", df3)

Yields below output.

pandas outer join dataframes

3. Outer Join Using merge()

Using the merge() function you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join, to do an outer join use how param with outer value.


# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')
print("After joining DataFrames by using outer join:\n", df3)

# DataFrame.merge()
df3=df1.merge(df2, how='outer')
print("After joining DataFrames by using outer join:\n", df3)

Yields below output.


# Output:
After joining DataFrames by using outer join:
    Courses      Fee Duration  Discount
0    Spark  20000.0   30days    2000.0
1  PySpark  25000.0   40days       NaN
2   Python  22000.0   35days    1200.0
3   pandas  30000.0   50days       NaN
4     Java      NaN      NaN    2300.0
5       Go      NaN      NaN    2000.0

You can also specify the column names explicitly.


# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')
print("After joining DataFrames by using outer join:\n", df3)

Yields the same output as above.

In case, if you want to combine column names that are different on two pandas DataFrames.


# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print("After joining DataFrames by using outer join:\n", df3)

merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.

4. By using pandas concat()

pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames and specify outer for join param to perform the outer join.


# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print("After joining DataFrames by using outer join:\n", df3)

Yields below output.


# Output:
# After joining DataFrames by using outer join:
    Courses      Fee Duration Courses  Discount
r1    Spark  20000.0   30days   Spark    2000.0
r2  PySpark  25000.0   40days     NaN       NaN
r3   Python  22000.0   35days  Python    1200.0
r4   pandas  30000.0   50days     NaN       NaN
r6      NaN      NaN      NaN    Java    2300.0
r5      NaN      NaN      NaN      Go    2000.0

5. Complete Examples of Outer Join of DataFrames


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='outer')
print(df3)

# Pandas.merge()
df3=pd.merge(df1,df2, how='outer')

# DataFrame.merge()
df3=df1.merge(df2, how='outer')

# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='outer')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='outer')
print(df3)

# By using concat()
df3=pd.concat([df1,df2],axis=1,join='outer')
print(df3)

Frequently Asked Questions of Outer Join of DataFrames

What is an outer join in the context of DataFrames?

An outer join combines rows from two DataFrames based on a common column, including all rows from both DataFrames. It includes matching rows as well as non-matching rows.

How do I perform an outer join in Pandas?

You can use the merge() function to a DataFrame with the parameter how='outer' to perform an outer join. For example, merged_df = df1.merge(df2, how='outer')

What is the difference between inner join and outer join?

An inner join includes only the rows that have matching columns in both DataFrames, while an outer join includes all rows from both DataFrames, filling in NaN (or null) values for non-matching rows.

What happens to non-matching rows in an outer join?

Non-matching rows in an outer join will have NaN (or null) values in the columns from the DataFrame that don’t have a match for a particular column.

How can I perform an outer join on multiple keys?

You can perform an outer join on multiple keys by passing a list of key columns to the on parameter in the merge() function. For example, merged_df = df1.merge(df2, how='outer', on=['key1', 'key2'])

Conclusion

In this article, you have learned how to perform an outer join on DataFrams by using join(), merge() and concat() methods with explanations and examples. An outer join is also called Full Outer Join that returns all rows from both DataFrames. Where the join expression doesn’t match it returns null on respective cells.

References

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply