Pandas Left Join Explained By Examples

In this pandas left join article, I will explain the significance of left join and how to do it with pandas DataFrames on multiple columns. Since pandas support several methods to combine DataFrames, I will cover all these with examples.

Left join is also called Left Outer Join that returns all rows from the left DataFrame regardless of match found on the right DataFrame. When the join expression doesn’t match, it assigns null for that record for left records and drops records from right where match not found.

1. Quick Examples of Pandas Left Join of DataFrames

Below are some quick examples of pandas Left Join DataFrames.


# pandas left join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right", how='left')

# pandas.merge() by Column
df3=pd.merge(df1,df2, how='left')

# DataFrame.merge() by Column
df3=df1.merge(df2, how='left')

# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses', how='left')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='left')

First, let’s create a DataFrames that I can use to demonstrate Left Join with examples


# Create DataFrames
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

Yields below output

pandas left join dataframes

2. Pandas Left Join using join()

panads.DataFrame.join() method by default does the leftt Join on row indices and provides a way to do join on other join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples.


# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

Yields below output.


   Courses_left    Fee Duration Courses_right  Discount
r1        Spark  20000   30days         Spark    2000.0
r2      PySpark  25000   40days           NaN       NaN
r3       Python  22000   35days        Python    1200.0
r4       pandas  30000   50days           NaN       NaN

If you check the results, indexes r2 and r4 are from the right DataFrame and it contains NaN for some columns for records that don’t match.

3. Left Join Using merge()

Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it joins on all common columns that exist on both DataFrames and performs an inner join. Use param how to specify the left join.


# pandas.merge()
df3=pd.merge(df1,df2, how='left')
print(df3)

# DataFrame.merge()
df3=df1.merge(df2, how='left')
print(df3)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  20000   30days    2000.0
1  PySpark  25000   40days       NaN
2   Python  22000   35days    1200.0
3   pandas  30000   50days       NaN

You can also specify the column names explicitly.


# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses', how='left')

In case if you wanted to combine column names that are different on two pandas DataFrames.


# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='left')
print(df3)

merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.

4. Complete Examples


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
              }
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)

technologies2 = {
    'Courses':["Spark","Java","Python","Go"],
    'Discount':[2000,2300,1200,2000]
              }
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)

print(df1)
print(df2)

# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)

# pandas.merge()
df3=pd.merge(df1,df2, how='left')
print(df3)

# DataFrame.merge()
df3=df1.merge(df2, how='left')
print(df3)

# Merge DataFrames by Column
df3=pd.merge(df1,df2, on='Courses', how='left')

# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses', how='left')

Conclusion

In this article, you have learned how to perform a left join on DataFrams by using join() and merge() methods with explanations and examples. A left join is also called Left Outer Join which returns all rows from the left DataFrame regardless of match found on the right DataFrame. When the join expression doesn’t match, it assigns null for that record for left records and drops records from right where match not found.

References

pandas left join

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing Pandas Left Join Explained By Examples