Pandas support pandas.merge()
and DataFrame.merge()
to merge DataFrames which is exactly similar to SQL join and supports different types of join inner
, left
, right
, outer
, cross
. By default, it uses inner join where keys don’t match the rows get dropped from both DataFrames, and the result DataFrame contains rows that match on both.
In this article, I will explain how to merge DataFrames with examples like merging by columns, merging by index, merging on multiple columns, and different join types. In case, if you want to merge on Indexes use pandas join() which default supports joining on the index.
Pandas Merge Key Points
- pandas.merge() and DataFrame.merge() work the same way that merges two or more DataFrames.
- This works similarly to Database joins
- Supports joins on rows and columns
- When doing a join on columns, it ignores indexes.
- When joining on the index, the resultant DataFrame contains indexes from sources.
- When no params are used, by default join happens on all common columns.
1. Quick Examples of pandas merge() DataFrames
Below are some quick examples of the pandas merge() method that merges two or multiple DataFrames.
# Quick Examples of Merging pandas DataFrames
# Example 1: Pandas.merge()
df3=pd.merge(df1,df2)
# Example 2: DataFrame.merge()
df3=df1.merge(df2)
# Example 3: Merge by column
df3=pd.merge(df1,df2, on='Courses')
# Example 4: Merge on different colunn names
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
# Example 5: Merge by Index
df3 = pd.merge(df1,df2,left_index=True,right_index=True)
# Example 6: Merge by multiple columns
df3 = pd.merge(df3, df1, how='left', left_on=['Col1','col2'], right_on = ['col1','col2'])
# Example 7: Merge by left join
df3=pd.merge(df1,df2, on='Courses', how='left')
# Example 8: Merge by right join
df3=pd.merge(df1,df2, on='Courses', how='right')
# Example 9: Merge by outer join
df3=pd.merge(df1,df2, on='Courses', how='outer')
2. Pandas Merge DataFrames
pandas.merge()
and DataFrame.merge()
are used to merge two DataFrames or multiple DataFrames. Both these methods work exactly the same and they also take a similar number of params. Merging DataFrames is nothing but joining DataFrames similar to Database join.
Similar to the Database join, the merge()
method also supports several join types like left
, right
, inner
, outer
and cross
. Use how
param to specify the join type.
By default merge()
uses inner join on columns that are present on both DataFrames. You can also explicitly specify the columns you want to join and join by row index.
Though DataFrame also has a join()
method there are slight differences between merge()
and join()
methods.
Now, let’s see syntax and usage with examples. By using merge you can also merge series into DataFrame.
2.1 pandas.merge() Syntax
Below is the syntax and usage of pandas.merge()
method. For the latest syntax refer to pandas.merge()
# Pandas.merge() Syntax
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
2.2 pandas.DataFrame.merge() Syntax
Below is the syntax and usage of pandas.DataFrame.merge()
method.
# Pandas.DataFrame.merge() Syntax
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Note: Both these merge()
methods are similar and take the same type of arguments.
3. Pandas Merge DataFrames Examples
Let’s create DataFrames of two to demonstrate how merge works.
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
print("First DataFrame:", df1)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print("Second DataFrame:", df2)
Yields below output.
3.1 pandas.merge() DataFrames
In my first example of merge(), I will use default params where it does inner join on the same columns presented on both DataFrames. From our DataFrame the common column is Courses. If you want to concat DataFrames use pandas.concat() method.
# Using pandas.merge()
df3= pd.merge(df1,df2)
# Using DataFrame.merge()
df3=df1.merge(df2)
print("After merging the two DataFrames:\n", df3)
Yields below output.
3.2 Merge DataFrames on Columns
Sometimes it is required to specify the column names you want to merge, to do so use on
param. For example df3=pd.merge(df1,df2, on='Courses')
.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
Yields the same output as above first example. In case if you have different columns on the left and right and want to join on these use left_on
and right_on
params.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
3.3 Merge DataFrames on Index
You can also Merge DataFrames by Index using left_index
and right_index
params. If you notice the above examples, it ignored the index from the merged DataFrame result. Joining on Index retains the indexes.
Note that you can also use the index from the left and columns from the right DataFrame and vice-versa to perform the merge.
# Merge DataFrames by Index
df3 = pd.merge(df1,df2,left_index=True,right_index=True)
print(df3)
Yields
# Output:
Courses_x Fee Duration Courses_y Discount
r1 Spark 20000 30days Spark 2000
r3 Python 22000 35days Python 1200
3.4 Merging DataFrames on Multiple Columns
Use param on with a list of column names when you want to merge DataFrames on multiple columns. It also supports different column names on the left and right DataFrames, below is an example.
# Use pandas.merge() to on multiple columns
df3 = pd.merge(df3, df1, how='left', left_on=['Courses','Fee'], right_on = ['Courses','Fee'])
print(df3)
# Output:
# Courses Fee Duration_x Discount Duration_y
# 0 Spark 20000 30days 2000 30days
# 1 Python 22000 35days 1200 35days
3.5 Merging Using Different Join Types
Merge is similar the SQL join hence, it supports different join types inner, left, right, and outer. By default, merge uses inner join. Let’s see other join types in this section.
Left Join
– It is used to perform left join on DataFrames, also called Left Outer Join that returns all rows from the left DataFrame regardless of the match found on the right DataFrame. When the join expression doesn’t match, it assigns null for that record and drops records from the right where the match is not found.
# Merge by left Join
df3=pd.merge(df1,df2, on='Courses', how='left')
print(df3)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000 30days 2000.0
# 1 PySpark 25000 40days NaN
# 2 Python 22000 35days 1200.0
# 3 pandas 30000 50days NaN
Right Join – Also called Right Outer Join
– This join is the opposite of left
join, here it returns all rows from the right DataFrame regardless of math found on the left. When the join expression doesn’t match, it assigns null for that record and drops records from the left where a match is not found.
# Merge by right Join
df3=pd.merge(df1,df2, on='Courses', how='right')
print(df3)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000.0 30days 2000
# 1 Java NaN NaN 2300
# 2 Python 22000.0 35days 1200
# 3 Go NaN NaN 2000
Outer Join
– It is used to perform outer join on DataFrames, Also called Full Outer Join
– Returns all rows from both DataFrames. Where the join expression doesn’t match it returns null on respective cells.
# Merge by outer Join
df3=pd.merge(df1,df2, on='Courses', how='outer')
print(df3)
# Outputs:
# Courses Fee Duration Discount
# 0 Spark 20000.0 30days 2000.0
# 1 PySpark 25000.0 40days NaN
# 2 Python 22000.0 35days 1200.0
# 3 pandas 30000.0 50days NaN
# 4 Java NaN NaN 2300.0
# 5 Go NaN NaN 2000.0
5. Complete Examples
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# pandas.merge()
df3=pd.merge(df1,df2)
print(df3)
# DataFrame.merge()
df3=df1.merge(df2)
print(df3)
# Merge by column
df3=pd.merge(df1,df2, on='Courses')
print(df3)
# Merge on different colunn names
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# Merge by Index
df3 = pd.merge(df1,df2,left_index=True,right_index=True)
print(df3)
# Use pandas.merge() to on multiple columns
df3 = pd.merge(df3, df1, how='left', left_on=['Col1','col2'], right_on = ['col1','col2'])
print(df3)
# Merge by left join
df3=pd.merge(df1,df2, on='Courses', how='left')
print(df3)
# Merge by right join
df3=pd.merge(df1,df2, on='Courses', how='right')
print(df3)
# Merge by outer join
df3=pd.merge(df1,df2, on='Courses', how='outer')
print(df3)
Merging in Pandas refers to the process of combining two or more DataFrames into a single DataFrame based on a common column or index.
You can merge two DataFrames in Pandas using the pd.merge()
function. This function takes the two DataFrames to be merged as its primary arguments and allows you to specify various options for how the merge should be performed.
Yes, you can merge on multiple columns by passing a list of column names to the on
parameter. For example, pd.merge(df1, df2, on=['col1', 'col2'])
will merge based on both ‘col1’ and ‘col2’.
If the column names are different, you can use the left_on
and right_on
parameters to specify the corresponding columns in the left and right DataFrames.
Conclusion
I hope you have learned the pandas merge() method that merges the DataFrame. Merging is a join operation that combines the columns from both DataFrames. Merge also supports joining on columns, index, and a combination of both.
Related Articles
- Pandas Merge DataFrames on Multiple Columns
- Pandas Concat Two DataFrames Explained
- Pandas Merge DataFrames on Index
- Pandas Add Header Row to DataFrame
- Pandas Add Column Names to DataFrame
- Pandas Concatenate Two Columns
- Pandas Merge Multiple DataFrames
- How to Merge Series into Pandas DataFrame