pandas support several methods to merge two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge()
, join()
, and concat()
methods. Each of these methods provides different ways to merge DataFrames. merge() is the most used approach to join two DataFrames by columns and index. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.
1. Quick Examples of Pandas Merge Two DataFrames
Below are some quick examples of pandas merging two DataFrames.
# pandas join two DataFrames by Index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
# pandas.merge() by Columns
df3=pd.merge(df1,df2)
# DataFrame.merge() by Columns
df3=df1.merge(df2)
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
First, let’s create a DataFrame that I can use to demonstrate with examples
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
Yields below output
2. Pandas join() Two DataFrames
panads.DataFrame.join() method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left
, inner
, right
, and outer
join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples of join() method.
By default, it uses left join on the row index. This is unlike merge() where it does inner join on common columns.
# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
Yields below output.
Courses_left Fee Duration Courses_right Discount
r1 Spark 20000 30days Spark 2000.0
r2 PySpark 25000 40days NaN NaN
r3 Python 22000 35days Python 1200.0
r4 pandas 30000 50days NaN NaN
3. Pandas merge() Two DataFrames
In this section, I will explain how to merge pandas DataFrames using merge() method. This method is the most efficient way to merge DataFrames on columns. It also supports merging on the index but an efficient way would be to use join().
Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it merges on all common columns that exist on both DataFrames and performs an inner join.
# pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
Yields below output.
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
You can also specify the column names explicitly.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
In case if you wanted to combine column names that are different on two pandas DataFrames.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.
4. pandas concat() Two DataFrames
pandas concat() method is the least used to merge two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to merge pandas DataFrames.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Yields below output.
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
5. Complete Examples
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
# pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Conclusion
In this article, you have learned merging two DataFrames using join(), merge(), and concat() methods with explanation and examples.
Related Articles
- Pandas Merge Multiple DataFrames
- Pandas Merge DataFrames on Multiple Columns
- Pandas Merge DataFrames Explained Examples
- Pandas Merge DataFrames on Index
- Differences between Pandas Join vs Merge
- Pandas – How to Merge Series into DataFrame
- How to Append Pandas Series?
- Check Values of Pandas Series is Unique