Pandas support several methods to merge two DataFrames similar to SQL joins to combine columns. In this article, I will explain how to join two DataFrames using merge()
, join()
, and concat()
methods. Each of these methods provides different ways to merge DataFrames. merge() is the most used approach to join two DataFrames by index and by columns. join() is primarily used to combine on index and concat() is used to append DataFrame rows but it can also be used to join.
Related: In Pandas, you can also merge multiple DataFrames.
1. Quick Examples of Pandas Merging Two DataFrames
Below are some quick examples of pandas merging two DataFrames.
# Below are the examples of merging two DataFrames
# Example 1: Pandas Merge two DataFrames using join()
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right")
# Example 2: Using pd.merge()
df3 = pd.merge(df1,df2)
# Example 3: Using DataFrame.merge()
df3 = df1.merge(df2)
# Example 4: Merge DataFrames by Columns
df3 = pd.merge(df1,df2, on='Courses')
# Example 5: Merge DataFrames by different Columns
df3 = pd.merge(df1,df2, left_on='Courses', right_on='Courses')
# Example 6: Merge DataFrames on index
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
# Example 7: By using concat()
df3 = pd.concat([df1,df2],axis=1,join='inner')
First, let’s create a DataFrame that I can use to demonstrate with the examples.
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print("First DataFrame:\n", df1)
print("Second DataFrame:\n", df2)
Yields below output.
2. Pandas Merge Two DataFrames using join()
pd.df.join()
method can be used to combine two DataFrames on row indices. This by default does the left join and provides a way to specify the different join types. It supports left
, inner
, right
, and outer
join types. It also supports different params, refer to pandas join() for syntax, usage, and more examples of the join() method.
By default, it uses the left join on the row index. This is unlike merge() where it does inner join on common columns.
# Pandas merge two DataFrames
df3 = df1.join(df2, lsuffix="_left", rsuffix="_right")
print("After merging the two DataFrames:\n", df3)
Yields below output.
3. Pandas Merge Two DataFrames using merge()
In this section, I will explain how to merge pandas DataFrames using the merge() method. This method is the most efficient way to merge DataFrames on columns. It also supports merging on the index but an efficient way would be to use join().
Using merge() you can do merging by columns, merging by index, merging on multiple columns, and different join types. By default, it merges on all common columns that exist on both DataFrames and performs an inner join.
# Pandas.merge()
df3 = pd.merge(df1,df2)
print("After merging the two DataFrames:\n", df3)
# DataFrame.merge()
df3 = df1.merge(df2)
print("After merging the two DataFrames:\n", df3)
Yields below output.
# Output:
After merging the two DataFrames
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
4. Pandas Merge Two DataFrames Based on Columns
To merge two DataFrames based on column you can explicitly specify the column name you want to use for joining. To use column names use on param of the merge() method. This also takes a list of names when you want to merge on multiple columns.
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
print(df3)
In case, if you want to combine column names that are different on two pandas DataFrames.
# Merge DataFrames by different columns
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print("After merging the DataFrames:\n", df3)
merge() also supports different params, refer to pandas merge() to learn syntax, usage with examples.
5. Pandas Merge Two DataFrames on Index
You can use pandas.merge()
function to merge two DataFrames by matching their index. When merging two DataFrames on the index, the value of left_index
and right_index
parameters of merge() function should be True
. By default, the pd.merge() is a column-wise inner join. Let’s see with an example.
# merge two DataFrames on Index
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print("After merging the DataFrames:"df3)
Yields below output.
# Output:
After merging the DataFrames:
Courses_x Fee Duration Courses_y Discount
r1 Spark 20000 30days Spark 2000
r3 Python 22000 35days Python 1200
6. pandas concat() Two DataFrames
pandas concat() method is the least used to merge two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to merge pandas DataFrames.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
5. Complete Examples
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
# Pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Merging two DataFrames in Pandas is used to combine data from multiple sources into a single DataFrame based on common columns or indexes. It allows you to perform operations like joining, merging, and concatenating data, which is essential for data analysis and manipulation.
You can merge two DataFrames in Pandas using the merge()
function. This function allows you to specify the DataFrames to merge, the columns or keys on which to merge, and the type of merge operation (e.g., inner, outer, left, right).
Yes, you can merge DataFrames on multiple columns by passing a list of column names to the on
parameter of the merge()
function.
merge()
function and the concat()
function in Pandas? The merge()
function in Pandas is used to combine DataFrames based on common columns or keys, while the concat()
function is used to concatenate DataFrames along a particular axis (either rows or columns). Merging involves combining data with a shared key, while concatenation simply stacks data together.
Conclusion
In this article, you have learned to merge two DataFrames using join(), merge(), and concat() methods with explanations and examples.
Related Articles
- Pandas Merge Multiple DataFrames
- Pandas Merge DataFrames on Multiple Columns
- Pandas Merge DataFrames Explained Examples
- Pandas Merge DataFrames on Index
- Differences between Pandas Join vs Merge
- Pandas – How to Merge Series into DataFrame
- How to Append Pandas Series?
- Check Values of Pandas Series is Unique