To join two DataFrames in pandas, you can use several methods depending on how you want to combine them. The most common methods are merge()
, join()
, and concatenation with concat()
.
In this article, I will explain how to join two DataFrames using merge()
, join()
, and concat()
methods. Each of these methods provides different ways to join DataFrames. merge()
is the most used approach to join two DataFrames by columns and index. join()
is primarily used to combine on index and concat()
is used to append DataFrame rows but it can also be used to join.
Key Points –
- Use the
pd.merge()
function in Pandas to join two DataFrames based on one or more common columns. - Specify the columns to join on using the
on
parameter, or useleft_on
andright_on
parameters if the column names are different in the two DataFrames. - Control the type of join (inner, outer, left, or right) using the
how
parameter. - Choose the appropriate join type (e.g., inner, outer, left, right) based on the desired outcome and the nature of the data being merged.
- Handle duplicate column names resulting from the join using the
suffixes
parameter to distinguish them.
Quick Examples of Join Two DataFrames
Following are quick examples of pandas joining two DataFrames.
# Quick examples of join two dataframes
# Pandas join two DataFrames by index
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
# Pandas.merge() by column
df3=pd.merge(df1,df2)
# DataFrame.merge() by column
df3=df1.merge(df2)
# Merge DataFrames by column
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
Now, let’s create a pandas DataFrame.
# Create DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
Yields below output
Pandas join() Two DataFrames
The Pandas DataFrame join() method can be used to combine two DataFrames on row indices. By default, it performs a left join, offering options to specify various join types like left
, inner
, right
, and outer joins. Additionally, the method supports diverse parameters. For further syntax, usage, and additional examples, consult the documentation on pandas join()
.
By default, join()
uses a left join on the row index. This contrasts with merge()
, which performs an inner join on common columns by default.
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
Yields below output.
# Output:
Courses_left Fee Duration Courses_right Discount
r1 Spark 20000 30days Spark 2000.0
r2 PySpark 25000 40days NaN NaN
r3 Python 22000 35days Python 1200.0
r4 pandas 30000 50days NaN NaN
Pandas merge() Two DataFrames
In this section, I will explain the usage of pandas DataFrames using merge() method. This method is the most efficient way to join DataFrames on columns. It also supports joining on the index but an efficient way would be to use join()
.
Using merge()
you can merge by columns, by index, you can perform merges based on indices, multiple columns, and various types of joins. By default, it merges on all shared columns between the DataFrames, conducting an inner join.
# Pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
You can alternatively explicitly specify the column names.
# Merge dataframes by columns
df3=pd.merge(df1,df2, on='Courses')
If you wanted to merge columns with different names across two pandas DataFrames.
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
merge()
function also accommodates various parameters; consult the pandas merge()
for syntax, usage, and examples.
Pandas concat() Two DataFrames
pandas concat() method is the least used to join two DataFrames. It is mainly used to append DataFrames Rows. however, it can also be used to join pandas DataFrames.
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 2000
1 Python 22000 35days 1200
Complete Examples of Pandas Join two DataFrames
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
}
index_labels=['r1','r2','r3','r4']
df1 = pd.DataFrame(technologies,index=index_labels)
technologies2 = {
'Courses':["Spark","Java","Python","Go"],
'Discount':[2000,2300,1200,2000]
}
index_labels2=['r1','r6','r3','r5']
df2 = pd.DataFrame(technologies2,index=index_labels2)
print(df1)
print(df2)
# Pandas join two DataFrames
df3=df1.join(df2, lsuffix="_left", rsuffix="_right")
print(df3)
# Pandas.merge()
df3=pd.merge(df1,df2)
# DataFrame.merge()
df3=df1.merge(df2)
# Merge DataFrames by Columns
df3=pd.merge(df1,df2, on='Courses')
# When column names are different
df3=pd.merge(df1,df2, left_on='Courses', right_on='Courses')
print(df3)
# By using concat()
df3=pd.concat([df1,df2],axis=1,join='inner')
print(df3)
FAQs on Join Pandas Two DataFrames
You can use pd.merge()
, df.join()
, or df.concat()
functions in Pandas to join two DataFrames. The choice of method depends upon the type of join you want to perform.
In Pandas, merging and joining essentially perform the same operation of combining two DataFrames based on common columns. However, “merge” is the preferred term and function (pd.merge()
), while “join” typically refers to specific types of merges, such as SQL-style joins (df.join()
).
If the DataFrames have different column names but represent the same data, you can use the left_on
right_on
parameters in the merge()
function to specify the columns to join on explicitly.
Conclusion
In conclusion, this article has provided insights into merging two pandas DataFrames through join()
, merge()
, and concat()
methods, accompanied by practical examples.
Related Articles
- Pandas Join Explained With Examples
- Pandas Percentage Total With Groupby
- Pandas Merge Multiple DataFrames
- Pandas Merge Two DataFrames
- Pandas Get Total / Sum of Columns
- How to Change Column Name in Pandas
- Pandas Normalize Columns of DataFrame
- Pandas Merge Suffixes Explained Examples
- How to Print Pandas DataFrame without Index
- Pandas Merge Indicator Explained With Examples
- How to Union Pandas DataFrames using Concat?