To merge Pandas DataFrames by index use pandas.merge()
, pandas.concat()
, and DataFrame.join()
methods. All these methods are very similar but join()
is considered a more efficient way to join indices. pandas.concat() method to concatenate two DataFrames by setting axis=1
. merge() is considered the most efficient to combine DataFrames on multiple columns.
In this article, I will explain how to merge two pandas DataFrames by index using merge()
, concat()
, and join()
methods with examples.
Key Points –
- You can merge DataFrames based on their index using
left_index=True
andright_index=True
in themerge()
function. DataFrame.join()
by default joins on the index of the two DataFrames, making it simpler thanmerge()
for this specific case.- Use
pd.concat()
withaxis=1
to concatenate DataFrames horizontally (column-wise) based on their index. - The join types (
inner
,outer
,left
,right
) can be controlled in bothmerge()
andjoin()
to determine how the indices from both DataFrames are handled. - The default join type in
merge()
isinner
, while injoin()
andconcat()
, it isouter
. - When merging on indices, Pandas aligns the indices of the two DataFrames. Rows with unmatched indices are excluded in
inner
joins or results in NaNs inouter
joins.
Quick Examples of Pandas Merging DataFrames by Index
If you are in a hurry, below are some quick examples of merging two pandas DataFrames by index.
# Quick examples of pandas merging DataFrames by index
# Merge two DataFrames by index
# Using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
# Join two Dataframes
df2 = df.join(df1)
# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. The first DataFrame contains column names Courses
, Fee
and the second DataFrame contains column names Duration
, Discount
. I will be merging these two DataFrames into a single one by combining columns from both.
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]},
index=['r1','r2','r3','r4'])
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]},
index=['r1','r2','r3','r5','r6'])
print("First DataFrame:\n", df)
print("Second DataFrame:\n", df1)
Yields below output.
Merge DataFrames on Index Using pandas.merge()
You can use pandas.merge()
function to merge DataFrames by matching their index. When merging two DataFrames on the index, the value of left_index
and right_index
parameters of merge()
function should be True
. By default, the pd.merge()
is a column-wise inner join. Let’s see with an example.
# Merge two DataFrames on index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print("After merging the DataFrames:"df2)
Yields below output.
This merges two DataFrames only when indexes are matching.
Pandas Join DataFrames on Index
Alternatively, you can use DataFrame.join()
method to join the two DataFrames based on indexes, and by default, the join is a column-wise left join. It always uses the right DataFrame index, but you can mention the key for the left DataFrame. You can specify the join types for join() function same as we mentioned for merge()
. You can use this syntax, DataFrame.join(DataFrame1)
.
# Join two DataFrames
df2 = df.join(df1)
print("After merging the DataFrames:"df2)
Yields below output. Since by default it is left join, you get all rows from the left side and NaN for columns on the right side for non-matching indexes.
# Output:
After merging the DataFrames:
Courses Fee Duration Discount
r1 Spark 20000 30day 1000.0
r2 PySpark 25000 40days 2300.0
r3 Python 22000 35days 2500.0
r4 pandas 24000 NaN NaN
Use pandas.concat() to Merge Two DataFrames by Index
You can concatenate two DataFrames by using pandas.concat() method by setting axis=1
, and by default, pd.concat
is a row-wise outer join. For instance, you can use this syntax, pandas.concat([DataFrame,DataFrame1],axis=1)
.
# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print("After merging the DataFrames:"df2)
Yields below output. Since this is outer join by default, it returns all rows from both sides but contains Nan for columns on non-matching rows (index)
# Output:
Courses Fee Duration Discount
r1 Spark 20000.0 30day 1000.0
r2 PySpark 25000.0 40days 2300.0
r3 Python 22000.0 35days 2500.0
r4 pandas 24000.0 NaN NaN
r5 NaN NaN 60days 2000.0
r6 NaN NaN 55days 3000.0
Complete Example For Merge DataFrames by Index
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]},
index=['r1','r2','r3','r4'])
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]},
index=['r1','r2','r3','r5','r6'])
print(df)
print(df1)
# Join two DataFrames
df2 = df.join(df1)
print(df2)
# Merge two DataFrames by index
# Using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print(df2)
# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print(df2)
Frequently Asked Questions on Pandas Merge DataFrames on Index
Merging on the index involves combining two or more DataFrames using their row labels (index) as the key for the operation. Instead of using columns for the join, the index acts as the matching reference.
You can use the pd.merge()
function or the join()
method. Both allow merging on the index.
If the DataFrames have duplicate indices, the resulting merged DataFrame will have all possible combinations of the duplicate rows.
To merge on both the index and columns in Pandas, you can use the pd.merge()
function with a combination of left_index=True
, right_index=True
, and the on
parameter for columns.
You can merge more than two DataFrames on their index using the pd.concat()
function. This function allows you to concatenate multiple DataFrames along a specified axis (rows or columns), using the index as the default key for alignment.
Conclusion
In this article, I have explained how to merge two pandas DataFrames by index by using Pandas.merge()
, Pandas.concat()
and DataFrame.join()
methods with examples.
Happy Learning!!
Related Articles
- Pandas Join Two DataFrames
- Pandas Merge Multiple DataFrames
- Pandas Convert Datetime to Date Column
- Convert Pandas DataFrame to JSON String
- Pandas Merge DataFrames on Multiple Columns
- Pandas Merge DataFrames Explained Examples
- Pandas Merge Suffixes Explained Examples
- Get Pandas DataFrame Columns by Data Type
- Pandas Merge Indicator Explained With Examples
- Count(Distinct) SQL Equivalent in Pandas DataFrame