To merge DataFrames by index use pandas.merge()
, pandas.concat()
and DataFrame.join()
methods. All these methods are very similar but join()
is considered a more efficient way to join on indices. pandas.concat() method to concatenate two DataFrames by setting axis=1
. merge() is considered most efficient to combine on columns
In this article, I will explain how to merge two pandas DataFrames by index using merge()
, concat()
and join()
methods with examples.
1. Quick Examples of Pandas Merge DataFrames by Index
If you are in a hurry, below are some quick examples of how to merge two pandas DataFrames by index.
# Below are some quick examples
# Merge two DataFrames by index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
# Join two DataFrames
df2 = df.join(df1)
# join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. First DataFrame contains column names Courses
, Fee
and second DataFrame contains column names Duration
, Discount
. I will be merging these two DataFrames into a single one by combining columns from both.
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]},
index=['r1','r2','r3','r4'])
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]},
index=['r1','r2','r3','r5','r6'])
print(df)
print(df1)
Yields below output.
Courses Fee
r1 Spark 20000
r2 PySpark 25000
r3 Python 22000
r4 pandas 24000
Duration Discount
r1 30day 1000
r2 40days 2300
r3 35days 2500
r5 60days 2000
r6 55days 3000
3. Merge DataFrames by Index Using pandas.merge()
You can use pandas.merge()
to merge DataFrames by matching their index. When merging two DataFrames on the index, the value of left_index
and right_index
parameters of merge() function should be True
. and by default, the pd.merge() is a column-wise inner join. Let’s see with an example.
# Merge two DataFrames by index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print(df2)
Yields below output.
Courses Fee Duration Discount
r1 Spark 20000 30day 1000
r2 PySpark 25000 40days 2300
r3 Python 22000 35days 2500
This merges two DataFrames only when indexes are matching.
2. Use DataFrame.join() to Merge DataFrames by Index
DataFrame.join()
method is also used to join the two DataFrames based on indexes, and by default, the join is a column-wise left join. It always uses the right DataFrame,s index, but you can mention the key for left DataFrame. You can specify the join types for join()
function same as we mention for merge()
. You can use this syntax, DataFrame.join(DataFrame1)
.
# Join two DataFrames
df2 = df.join(df1)
print(df2)
Yields below output. Since by default it is left join, you get all rows from the left side and NaN for columns on the right side for non-matching indexes.
Courses Fee Duration Discount
r1 Spark 20000 30day 1000.0
r2 PySpark 25000 40days 2300.0
r3 Python 22000 35days 2500.0
r4 pandas 24000 NaN NaN
4. Use pandas.concat() to Merge Two DataFrames by Index
You can concatenate two DataFrames by using pandas.concat() method by setting axis=1
, and by default, pd.concat
is a row-wise outer join. For instance, you can use this syntax, pandas.concat([DataFrame,DataFrame1],axis=1)
.
# join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print(df2)
Yields below output. Since this is outer join by default, it returns all rows from both sides but contains Nan for columns on non-matching rows (index)
Courses Fee Duration Discount
r1 Spark 20000.0 30day 1000.0
r2 PySpark 25000.0 40days 2300.0
r3 Python 22000.0 35days 2500.0
r4 pandas 24000.0 NaN NaN
r5 NaN NaN 60days 2000.0
r6 NaN NaN 55days 3000.0
5. Complete Example For Merge DataFrames by Index
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]},
index=['r1','r2','r3','r4'])
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]},
index=['r1','r2','r3','r5','r6'])
print(df)
print(df1)
# Join two DataFrames
df2 = df.join(df1)
print(df2)
# Merge two DataFrames by index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print(df2)
# join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print(df2)
Conclusion
In this article, I have explained how to merge two pandas DataFrames by index by using Pandas.merge()
, Pandas.concat()
and DataFrame.join()
methods with examples.
Happy Learning!!
Related Articles
- Count(Distinct) SQL Equivalent in Pandas DataFrame
- Get Pandas DataFrame Columns by Data Type
- Create Test and Train Samples from Pandas DataFrame
- Pandas Convert Datetime to Date Column
- Convert Pandas DataFrame to JSON String
- Pandas Merge Multiple DataFrames
- Pandas Join Two DataFrames
- Pandas Merge DataFrames on Multiple Columns
- Pandas Merge DataFrames Explained Examples