• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:8 mins read
You are currently viewing Pandas Merge DataFrames on Index

To merge Pandas DataFrames by index use pandas.merge(), pandas.concat(), and DataFrame.join() methods. All these methods are very similar but join() is considered a more efficient way to join indices. pandas.concat() method to concatenate two DataFrames by setting axis=1. merge() is considered the most efficient to combine DataFrames on multiple columns.

In this article, I will explain how to merge two pandas DataFrames by index using merge(), concat(), and join() methods with examples.

1. Quick Examples of Pandas Merging DataFrames by Index

If you are in a hurry, below are some quick examples of merging two pandas DataFrames by index.


# Below are some quick examples of merging DataFrames by index

# Merge two DataFrames by index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)

# Join two DataFrames
df2 = df.join(df1)

# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. The first DataFrame contains column names Courses, Fee and the second DataFrame contains column names Duration, Discount. I will be merging these two DataFrames into a single one by combining columns from both.


import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
                      'Fee' :[20000,25000,22000,24000]},
                     index=['r1','r2','r3','r4'])
  
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
                      'Discount':[1000,2300,2500,2000,3000]}, 
                     index=['r1','r2','r3','r5','r6'])
 
print("First DataFrame:\n", df)
print("Second DataFrame:\n", df1)

Yields below output.

pandas merge DataFrames index

3. Merge DataFrames on Index Using pandas.merge()

You can use pandas.merge() function to merge DataFrames by matching their index. When merging two DataFrames on the index, the value of left_index and right_index parameters of merge() function should be True. By default, the pd.merge() is a column-wise inner join. Let’s see with an example.


# Merge two DataFrames on index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print("After merging the DataFrames:"df2)

Yields below output.

merge() method

This merges two DataFrames only when indexes are matching.

2. Pandas Join DataFrames on Index

Alternatively, you can use DataFrame.join() method to join the two DataFrames based on indexes, and by default, the join is a column-wise left join. It always uses the right DataFrame index, but you can mention the key for the left DataFrame. You can specify the join types for join() function same as we mentioned for merge(). You can use this syntax, DataFrame.join(DataFrame1).


# Join two DataFrames
df2 = df.join(df1)
print("After merging the DataFrames:"df2)

Yields below output. Since by default it is left join, you get all rows from the left side and NaN for columns on the right side for non-matching indexes.


# Output:
After merging the DataFrames:
    Courses    Fee Duration  Discount
r1    Spark  20000    30day    1000.0
r2  PySpark  25000   40days    2300.0
r3   Python  22000   35days    2500.0
r4   pandas  24000      NaN       NaN

4. Use pandas.concat() to Merge Two DataFrames by Index

You can concatenate two DataFrames by using pandas.concat() method by setting axis=1, and by default, pd.concat is a row-wise outer join. For instance, you can use this syntax, pandas.concat([DataFrame,DataFrame1],axis=1).


# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print("After merging the DataFrames:"df2)

Yields below output. Since this is outer join by default, it returns all rows from both sides but contains Nan for columns on non-matching rows (index)


# Output:
    Courses      Fee Duration  Discount
r1    Spark  20000.0    30day    1000.0
r2  PySpark  25000.0   40days    2300.0
r3   Python  22000.0   35days    2500.0
r4   pandas  24000.0      NaN       NaN
r5      NaN      NaN   60days    2000.0
r6      NaN      NaN   55days    3000.0

5. Complete Example For Merge DataFrames by Index


import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
                      'Fee' :[20000,25000,22000,24000]},
                     index=['r1','r2','r3','r4'])
  
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
                      'Discount':[1000,2300,2500,2000,3000]}, 
                     index=['r1','r2','r3','r5','r6'])
 
print(df)
print(df1)

# Join two DataFrames
df2 = df.join(df1)
print(df2)

# Merge two DataFrames by index using pandas.merge()
df2 = pd.merge(df, df1, left_index=True, right_index=True)
print(df2)

# Join two DataFrames with concat
df2 = pd.concat([df, df1], axis=1)
print(df2)

Conclusion

In this article, I have explained how to merge two pandas DataFrames by index by using Pandas.merge(), Pandas.concat() and DataFrame.join() methods with examples.

Happy Learning!!

References

Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles in Pandas, NumPy, Python, and takes pride in ability to bridge the gap between technical experts and end-users.