Pandas Combine Two DataFrames With Examples

Post last modified: April 23, 2024

Use pandas.concat() and DataFrame.append() to combine two or more pandas DataFrames across rows or columns. DataFrame.append() combines two DataFrames on the row axis, returning a new DataFrame that contains all rows of both. Note that DataFrame.append() was deprecated in pandas 1.4.0 and removed in pandas 2.0, so on current versions use pandas.concat() instead. In this article, I will explain how to combine two pandas DataFrames using pandas.concat() and DataFrame.append() with examples.


Key Points –

  • Use the pd.concat() function to combine DataFrames vertically or horizontally based on the axis parameter.
  • Use the ignore_index parameter to reset the index of the resulting DataFrame after concatenation.
  • Understand the differences between concatenating along rows (axis=0) and columns (axis=1) and choose the appropriate method based on your data structure needs.
  • Ensure the column names and data types are compatible between the DataFrames to be combined; columns missing from one DataFrame are filled with NaN in the result.
  • Beware of duplicate indices when combining DataFrames; use ignore_index or reset_index() to avoid unexpected behavior.
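The last point can be illustrated with a minimal sketch. The `left` and `right` DataFrames below are hypothetical sample data made up for this illustration; the point is only that row-wise concatenation keeps each frame's labels, so a label lookup can match more than one row.

```python
import pandas as pd

# Hypothetical sample data for illustration only
left = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})
right = pd.DataFrame({"Courses": ["Hadoop", "Java"], "Fee": [25200, 24900]})

# Row-wise concatenation keeps each frame's own labels, so 0 and 1 repeat
stacked = pd.concat([left, right])
print(stacked.index.tolist())   # [0, 1, 0, 1]

# A lookup by label 0 now matches two rows instead of one
print(len(stacked.loc[0]))      # 2

# Either option below produces a fresh 0..n-1 index
fresh = pd.concat([left, right], ignore_index=True)
also_fresh = stacked.reset_index(drop=True)
print(fresh.index.tolist())     # [0, 1, 2, 3]
```

Passing ignore_index=True avoids building the duplicated index in the first place, while reset_index(drop=True) is handy when the DataFrame has already been combined.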

Quick Examples of Combining Two Pandas DataFrames

If you are in a hurry, below are some quick examples of how to combine two pandas DataFrames.


# Quick examples of combining two pandas DataFrames

# Combine two DataFrames row-wise with pandas.concat()
data = [df, df1]
df2 = pd.concat(data)

# Reset the index of the result with ignore_index=True
df2 = pd.concat([df, df1], ignore_index=True, sort=False)

# Combine two DataFrames column-wise,
# keeping only index labels present in both
data = pd.concat([df, df1], axis=1, join='inner')

# Using DataFrame.append() (pandas < 2.0 only; removed in pandas 2.0)
data = df.append(df1)

# DataFrame.append() with a reset index (pandas < 2.0 only)
df2 = df.append(df1, ignore_index=True)

# Appending multiple DataFrames at once (pandas < 2.0 only)
data = df.append([df1, df2])

Use pandas.concat() to Combine Two DataFrames

First, let's look at the pandas.concat() function. It combines DataFrames along either rows or columns, and it can optionally apply set logic (union or intersection) to the index labels on the other axis.


# Create DataFrames
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using pandas.concat() to combine two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

pandas.concat() is particularly helpful when you are combining more than two DataFrames, since it accepts a list of any length. Notice that the above example kept the row index of each DataFrame as-is, so the labels 0 to 3 appear twice. If you want a fresh, continuous index instead, use the ignore_index=True parameter.


# Use pandas.concat() method to ignore_index 
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
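As a rough sketch of combining more than two DataFrames in one call, the example below adds a third frame to the list. Note that `df3` and its values are hypothetical and not part of this article's data.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})
df1 = pd.DataFrame({"Courses": ["Pandas", "Hadoop"], "Fee": [25000, 25200]})
# df3 is a hypothetical third DataFrame, made up for this illustration
df3 = pd.DataFrame({"Courses": ["Scala", "Go"], "Fee": [23000, 21000]})

# pd.concat() takes a list of any length; ignore_index=True renumbers the rows
combined = pd.concat([df, df1, df3], ignore_index=True)
print(combined.shape)            # (6, 2)
print(combined.index.tolist())   # [0, 1, 2, 3, 4, 5]
```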

Using pandas.concat() to Join Two DataFrames

As mentioned above, pandas.concat() can also join two DataFrames on columns. To do so, use axis=1 and join='inner', which aligns rows by index label and keeps only the labels present in both DataFrames. By default, pd.concat() concatenates row-wise (axis=0) with an outer join.


import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
                      'Fee' :[20000,25000,22000,24000]})
  
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days'],
                      'Discount':[1000,2300,2500,2000]}) 

# Using pandas.concat() to join two DataFrames on columns
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      2500
3   pandas  24000   60days      2000
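In the example above both DataFrames share the same index, so inner and outer joins give the same rows. The difference only shows up when the indexes overlap partially. The sketch below uses hypothetical frames (`courses` and `discounts`, with hand-picked index labels) to illustrate this.

```python
import pandas as pd

courses = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"]}, index=[0, 1, 2])
# Hypothetical frame whose index only partly overlaps with `courses`
discounts = pd.DataFrame({"Discount": [2300, 2500, 2000]}, index=[1, 2, 3])

# join='inner' keeps only index labels found in both frames
inner = pd.concat([courses, discounts], axis=1, join="inner")
# join='outer' (the default) keeps every label and fills the gaps with NaN
outer = pd.concat([courses, discounts], axis=1, join="outer")

print(inner.index.tolist())   # [1, 2]
print(outer.index.tolist())   # [0, 1, 2, 3]
print(outer)
```

With the outer join, row 0 has no Discount and row 3 has no Courses value, so both cells come back as NaN.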

Use DataFrame.append() to Combine Two DataFrames

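Alternatively, you can use the DataFrame.append() method to concatenate DataFrames on rows. For example, df.append(df1) returns a new DataFrame with the rows of df1 appended to df. This method was deprecated in pandas 1.4.0 and removed in pandas 2.0, so the examples in this section run only on older versions; on pandas 2.0 and later, use pd.concat([df, df1]) instead.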


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using DataFrame.append() 
# To Combine Two DataFrames
df2 = df.append(df1)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

To reset the index of the combined DataFrame, pass ignore_index=True to the DataFrame.append() method.


# Use DataFrame.append() 
df2 = df.append(df1, ignore_index=True)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
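Because DataFrame.append() is no longer available in pandas 2.0 and later, the two calls above can be rewritten with pd.concat(). The following is a minimal sketch of that translation using the same data as this section.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900]})

# Equivalent of df.append(df1): original row labels are kept
df2 = pd.concat([df, df1])
print(df2.index.tolist())   # [0, 1, 2, 3, 0, 1, 2, 3]

# Equivalent of df.append(df1, ignore_index=True): rows are renumbered
df3 = pd.concat([df, df1], ignore_index=True)
print(df3.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]
```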

Multiple Objects to Concatenate Using DataFrame.append()

Similarly, to concatenate multiple DataFrames using the DataFrame.append() method, pass all of them as a list to append(). Columns that exist in only some of the DataFrames are filled with NaN in the rows that lack them.


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                    'Fee' : ['20000', '25000', '22000', '24000']}) 
  
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
  
df2 = pd.DataFrame({'Duration':['30days','40days','35days','60days','55days'],
                    'Discount':[1000,2300,2500,2000,3000]})
  
# Appending multiple DataFrames (requires pandas < 2.0)
data = df.append([df1, df2])
print(data)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
0     Spark  20000      NaN       NaN
1   PySpark  25000      NaN       NaN
2    Python  22000      NaN       NaN
3    Pandas  24000      NaN       NaN
0      Unix  25000      NaN       NaN
1    Hadoop  25200      NaN       NaN
2  Hyperion  24500      NaN       NaN
3      Java  24900      NaN       NaN
0       NaN    NaN   30days    1000.0
1       NaN    NaN   40days    2300.0
2       NaN    NaN   35days    2500.0
3       NaN    NaN   60days    2000.0
4       NaN    NaN   55days    3000.0
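On pandas 2.0 and later the same result can be produced with pd.concat(). The sketch below reuses the three DataFrames from this section and also checks how the mismatched columns are handled.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                   'Fee': ['20000', '25000', '22000', '24000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
df2 = pd.DataFrame({'Duration': ['30days', '40days', '35days', '60days', '55days'],
                    'Discount': [1000, 2300, 2500, 2000, 3000]})

# Row-wise concat takes the union of all columns and fills gaps with NaN
data = pd.concat([df, df1, df2])
print(data.shape)               # (13, 4)
print(data.columns.tolist())    # ['Courses', 'Fee', 'Duration', 'Discount']

# Discount was integer in df2 but becomes float so it can hold NaN
print(data['Discount'].dtype)   # float64
```

This is why the Discount values in the output above print as 1000.0 rather than 1000.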

FAQ on Pandas Combine Two DataFrames

How can I combine two DataFrames in pandas?

You can combine two DataFrames in pandas using pd.concat(), DataFrame.merge(), or DataFrame.join(), depending on your specific requirements. DataFrame.append() is also an option, but only on pandas versions before 2.0.
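To show how merge() differs from concat(), here is a small sketch. concat() stacks frames or aligns them by index, whereas merge() matches rows on the values of a key column. The `fees` and `durations` DataFrames are hypothetical and chosen only for illustration.

```python
import pandas as pd

fees = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                     "Fee": [20000, 25000, 22000]})
# Hypothetical lookup table sharing the "Courses" key column
durations = pd.DataFrame({"Courses": ["PySpark", "Python", "Java"],
                          "Duration": ["40days", "35days", "60days"]})

# Inner merge keeps only courses that appear in both DataFrames
merged = fees.merge(durations, on="Courses", how="inner")
print(merged)
```

Only PySpark and Python appear in both frames, so the merged result has two rows with the Courses, Fee, and Duration columns.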

What is the difference between concat() and append() for combining DataFrames?

The concat() function is more versatile and can concatenate multiple DataFrames along either axis (rows or columns), while append() only concatenates along rows. append() was a shorthand for row-wise concatenation; it was deprecated in pandas 1.4.0 and removed in pandas 2.0, so concat() is the one to use on current versions.

How do I combine two DataFrames along columns?

To combine two DataFrames along columns, you can use the concat() function with axis=1. For example: pd.concat([df1, df2], axis=1).

How do I combine DataFrames with different indices?

You can combine DataFrames with different indices using concat() (or append() on pandas versions before 2.0). Set ignore_index=True to reset the index of the resulting DataFrame.

What does ignore_index=True do when combining DataFrames?

When you set ignore_index=True while combining DataFrames, it creates a new index for the resulting DataFrame, ignoring the existing indices of the original DataFrames. This ensures that the index is reset and is continuous.

Conclusion

In this article, I have explained how to combine two pandas DataFrames using the DataFrame.append() and pandas.concat() methods with examples. On pandas 2.0 and later, where append() is no longer available, use pandas.concat().

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium