  • Post last modified:March 27, 2024
Pandas Combine Two DataFrames With Examples

Use pandas.concat() to combine two or more pandas DataFrames across rows or columns. Older code often uses DataFrame.append() to combine two DataFrames on the row axis, meaning it creates a new DataFrame containing all rows of both DataFrames. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the append() examples below run only on pandas versions earlier than 2.0. In this article, I will explain how to combine two pandas DataFrames using pandas.concat() and DataFrame.append() with examples.

1. Quick Examples of Combining Two pandas DataFrames

If you are in a hurry, below are some quick examples of how to combine two pandas DataFrames.


# Below are some quick examples

# Using pandas.concat() to combine two DataFrames on rows
data = [df, df1]
df2 = pd.concat(data)

# Use ignore_index=True to reset the index of the result
df2 = pd.concat([df, df1], ignore_index=True, sort=False)

# Same as above, passing the list of DataFrames as a variable
data = [df, df1]
df2 = pd.concat(data, ignore_index=True, sort=False)

# Using pandas.concat() to join two DataFrames on columns
data = pd.concat([df, df1], axis=1, join='inner')

# Using DataFrame.append() method (requires pandas < 2.0)
data = df.append(df1)

# Use DataFrame.append() with ignore_index=True (requires pandas < 2.0)
df2 = df.append(df1, ignore_index=True)

# Appending multiple DataFrames (requires pandas < 2.0)
data = df.append([df1, df2])

2. Use pandas.concat() to Combine Two DataFrames

First, let's look at the pandas.concat() function. It concatenates DataFrames along either axis: rows (axis=0, the default) or columns (axis=1). On the other axis it applies set logic to the labels, taking either their union (join='outer', the default) or their intersection (join='inner').


# Create DataFrames
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using pandas.concat() to combine two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

pandas.concat() accepts a list of any length, which is particularly helpful when you are combining more than two DataFrames. Notice that the above example keeps the row index of each DataFrame as-is, so the labels 0 to 3 appear twice. Sometimes you may want to reset the index; you can do so by using the ignore_index=True param.


# Use pandas.concat() method to ignore_index 
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
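
As a minimal sketch of combining more than two DataFrames, the example below passes three DataFrames in one list. The third DataFrame, df3, is a hypothetical addition made up for illustration, and the data is shortened to keep the output small.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop"], 'Fee': [25000, 25200]})
# df3 is a hypothetical third DataFrame, added only for this illustration
df3 = pd.DataFrame({'Courses': ["Java"], 'Fee': [24900]})

# Concatenate all three on rows and renumber the rows from 0
combined = pd.concat([df, df1, df3], ignore_index=True)
print(combined)

# Without ignore_index, each DataFrame keeps its own labels, so they repeat
kept = pd.concat([df, df1, df3])
print(kept.index.tolist())
```

With ignore_index=True the result gets a fresh index; without it, the repeated labels mean that a label-based lookup such as kept.loc[0] returns several rows.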

3. Using pandas.concat() to Join Two DataFrames

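As I said above, pandas.concat() can also join two DataFrames on columns. To do so, use axis=1, which aligns the rows by index label, together with join='inner', which keeps only the labels present in both DataFrames. By default, pd.concat() concatenates row-wise (axis=0) with an outer join.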


import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
                      'Fee' :[20000,25000,22000,24000]})
  
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days'],
                      'Discount':[1000,2300,2500,2000]})

# Using pandas.concat() to join two DataFrames on columns
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      2500
3   pandas  24000   60days      2000
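
In the example above both DataFrames share the index 0 to 3, so join='inner' and join='outer' give the same result. The sketch below uses two hypothetical DataFrames, left and right, whose indexes only partly overlap, to show where the two settings differ.

```python
import pandas as pd

# Hypothetical DataFrames with partly overlapping index labels
left = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python"]}, index=[0, 1, 2])
right = pd.DataFrame({'Duration': ['30days', '40days', '35days']}, index=[1, 2, 3])

# Inner join: keep only the labels found in both DataFrames (1 and 2)
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)

# Outer join (the default): keep all labels and fill the gaps with NaN
outer = pd.concat([left, right], axis=1, join='outer')
print(outer)
```

The inner join drops labels 0 and 3 because each exists in only one DataFrame, while the outer join keeps them and marks the missing cells as NaN.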

4. Use DataFrame.append() to Combine Two DataFrames

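Alternatively, on pandas versions earlier than 2.0, you can use the DataFrame.append() method to concatenate DataFrames on rows. For example, df.append(df1) returns a new DataFrame with the rows of df1 added after the rows of df; df itself is not modified. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, where calling it raises an AttributeError, so use pd.concat([df, df1]) in newer code.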


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using DataFrame.append() to Combine Two DataFrames (requires pandas < 2.0)
df2 = df.append(df1)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

Use the ignore_index=True param to reset the index on the combined DataFrame.


# Use DataFrame.append() with ignore_index=True (requires pandas < 2.0)
df2 = df.append(df1, ignore_index=True)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
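
Since DataFrame.append() is not available in pandas 2.0 and later, here is a minimal sketch of the pd.concat() calls that should produce the same results as the two append() examples in this section. The hasattr() check is only an illustrative way to see whether the installed pandas version still has the method.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900]})

# Illustrative check: False on pandas 2.0 and later
print("DataFrame.append available:", hasattr(pd.DataFrame, "append"))

# In place of df.append(df1)
df2 = pd.concat([df, df1])

# In place of df.append(df1, ignore_index=True)
df3 = pd.concat([df, df1], ignore_index=True)
print(df3)
```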

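5. Concatenate Multiple DataFrames Using DataFrame.append()

On pandas versions earlier than 2.0, you can also pass a list to the DataFrame.append() method to concatenate multiple DataFrames. In the example below, the third DataFrame has different columns from the first two, so the cells that have no value in the result are filled with NaN.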


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                    'Fee' : ['20000', '25000', '22000', '24000']}) 
  
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
  
df2 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
                    'Discount':[1000,2300,2500,2000,3000]})
  
# Appending multiple DataFrames (requires pandas < 2.0)
data = df.append([df1, df2])
print(data)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
0     Spark  20000      NaN       NaN
1   PySpark  25000      NaN       NaN
2    Python  22000      NaN       NaN
3    Pandas  24000      NaN       NaN
0      Unix  25000      NaN       NaN
1    Hadoop  25200      NaN       NaN
2  Hyperion  24500      NaN       NaN
3      Java  24900      NaN       NaN
0       NaN    NaN    30day    1000.0
1       NaN    NaN   40days    2300.0
2       NaN    NaN   35days    2500.0
3       NaN    NaN   60days    2000.0
4       NaN    NaN   55days    3000.0
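
The same kind of result can be sketched with pd.concat(), which also works on pandas 2.0 and later. This is an illustration with shortened data, not a drop-in copy of the example above.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': ['20000', '25000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop"], 'Fee': ['25000', '25200']})
df2 = pd.DataFrame({'Duration': ['30days', '40days'], 'Discount': [1000, 2300]})

# The default outer join keeps the union of all columns;
# cells with no source value become NaN
data = pd.concat([df, df1, df2])
print(data)

# NaN forces the integer Discount column to a float dtype
print(data['Discount'].dtype)
```

This is why Discount is printed as 1000.0 rather than 1000 in the output above: an integer column cannot hold NaN, so pandas converts it to float.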

Conclusion

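In this article, I have explained how to combine two pandas DataFrames using the pandas.concat() function and the DataFrame.append() method with examples. Since DataFrame.append() was removed in pandas 2.0, prefer pandas.concat() in new code.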

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.