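Use pandas.concat() and DataFrame.append() to combine two or more pandas DataFrames across rows or columns. DataFrame.append() is a convenient method for combining two DataFrames along the row axis. It returns a new DataFrame with all rows from both DataFrames stacked vertically. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the append() examples below run only on older versions; on pandas 2.0 and later, use pandas.concat() instead.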
In this article, I will explain how to combine two pandas DataFrames using functions like pandas.concat() and DataFrame.append() with examples.
Key Points –
- Use the pd.concat() function to combine DataFrames vertically or horizontally based on the axis parameter.
- Use the ignore_index parameter to reset the index of the resulting DataFrame after concatenation.
- Understand the differences between concatenating along rows (axis=0) and columns (axis=1) and choose the appropriate method based on your data structure needs.
- Ensure the column names and data types are compatible between the DataFrames to be combined.
- Beware of duplicate indices when combining DataFrames; use ignore_index or reset_index() to avoid unexpected behavior.
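The points about axis and duplicate indices can be seen in a small sketch. The frames left and right below are illustrative names that do not appear elsewhere in this article, and the snippet only assumes that pandas is installed.

```python
import pandas as pd

# Two small illustrative DataFrames with the same columns and the same index
left = pd.DataFrame({"Courses": ["Spark", "Python"], "Fee": [20000, 22000]})
right = pd.DataFrame({"Courses": ["Hadoop", "Java"], "Fee": [25200, 24900]})

# axis=0 (the default) stacks rows; columns are aligned by name
rows = pd.concat([left, right], axis=0)

# axis=1 places the frames side by side; rows are aligned by index label
cols = pd.concat([left, right], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)

# Row labels repeat after a row-wise concat unless ignore_index=True is used
print(rows.index.tolist())  # [0, 1, 0, 1]

# A duplicated label selects more than one row, which is easy to overlook
print(len(rows.loc[0]))  # 2
```

Selecting by label on the row-wise result returns two rows for label 0, which is the kind of unexpected behavior that resetting the index avoids.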
Quick Examples of Combining Two Pandas DataFrames
If you are in a hurry, below are some quick examples of combining two pandas DataFrames.
# Quick examples of combining two pandas DataFrames
# (df and df1 are existing DataFrames)
# Combine two DataFrames row-wise using pandas.concat()
data = [df, df1]
df2 = pd.concat(data)
# Reset the index of the result with ignore_index=True
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
# Combine two DataFrames column-wise, keeping only shared index labels
data = pd.concat([df, df1], axis=1, join='inner')
# Combine two DataFrames using DataFrame.append() (pandas < 2.0 only)
data = df.append(df1)
# DataFrame.append() with a reset index (pandas < 2.0 only)
df2 = df.append(df1, ignore_index=True)
# Append multiple DataFrames at once (pandas < 2.0 only)
data = df.append([df1, df2])
Use pandas.concat() to Combine Two DataFrames
First, let’s look at the concat() function. It combines DataFrames along either rows or columns, and while concatenating along one axis it can apply set logic (union or intersection) to the indexes on the other axis.
# Create DataFrames
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
# Using pandas.concat()
# To combine two dataframes
data = [df, df1]
df2 = pd.concat(data)
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
# 2 Python 22000
# 3 pandas 24000
# 0 Pandas 25000
# 1 Hadoop 25200
# 2 Hyperion 24500
# 3 Java 24900
pandas.concat() is particularly helpful when you are joining more than two DataFrames, because it accepts a list of any length. Notice that the above example keeps the row index of each DataFrame as-is, so the labels 0 to 3 appear twice. To reset the index, use the ignore_index=True param.
# Use pandas.concat() method to ignore_index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
# 2 Python 22000
# 3 pandas 24000
# 4 Pandas 25000
# 5 Hadoop 25200
# 6 Hyperion 24500
# 7 Java 24900
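As a rough illustration of joining more than two DataFrames, the sketch below passes three frames to pd.concat() in one call. The third frame, df3, is a hypothetical addition that is not part of the examples above, and the frames are shortened to keep the snippet small.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})
df1 = pd.DataFrame({"Courses": ["Pandas", "Hadoop"], "Fee": [25000, 25200]})
# Hypothetical third frame, added only for this illustration
df3 = pd.DataFrame({"Courses": ["Java"], "Fee": [24900]})

# One call handles any number of frames; ignore_index=True renumbers the rows
combined = pd.concat([df, df1, df3], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4]

# Resetting the index after the fact should give the same result
same = pd.concat([df, df1, df3]).reset_index(drop=True)
print(combined.equals(same))  # True
```

Passing ignore_index=True is usually the shorter choice; reset_index(drop=True) is handy when the concatenated result already exists.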
Combine Two DataFrames on Columns Using concat()
As mentioned above, the pandas.concat() function can also join two DataFrames on columns. To do so, pass axis=1; adding join='inner' keeps only the index labels that are present in both DataFrames. By default, pd.concat() performs a row-wise outer join.
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]})
df1 = pd.DataFrame({'Duration':['30days','40days','35days','60days'],
'Discount':[1000,2300,2500,2000]})
# Use pandas.concat() to combine two DataFrames column-wise
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000 30days 1000
# 1 PySpark 25000 40days 2300
# 2 Python 22000 35days 2500
# 3 pandas 24000 60days 2000
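In the example above both DataFrames share the index labels 0 to 3, so join='inner' and the default join='outer' give the same rows. The difference only shows when the indexes differ. The sketch below uses two illustrative frames, fees and discounts (names that are assumptions for this example), with partly overlapping indexes.

```python
import pandas as pd

# Illustrative frames whose indexes overlap only on labels 1 and 2
fees = pd.DataFrame({"Fee": [20000, 25000, 22000]}, index=[0, 1, 2])
discounts = pd.DataFrame({"Discount": [2300, 2500, 2000]}, index=[1, 2, 3])

# Default join='outer': union of the index labels, gaps filled with NaN
outer = pd.concat([fees, discounts], axis=1)
print(outer.index.tolist())  # [0, 1, 2, 3]

# join='inner': only labels present in both frames survive
inner = pd.concat([fees, discounts], axis=1, join="inner")
print(inner.index.tolist())  # [1, 2]

# The outer result has one missing Fee and one missing Discount
print(int(outer.isna().sum().sum()))  # 2
```

If every row must be kept, stay with the outer default and handle the NaN values afterwards; if only fully matched rows are useful, join='inner' drops the rest.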
Use DataFrame.append() to Combine Two DataFrames
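Alternatively, you can use the DataFrame.append() method to concatenate DataFrames on rows. For example, df.append(df1) appends df1 to the df DataFrame and returns the result as a new DataFrame. On pandas 2.0 and later this method no longer exists and the call raises an AttributeError; use pd.concat([df, df1]) there.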
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
# Using DataFrame.append()
# To Combine Two DataFrames
df2 = df.append(df1)
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
# 2 Python 22000
# 3 pandas 24000
# 0 Pandas 25000
# 1 Hadoop 25200
# 2 Hyperion 24500
# 3 Java 24900
To reset the index of the combined DataFrame, pass ignore_index=True to the DataFrame.append() method.
# Use DataFrame.append()
df2 = df.append(df1, ignore_index=True)
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
# 2 Python 22000
# 3 pandas 24000
# 4 Pandas 25000
# 5 Hadoop 25200
# 6 Hyperion 24500
# 7 Java 24900
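Because DataFrame.append() is not available on pandas 2.0 and later, the two calls above can be reproduced with pandas.concat(). The following is a minimal sketch of that substitution, using shortened versions of the df and df1 frames from this section; it is one way to write the equivalent, not the only one.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})
df1 = pd.DataFrame({"Courses": ["Pandas", "Hadoop"], "Fee": [25000, 25200]})

# Equivalent of df.append(df1): original row labels are kept
appended = pd.concat([df, df1])
print(appended.index.tolist())  # [0, 1, 0, 1]

# Equivalent of df.append(df1, ignore_index=True): rows are renumbered
appended_reset = pd.concat([df, df1], ignore_index=True)
print(appended_reset.index.tolist())  # [0, 1, 2, 3]
```

Neither call modifies df or df1; like append(), concat() returns a new DataFrame.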
Concatenate Multiple DataFrames Using DataFrame.append()
Similarly, to concatenate multiple DataFrames using the DataFrame.append() method, pass all the DataFrames as a list to the append() method. Columns that are missing from a DataFrame are filled with NaN in the result.
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
'Fee' : ['20000', '25000', '22000', '24000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
'Fee': ['25000', '25200', '24500', '24900']})
df2 = pd.DataFrame({'Duration':['30days','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]})
# Appending multiple DataFrame
data = df.append([df1, df2])
print(data)
# Output:
# Courses Fee Duration Discount
# 0 Spark 20000 NaN NaN
# 1 PySpark 25000 NaN NaN
# 2 Python 22000 NaN NaN
# 3 Pandas 24000 NaN NaN
# 0 Unix 25000 NaN NaN
# 1 Hadoop 25200 NaN NaN
# 2 Hyperion 24500 NaN NaN
# 3 Java 24900 NaN NaN
# 0 NaN NaN 30days 1000.0
# 1 NaN NaN 40days 2300.0
# 2 NaN NaN 35days 2500.0
# 3 NaN NaN 60days 2000.0
# 4 NaN NaN 55days 3000.0
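The same NaN-filling behavior can be reproduced with pandas.concat(), which also works on pandas 2.0 and later. The sketch below uses shortened versions of the three frames above and checks where the missing values end up; treat it as an illustration rather than a drop-in replacement for every use of append().

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": ["20000", "25000"]})
df1 = pd.DataFrame({"Courses": ["Unix", "Hadoop"], "Fee": ["25000", "25200"]})
df2 = pd.DataFrame({"Duration": ["30days", "40days", "35days"],
                    "Discount": [1000, 2300, 2500]})

# Row-wise concat of frames with different columns: the result has the
# union of all columns, and cells with no source value become NaN
data = pd.concat([df, df1, df2])

print(data.shape)              # (7, 4)
print(data.columns.tolist())   # ['Courses', 'Fee', 'Duration', 'Discount']

# Discount was integer in df2 but becomes float so that it can hold NaN
print(data["Discount"].dtype)  # float64

# Count the NaN cells per column
print(data.isna().sum().to_dict())
```

This is why Discount prints as 1000.0 rather than 1000 in the output above: an integer column cannot hold NaN, so pandas upcasts it to float.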
FAQ on Pandas Combine Two DataFrames
You can combine two DataFrames in pandas using various methods such as concat(), append(), merge(), or join(), depending on your specific requirements.
The concat() function is more versatile and can concatenate multiple DataFrames along either axis (rows or columns), while append() is specifically designed to concatenate along rows. append() is a shorthand for concatenating along rows, whereas concat() allows for more flexibility.
To combine two DataFrames along columns, you can use the concat() function with axis=1. For example: pd.concat([df1, df2], axis=1).
You can combine DataFrames with different indices using concat() or append(). Set ignore_index=True to reset the index of the resulting DataFrame.
When you set ignore_index=True while combining DataFrames, it creates a new index for the resulting DataFrame, ignoring the existing indices of the original DataFrames. This ensures that the index is reset and is continuous.
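The answers above mention merge() alongside concat(). As a brief, hedged illustration of when it fits: concat() aligns on index labels, while merge() matches rows on the values of a key column. The frames fees and durations below are hypothetical and exist only for this sketch.

```python
import pandas as pd

# Hypothetical frames that share a key column but list courses in different order
fees = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                     "Fee": [20000, 25000, 22000]})
durations = pd.DataFrame({"Courses": ["Python", "Spark", "Java"],
                          "Duration": ["35days", "30days", "60days"]})

# merge() pairs rows by the value in "Courses", not by row position
merged = pd.merge(fees, durations, on="Courses", how="inner")
print(merged)

# Only the courses present in both frames remain
print(sorted(merged["Courses"]))  # ['Python', 'Spark']
```

A column-wise concat() of these two frames would pair Spark with the Python duration, because it lines rows up by index label; merge() is the safer choice when a shared key column identifies the rows.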
Conclusion
In this article, you have learned how to combine two DataFrames using the DataFrame.append() and pandas.concat() functions with examples.
Happy Learning !!