Use pandas.concat() and DataFrame.append() to combine two or more pandas DataFrames across rows or columns. DataFrame.append() combines two DataFrames on the row axis: it returns a new DataFrame containing all rows of both. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pandas.concat() instead. In this article, I will explain how to combine two pandas DataFrames using pandas.concat() and DataFrame.append() with examples.
1. Quick Examples of Combining Two pandas DataFrames
If you are in a hurry, below are some quick examples of how to combine two pandas DataFrames.
# Below are quick examples
# Using pandas.concat() to combine two DataFrames
data = [df, df1]
df2 = pd.concat(data)
# Using pandas.concat() with ignore_index=True to reset the index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
# Same as above, passing the DataFrames as a list variable
data = [df, df1]
df2 = pd.concat(data, ignore_index=True, sort=False)
# Using pandas.concat() to join two DataFrames on columns
data = pd.concat([df, df1], axis=1, join='inner')
# Using DataFrame.append() (removed in pandas 2.0; works only on older versions)
data = df.append(df1)
# Using DataFrame.append() with ignore_index=True
df2 = df.append(df1, ignore_index=True)
# Appending multiple DataFrames
data = df.append([df1, df2])
2. Use pandas.concat() to Combine Two DataFrames
First, let's look at the pandas.concat() method, which combines two DataFrames along either rows or columns. While concatenating along one axis, it can also apply set logic (union or intersection) to the indexes on the other axis.
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
# Using pandas.concat() to combine two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print(df2)
Yields below output.
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
0 Pandas 25000
1 Hadoop 25200
2 Hyperion 24500
3 Java 24900
pandas.concat() accepts a list of any length, which makes it particularly helpful when you are combining more than two DataFrames. Notice that the above example keeps the row index of each DataFrame as-is, so the index values repeat. To renumber the rows, use the ignore_index=True param.
# Using pandas.concat() with ignore_index=True to reset the index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)
Yields below output.
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
4 Pandas 25000
5 Hadoop 25200
6 Hyperion 24500
7 Java 24900
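As a minimal sketch of combining more than two DataFrames in one call, the example below passes a list of three to pandas.concat(). The third DataFrame (df3), its values, and the key labels are hypothetical and added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
df1 = pd.DataFrame({'Courses': ["Hadoop", "Java"], 'Fee': [25200, 24900]})
# df3 is a hypothetical third DataFrame, not part of the examples above
df3 = pd.DataFrame({'Courses': ["Scala"], 'Fee': [23000]})

# Combine three DataFrames in one call and renumber the rows from 0
combined = pd.concat([df, df1, df3], ignore_index=True)
print(combined)

# Alternatively, label each source with keys to keep track of where rows came from
labeled = pd.concat([df, df1, df3], keys=['first', 'second', 'third'])
print(labeled.loc['third'])
```

The keys param builds a two-level index, so rows from one source can be selected with .loc while the original row labels are preserved.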
3. Using pandas.concat() to Join Two DataFrames
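As I said above, the pandas.concat() method is also used to join two DataFrames on columns. To do so, use axis=1 with join='inner', which keeps only the index labels present in both DataFrames. By default, pd.concat() concatenates row-wise (axis=0) and uses an outer join on the other axis.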
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]})
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days'],
'Discount':[1000,2300,2500,2000]})
# Using pandas.concat() to join two DataFrames on columns
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)
Yields below output.
Courses Fee Duration Discount
0 Spark 20000 30day 1000
1 PySpark 25000 40days 2300
2 Python 22000 35days 2500
3 pandas 24000 60days 2000
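In the example above both DataFrames share the same index, so inner and outer joins give the same result. The sketch below uses two small hypothetical DataFrames (left and right) with only partially overlapping indexes to show how the join param changes the output.

```python
import pandas as pd

# Hypothetical DataFrames whose indexes overlap only on labels 1 and 2
left = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python"]}, index=[0, 1, 2])
right = pd.DataFrame({'Duration': ['40days', '35days', '60days']}, index=[1, 2, 3])

# join='inner' keeps only index labels present in both DataFrames
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)

# join='outer' (the default) keeps every label and fills the gaps with NaN
outer = pd.concat([left, right], axis=1, join='outer')
print(outer)
```

Here the inner join returns two rows (labels 1 and 2), while the outer join returns four rows with NaN where one side has no matching label.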
4. Use DataFrame.append() to Combine Two DataFrames
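Alternatively, you can use the DataFrame.append() method to concatenate DataFrames on rows. For example, df.append(df1) returns a new DataFrame with the rows of df1 added after the rows of df; df itself is not modified. This method was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this section run only on older versions.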
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
# Using DataFrame.append() to Combine Two DataFrames
df2 = df.append(df1)
print(df2)
Yields below output.
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
0 Pandas 25000
1 Hadoop 25200
2 Hyperion 24500
3 Java 24900
Use the ignore_index=True param to reset the index on the combined DataFrame.
# Using DataFrame.append() with ignore_index=True
df2 = df.append(df1, ignore_index=True)
print(df2)
Yields below output.
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
4 Pandas 25000
5 Hadoop 25200
6 Hyperion 24500
7 Java 24900
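Because DataFrame.append() was removed in pandas 2.0, the calls in this section raise an AttributeError on current versions. A minimal sketch of the equivalent using pandas.concat(), with the same df and df1 as above, could look like this:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900]})

# Equivalent of df.append(df1): row labels of each DataFrame are kept
df2 = pd.concat([df, df1])

# Equivalent of df.append(df1, ignore_index=True): rows are renumbered 0..7
df3 = pd.concat([df, df1], ignore_index=True)
print(df3)
```

Like append(), pd.concat() returns a new DataFrame and leaves df and df1 unchanged.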
5. Multiple Objects to Concatenate Using DataFrame.append()
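You can also use the DataFrame.append() method to concatenate multiple DataFrames by passing them as a list. Columns that are missing from a DataFrame are filled with NaN in the result.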
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
'Fee' : ['20000', '25000', '22000', '24000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
'Fee': ['25000', '25200', '24500', '24900']})
df2 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]})
# Appending multiple DataFrames
data = df.append([df1, df2])
print(data)
Yields below output.
Courses Fee Duration Discount
0 Spark 20000 NaN NaN
1 PySpark 25000 NaN NaN
2 Python 22000 NaN NaN
3 Pandas 24000 NaN NaN
0 Unix 25000 NaN NaN
1 Hadoop 25200 NaN NaN
2 Hyperion 24500 NaN NaN
3 Java 24900 NaN NaN
0 NaN NaN 30day 1000.0
1 NaN NaN 40days 2300.0
2 NaN NaN 35days 2500.0
3 NaN NaN 60days 2000.0
4 NaN NaN 55days 3000.0
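The same result can be sketched with pandas.concat(), which also works on pandas 2.0 and later where append() is no longer available. The example reuses the three DataFrames from this section and checks how the non-matching columns are filled.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                   'Fee': ['20000', '25000', '22000', '24000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
df2 = pd.DataFrame({'Duration': ['30day', '40days', '35days', '60days', '55days'],
                    'Discount': [1000, 2300, 2500, 2000, 3000]})

# Equivalent of df.append([df1, df2]): columns are unioned, gaps become NaN
data = pd.concat([df, df1, df2])
print(data)

# Discount is upcast from int to float because NaN cannot be stored in an int column
print(data['Discount'].dtype)
```

The upcast explains why the Discount values print as 1000.0 rather than 1000 in the output above.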
Conclusion
In this article, I have explained how to combine two pandas DataFrames using the DataFrame.append() and pandas.concat() methods with examples.
Happy Learning !!