You can use the pandas.concat() function to concatenate or merge two or more pandas DataFrames along either rows or columns. When concatenating along rows, concat() creates a new DataFrame that includes all rows from the input DataFrames, effectively appending one DataFrame to another. When concatenating along columns, concat() performs a join operation, combining the DataFrames side by side based on their indexes.
In this article, I will explain the syntax, parameters, and usage of the concat() function, and show how to concatenate two pandas DataFrames by rows and by columns.
Key Points –
- By default, concat() performs an append operation: it appends each DataFrame at the end of another and creates a single DataFrame.
- When you use concat() to join two DataFrames, it supports only inner and outer joins, and by default it performs an outer join.
- Using concat(), you can join or append multiple pandas DataFrames.
- pd.concat() is used to concatenate pandas DataFrames along rows or columns.
- The ignore_index=True parameter resets the index of the concatenated DataFrame.
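The key points above can be sketched in a few lines. This is a minimal illustration, not part of the original examples; the DataFrames left and right are hypothetical sample data chosen so that they share only the Courses column.

```python
import pandas as pd

# Hypothetical sample data: the two frames share only the 'Courses' column.
left = pd.DataFrame({'Courses': ['Spark', 'Python'], 'Fee': [20000, 22000]})
right = pd.DataFrame({'Courses': ['Hadoop', 'Java'], 'Duration': ['40days', '60days']})

# Default: append rows, keep every column (outer join), keep original index labels.
outer = pd.concat([left, right])

# join='inner' keeps only the columns present in both frames.
inner = pd.concat([left, right], join='inner')

# ignore_index=True replaces the repeated labels with 0..n-1.
renumbered = pd.concat([left, right], ignore_index=True)

print(outer.columns.tolist())      # ['Courses', 'Fee', 'Duration']
print(inner.columns.tolist())      # ['Courses']
print(outer.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Note that with the default axis=0, the join setting decides which columns survive, while the rows are always stacked.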
Quick Examples of Concat Two DataFrames
Following are quick examples of concatenating two DataFrames using the concat() method.
# Quick examples of concatenating two DataFrames
# (df and df1 are the DataFrames created in the sections below)
# Concatenate two DataFrames along rows
data = [df, df1]
df2 = pd.concat(data)
# Concatenate and reset the index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
# Concatenate along columns with an inner join
data = pd.concat([df, df1], axis=1, join='inner')
# DataFrame.append() (pandas < 2.0 only; removed in pandas 2.0)
data = df.append(df1)
# DataFrame.append() with a reset index (pandas < 2.0 only)
df2 = df.append(df1, ignore_index=True)
# Append multiple DataFrames (pandas < 2.0 only)
data = df.append([df1, df2])
pandas concat() Syntax and Usage
Following is the syntax of the pandas.concat() method.
# Syntax of concat() function
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Parameters
Following are the parameters of the concat() method.
- objs – A sequence or mapping of Series or DataFrame objects. If a dictionary is passed, its keys are used to construct a hierarchical index.
- axis – {0 or ‘index’, 1 or ‘columns’}, default 0. The axis to concatenate along. 0 or 'index' means concatenate along rows (vertically); 1 or 'columns' means concatenate along columns (horizontally).
- join – Type of join to be performed on the other axis. It can be ‘inner’ or ‘outer’. Defaults to ‘outer’.
- ignore_index – If True, do not use the index values along the concatenation axis; the result is labeled 0 to n-1 instead. Defaults to False.
- keys – Values to associate with the concatenated objects along the concatenation axis. Useful for creating a hierarchical index.
- levels – Specific levels (unique values) to use for constructing a MultiIndex. If not given, they are inferred from the keys.
- names – Names for the levels in the resulting hierarchical index.
- verify_integrity – If True, check whether the new concatenated axis contains duplicates and raise an error if it does. Defaults to False.
- sort – If True, sort the non-concatenation axis when it is not already aligned. Defaults to False.
- copy – If False, avoid copying data unnecessarily. Defaults to True.
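As a small sketch of the less common parameters, the snippet below uses keys and names to build a hierarchical index and shows verify_integrity rejecting duplicate labels. The DataFrames a and b and the labels 'first', 'second', 'source', and 'row' are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs; both use the default index labels 0 and 1.
a = pd.DataFrame({'Fee': [20000, 25000]})
b = pd.DataFrame({'Fee': [25200, 24900]})

# keys tags each input; names labels the two levels of the resulting MultiIndex.
tagged = pd.concat([a, b], keys=['first', 'second'], names=['source', 'row'])
print(list(tagged.index.names))                                   # ['source', 'row']
print(tagged.xs('second', level='source')['Fee'].tolist())        # [25200, 24900]

# verify_integrity=True raises ValueError because labels 0 and 1 repeat.
try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)                                                     # True
```

Using keys is a convenient way to remember which input each row came from without adding an extra column.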
Return Value
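pd.concat() returns a new pandas object whose type depends on the inputs: concatenating only Series along rows returns a Series, while concatenating along columns, or passing at least one DataFrame, returns a DataFrame.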
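A quick way to see how the return type depends on the inputs is to concatenate two Series in both directions. The Series s1 and s2 below are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical Series used only to illustrate the return type.
s1 = pd.Series([1, 2], name='x')
s2 = pd.Series([3, 4], name='y')

rows = pd.concat([s1, s2])            # Series inputs along axis=0 give a Series
cols = pd.concat([s1, s2], axis=1)    # along axis=1 the result is a DataFrame

print(type(rows).__name__)     # Series
print(type(cols).__name__)     # DataFrame
print(cols.columns.tolist())   # ['x', 'y']
```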
Use pandas.concat() to Concat Two DataFrames
First, let’s create two pandas DataFrames with different content, and then apply concat() to concatenate them.
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
print("First DataFrame:\n", df)
print("Second DataFrame:\n", df1)
You can use pandas.concat() to concatenate two DataFrames by rows, that is, to append one to the other. By default, it behaves like a union: it brings all rows from both DataFrames into a single DataFrame. The example below demonstrates appending with concat().
# Using pandas.concat() to concat two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print("After concatenating the two DataFrames:\n", df2)
The result keeps each DataFrame's original index labels, so the labels 0 through 3 appear twice.
The ignore_index=True parameter in pd.concat() resets the index when concatenating DataFrames. With ignore_index=True, the index of the concatenated DataFrame starts from 0, regardless of the indexes of the original DataFrames. This is useful when you want a new DataFrame with a continuous index after concatenation.
# Use pandas.concat() method to ignore_index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)
Yields below output.
# Output:
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
4 Pandas 25000
5 Hadoop 25200
6 Hyperion 24500
7 Java 24900
Using pandas.concat() to Join Two DataFrames
You can use pandas.concat() to perform column-wise joins (concatenation) between two DataFrames. When you use axis=1 and join='inner', it places the DataFrames side by side and keeps only the rows whose index labels appear in both.
import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,24000]})
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days'],
'Discount':[1000,2300,2500,2000,]})
# Using pandas.concat() to join concat two DataFrames
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)
In this code, pd.concat() joins df and df1 along columns (axis=1) with an inner join (join='inner'). The resulting DataFrame (df2) contains all columns from both DataFrames, but only the rows whose index labels exist in both df and df1. Here both DataFrames use the default index 0 to 3, so all four rows are kept. This example yields the below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30day 1000
1 PySpark 25000 40days 2300
2 Python 22000 35days 2500
3 pandas 24000 60days 2000
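The effect of join='inner' is easier to see when the indexes only partly overlap. The following is a minimal sketch with hypothetical DataFrames fees and discounts and made-up index labels r1 to r4.

```python
import pandas as pd

# Hypothetical data with partly overlapping index labels.
fees = pd.DataFrame({'Fee': [20000, 25000, 22000]}, index=['r1', 'r2', 'r3'])
discounts = pd.DataFrame({'Discount': [1000, 2300]}, index=['r2', 'r4'])

# Inner join keeps only index labels found in both frames.
inner = pd.concat([fees, discounts], axis=1, join='inner')

# Outer join (the default) keeps every label and fills the gaps with NaN.
outer = pd.concat([fees, discounts], axis=1, join='outer')

print(inner.index.tolist())     # ['r2']
print(inner.columns.tolist())   # ['Fee', 'Discount']
print(outer.index.tolist())     # ['r1', 'r2', 'r3', 'r4']
```

In both cases all columns are kept; the join type only controls which rows survive.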
Concatenate Multiple DataFrames Using pandas.concat()
You can also concatenate more than two DataFrames with pandas.concat() by passing a list of all the DataFrames to be concatenated.
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
'Fee' : ['20000', '25000', '22000', '24000']})
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
'Fee': ['25000', '25200', '24500', '24900']})
df2 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
'Discount':[1000,2300,2500,2000,3000]})
# Appending multiple DataFrame
df3 = pd.concat([df, df1, df2])
print(df3)
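In the above example, df, df1, and df2 are concatenated along rows (the default behavior) to create a single DataFrame, df3. Because df2 has different columns from df and df1, concat() keeps the union of all columns and fills the missing values with NaN; as a side effect, Discount becomes a float column. For a result without NaN, the DataFrames should share the same columns. This example yields the below output.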
# Output:
Courses Fee Duration Discount
0 Spark 20000 NaN NaN
1 PySpark 25000 NaN NaN
2 Python 22000 NaN NaN
3 Pandas 24000 NaN NaN
0 Unix 25000 NaN NaN
1 Hadoop 25200 NaN NaN
2 Hyperion 24500 NaN NaN
3 Java 24900 NaN NaN
0 NaN NaN 30day 1000.0
1 NaN NaN 40days 2300.0
2 NaN NaN 35days 2500.0
3 NaN NaN 60days 2000.0
4 NaN NaN 55days 3000.0
Use DataFrame.append() to Concat Two DataFrames
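Similarly, you can use the DataFrame.append() method to concatenate two DataFrames along rows. For instance, df.append(df1) appends df1 to the df DataFrame. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this section run only on older versions; on current versions, use pd.concat() instead.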
import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
'Fee' : [20000,25000,22000,24000]})
df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
'Fee': [25000,25200,24500,24900]})
# Using DataFrame.append() to concat two DataFrames
# (requires pandas < 2.0; the method was removed in pandas 2.0)
df2 = df.append(df1)
print(df2)
Yields below output.
# Output:
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
0 Pandas 25000
1 Hadoop 25200
2 Hyperion 24500
3 Java 24900
You can use the ignore_index=True parameter in the DataFrame.append() method to reset the index on the combined DataFrame. This ensures that the index of the resulting DataFrame is a continuous integer sequence, starting from 0.
# Use DataFrame.append()
df2 = df.append(df1, ignore_index=True)
print(df2)
Yields below output.
# Output:
Courses Fee
0 Spark 20000
1 PySpark 25000
2 Python 22000
3 pandas 24000
4 Pandas 25000
5 Hadoop 25200
6 Hyperion 24500
7 Java 24900
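Since DataFrame.append() was removed in pandas 2.0, the same results can be obtained with pd.concat(). The sketch below reuses the df and df1 from this section; the extra row ('Scala', 23000) is a hypothetical value added only to show how a single row can be appended.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900]})

# Equivalent of df.append(df1, ignore_index=True) on pandas 2.0 and later.
df2 = pd.concat([df, df1], ignore_index=True)

# To append a single row given as a dict, wrap it in a one-row DataFrame first.
# The row values here are hypothetical.
new_row = pd.DataFrame([{'Courses': 'Scala', 'Fee': 23000}])
df3 = pd.concat([df2, new_row], ignore_index=True)

print(df2.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]
print(df3.shape)            # (9, 2)
```

When appending many pieces, it is usually better to collect them in a list and call pd.concat() once than to concatenate repeatedly inside a loop.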
FAQ on Concat Two Pandas DataFrames
You can use the pd.concat() function in pandas to concatenate two or more DataFrames.
You can specify the axis explicitly using the axis parameter in pd.concat(). For instance, to concatenate along columns, use axis=1, and to concatenate along rows, use axis=0.
If the DataFrames have different columns, missing values (NaN) will be introduced in the resulting DataFrame where data is missing.
To concatenate DataFrames with different indexes, you can use the concat() function in pandas. By default, pandas aligns the DataFrames on the axis other than the one you concatenate along, based on their index or column labels.
Concatenating along columns with different lengths can be tricky because pandas aligns data based on indices. If the DataFrames have different lengths along the axis you want to concatenate, you may end up with NaN values in the resulting DataFrame where data is missing.
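The alignment behavior described above can be sketched as follows. The DataFrames courses and durations are hypothetical; they have different lengths and non-overlapping index labels, and reset_index(drop=True) is shown as one possible way to line rows up by position instead of by label.

```python
import pandas as pd

# Hypothetical data: different lengths and non-overlapping index labels.
courses = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Python']}, index=[10, 11, 12])
durations = pd.DataFrame({'Duration': ['30days', '40days']})  # default index 0, 1

# Labels do not overlap, so every row has a NaN on one side.
by_label = pd.concat([courses, durations], axis=1)

# Dropping the old labels first lines the rows up by position instead.
by_position = pd.concat(
    [courses.reset_index(drop=True), durations.reset_index(drop=True)], axis=1
)

print(by_label.shape)                            # (5, 2)
print(by_position.shape)                         # (3, 2)
print(by_position['Duration'].isna().tolist())   # [False, False, True]
```

Even after aligning by position, the shorter DataFrame still leaves NaN in the rows it cannot fill.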
Conclusion
In this article, you have learned how to concatenate two pandas DataFrames using pandas.concat() and DataFrame.append() (the latter is available only in pandas versions before 2.0). You have also seen, with examples, how concat() can concatenate multiple pandas DataFrames at once.
Happy Learning !!