In pandas, you can use the concat() function to union DataFrames along a particular axis, either vertically (stacking rows) or horizontally (placing columns side by side). In this article, I will explain how to union two pandas DataFrames by rows and by columns, with examples.
Key Points –
- The pd.concat() function concatenates (or unions) multiple DataFrames along a particular axis, either rows or columns.
- By default, pd.concat() appends DataFrames row-wise. Columns are matched by name, and a column missing from one DataFrame is filled with NaN in that DataFrame's rows.
- Setting the axis parameter to 1 in pd.concat() concatenates along columns, effectively unioning DataFrames column-wise.
- pd.concat() preserves the index of the original DataFrames by default. To reset the index in the concatenated DataFrame, use the ignore_index=True parameter.
- Use the keys parameter to add a hierarchical index level that identifies which DataFrame each row originated from, which is especially useful when combining multiple DataFrames.
- For large datasets whose index labels carry no meaning, consider using ignore_index=True: the result gets a simple 0 to n-1 index, and pandas does not have to combine the original index labels.
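The column-matching behavior of a row-wise concat can be sketched as follows. This is a minimal illustration, assuming two hypothetical DataFrames (left and right) whose columns only partly overlap; they are not the DataFrames used in the rest of this article.

```python
import pandas as pd

# Hypothetical frames whose columns only partly overlap
left = pd.DataFrame({"Courses": ["Spark"], "Fee": [20000]})
right = pd.DataFrame({"Courses": ["Hadoop"], "Duration": ["50days"]})

# Default join="outer": keep every column, fill the gaps with NaN
outer = pd.concat([left, right], ignore_index=True)

# join="inner": keep only the columns present in every input
inner = pd.concat([left, right], ignore_index=True, join="inner")

print(outer)
print(inner)
```

Here outer keeps Courses, Fee, and Duration with NaN where a value is missing, while inner keeps only Courses.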
Quick Examples of Union of Pandas DataFrames
If you are in a hurry, below are some quick examples of how to union pandas DataFrames using concat.
# Quick examples of union of pandas DataFrames
# Example 1: Union pandas DataFrames
# Using concat()
df2 = pd.concat([df, df1])
# Example 2: Reset the index
# Using concat() with ignore_index
df2 = pd.concat([df, df1], ignore_index=True)
# Example 3: Concatenate pandas DataFrames along columns
df2 = pd.concat([df, df1], axis=1)
# Example 4: Concatenate with keys
result = pd.concat([df, df1], keys=['df', 'df1'])
# Example 5: Union with keys
result = pd.concat([df, df1], keys=['df', 'df1']).drop_duplicates()
Create DataFrames
Let’s create two DataFrames with a few rows and columns, run these examples, and validate the results. Both DataFrames contain the columns Courses, Fee, and Duration.
# Create first DataFrame
import pandas as pd
technologies = [ ["Spark",20000,"30days"],
["Pandas",25000,"40days"],
]
column_names=["Courses","Fee",'Duration']
df=pd.DataFrame(technologies, columns=column_names)
print("First DataFrame:\n",df)
# Create second DataFrame
technologies = [ ["Hadoop",25000,"50days"],
["Java",30000,"40days"],
]
column_names=["Courses","Fee",'Duration']
df1=pd.DataFrame(technologies, columns=column_names)
print("Second DataFrame:\n",df1)
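Yields below output.
First DataFrame:
Courses Fee Duration
0 Spark 20000 30days
1 Pandas 25000 40days
Second DataFrame:
Courses Fee Duration
0 Hadoop 25000 50days
1 Java 30000 40days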
Union of Pandas DataFrames using concat()
To concatenate DataFrames, use the concat() function. By default, it concatenates the given DataFrames vertically (i.e., row-wise) and returns a single DataFrame containing the rows of all inputs.
The example below concatenates the DataFrames df and df1 along the rows (axis=0), performing a union operation. The resulting DataFrame (df2) contains all the rows from both original DataFrames, and the index is not reset, meaning each row retains its original index label.
# Union pandas DataFrames using concat()
df2 = pd.concat([df, df1])
print("Union pandas dataframes using concat:\n", df2)
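Yields below output.
Union pandas dataframes using concat:
Courses Fee Duration
0 Spark 20000 30days
1 Pandas 25000 40days
0 Hadoop 25000 50days
1 Java 30000 40days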
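Because the original indexes are kept, the labels 0 and 1 each occur twice in df2. The sketch below rebuilds the same two DataFrames and checks what that means for label-based lookups; the variable rows_at_zero is introduced here only for illustration.

```python
import pandas as pd

cols = ["Courses", "Fee", "Duration"]
df = pd.DataFrame([["Spark", 20000, "30days"], ["Pandas", 25000, "40days"]], columns=cols)
df1 = pd.DataFrame([["Hadoop", 25000, "50days"], ["Java", 30000, "40days"]], columns=cols)

df2 = pd.concat([df, df1])

print(df2.index.tolist())    # [0, 1, 0, 1]
print(df2.index.is_unique)   # False

# A label lookup on a duplicated label returns every matching row
rows_at_zero = df2.loc[0]
print(rows_at_zero["Courses"].tolist())  # ['Spark', 'Hadoop']
```

If later code relies on unique labels, resetting the index avoids this kind of surprise.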
Reset the Index of the Unioned DataFrames
If you want the new DataFrame to drop the indexes of the concatenated DataFrames, pass ignore_index=True to the concat() function along with the two DataFrames. It returns a DataFrame containing the union of rows with a new index running from 0 to n-1.
# Reset the index using concat() with ignore_index
df2 = pd.concat([df, df1], ignore_index=True)
print("Union pandas DataFrames and reset index:\n", df2)
Yields below output.
Union pandas DataFrames and reset index:
Courses Fee Duration
0 Spark 20000 30days
1 Pandas 25000 40days
2 Hadoop 25000 50days
3 Java 30000 40days
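As you can see, the resulting DataFrame includes all rows from both original DataFrames. Note that concat() does not remove duplicates: if the same row appears in both inputs, it appears twice in the result, much like UNION ALL in SQL. To keep only unique rows, chain drop_duplicates() after the concatenation.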
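The difference between keeping and dropping duplicate rows can be sketched like this. The frames a and b are hypothetical and share one row ("Pandas", 25000) so that the effect is visible; the names union_all and union_distinct are borrowed from SQL terminology for illustration only.

```python
import pandas as pd

# Hypothetical frames that share one identical row
a = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [20000, 25000]})
b = pd.DataFrame({"Courses": ["Pandas", "Hadoop"], "Fee": [25000, 25000]})

# concat alone keeps the repeated row (like SQL UNION ALL)
union_all = pd.concat([a, b], ignore_index=True)

# drop_duplicates removes rows that match on every column (like SQL UNION)
union_distinct = union_all.drop_duplicates().reset_index(drop=True)

print(len(union_all))                      # 4
print(union_distinct["Courses"].tolist())  # ['Spark', 'Pandas', 'Hadoop']
```

drop_duplicates() compares column values only, so the index labels do not affect which rows count as duplicates.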
Horizontal Union (Concatenation along Columns)
Alternatively, you can union the DataFrames along columns using the concat() function by passing axis=1 as an argument to pd.concat(). This places the columns of the two DataFrames side by side, aligning rows by index label, and returns a new DataFrame.
# Concatenate pandas DataFrames along columns
df2 = pd.concat([df, df1], axis=1)
print("Concatenate pandas DataFrames along columns:\n", df2)
Yields below output.
Concatenate pandas DataFrames along columns:
Courses Fee Duration Courses Fee Duration
0 Spark 20000 30days Hadoop 25000 50days
1 Pandas 25000 40days Java 30000 40days
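The output above lines up neatly because df and df1 share the index labels 0 and 1. With axis=1, concat() matches rows by index label rather than by position. The sketch below uses two hypothetical DataFrames with only partly overlapping indexes to show the effect, and one possible way to force positional pairing.

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap
left = pd.DataFrame({"Courses": ["Spark", "Pandas"]}, index=[0, 1])
right = pd.DataFrame({"Fee": [25000, 30000]}, index=[1, 2])

# Rows are aligned on index labels; labels missing on one side get NaN
wide = pd.concat([left, right], axis=1)
print(wide)

# To pair rows purely by position, reset both indexes first
positional = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(positional)
```

Here wide has three rows (labels 0, 1, and 2) with NaN in the gaps, while positional has two fully populated rows.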
Concatenate with Keys
The keys parameter of pd.concat() lets you create a hierarchical index based on the provided keys.
In the example below, the resulting DataFrame has a multi-level index where the first level corresponds to the keys ('df' and 'df1'), and the second level corresponds to the original index of the DataFrames.
# Concatenate with keys
result = pd.concat([df, df1], keys=['df', 'df1'])
print("Concatenate with keys:\n",result)
# Union with keys
result = pd.concat([df, df1], keys=['df', 'df1']).drop_duplicates()
print("Union with keys:\n",result)
# Output:
# Concatenate with keys:
# Courses Fee Duration
# df 0 Spark 20000 30days
# 1 Pandas 25000 40days
# df1 0 Hadoop 25000 50days
# 1 Java 30000 40days
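Once the keys are in place, they can be used to pull the rows of one source back out. The sketch below rebuilds the two DataFrames and additionally passes names= to label the index levels; the level names "source" and "row" are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

cols = ["Courses", "Fee", "Duration"]
df = pd.DataFrame([["Spark", 20000, "30days"], ["Pandas", 25000, "40days"]], columns=cols)
df1 = pd.DataFrame([["Hadoop", 25000, "50days"], ["Java", 30000, "40days"]], columns=cols)

# names= labels the two index levels (illustrative names)
result = pd.concat([df, df1], keys=["df", "df1"], names=["source", "row"])

# Select only the rows that came from the second DataFrame
from_df1 = result.loc["df1"]
print(from_df1["Courses"].tolist())  # ['Hadoop', 'Java']

# Move the key level into an ordinary column
flat = result.reset_index(level="source")
print(flat["source"].tolist())       # ['df', 'df', 'df1', 'df1']
```

Turning the key level into a column can be convenient when the combined data is later grouped or filtered by its origin.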
Frequently Asked Questions on Union Pandas DataFrames using Concat
Concatenation in the context of Pandas DataFrames refers to combining two or more DataFrames along a specified axis (either rows or columns). It is a way to vertically or horizontally stack DataFrames to create a larger DataFrame.
To perform a union of two Pandas DataFrames using pd.concat, you can concatenate them along the rows (axis=0). This operation combines the rows of both DataFrames, and the resulting DataFrame will include all rows from the original DataFrames, duplicates included. Chain drop_duplicates() if you need only unique rows.
The ignore_index parameter in pd.concat is used to reset the index of the resulting DataFrame. When set to True, it creates a new integer index for the concatenated DataFrame, ensuring a continuous index without retaining the original indices from the input DataFrames.
You can concatenate Pandas DataFrames along columns using the pd.concat function with the axis=1 parameter. Concatenating along columns means combining the columns of two or more DataFrames side by side.
If there are duplicate indexes in the input DataFrames and you do not use ignore_index=True, the resulting DataFrame will retain the duplicate indexes. If you use ignore_index=True, the index will be reset, and a new integer index will be created for the concatenated DataFrame.
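If duplicate index labels should be treated as an error rather than silently kept, pd.concat() also accepts verify_integrity=True. The following is a minimal sketch with two hypothetical DataFrames that both use the default labels 0 and 1; the flag overlap_detected exists only to make the outcome easy to inspect.

```python
import pandas as pd

# Hypothetical frames that both carry the default labels 0 and 1
a = pd.DataFrame({"Fee": [20000, 25000]})
b = pd.DataFrame({"Fee": [25000, 30000]})

try:
    # Raises ValueError because the index labels overlap
    pd.concat([a, b], verify_integrity=True)
    overlap_detected = False
except ValueError as err:
    overlap_detected = True
    print("Overlapping index:", err)

# With ignore_index=True the labels are regenerated, so no overlap remains
safe = pd.concat([a, b], ignore_index=True)
print(safe.index.tolist())  # [0, 1, 2, 3]
```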
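Besides pd.concat, Pandas provides other methods like merge and join for combining DataFrames. The older DataFrame.append method was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat to append rows instead. The choice of method depends on the specific requirements of the data manipulation task.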
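As a rough guide, concat() stacks DataFrames, while merge() matches rows on key columns. The sketch below shows both under the assumption of pandas 2.x, where appending a row is done through concat(). All names here (courses, new_row, fees, durations) are hypothetical.

```python
import pandas as pd

# Appending one row: wrap it in a DataFrame and concatenate
courses = pd.DataFrame({"Courses": ["Spark"], "Fee": [20000]})
new_row = {"Courses": "Hadoop", "Fee": 25000}
courses = pd.concat([courses, pd.DataFrame([new_row])], ignore_index=True)
print(courses["Courses"].tolist())  # ['Spark', 'Hadoop']

# Joining on a key column: merge matches rows by the values in "Courses"
fees = pd.DataFrame({"Courses": ["Spark", "Hadoop"], "Fee": [20000, 25000]})
durations = pd.DataFrame({"Courses": ["Hadoop", "Spark"], "Duration": ["50days", "30days"]})
merged = fees.merge(durations, on="Courses")
print(merged)
```

In merged, each course is paired with its own duration even though the two inputs list the courses in a different order, which a column-wise concat() would not do.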
Conclusion
In this article, I have explained how to union pandas DataFrames using the pd.concat() function, a versatile tool for concatenating or unioning DataFrames along either rows or columns. It is a powerful method for combining data, and it provides flexibility through parameters like ignore_index, keys, and axis.
Happy Learning !!