How to Union Pandas DataFrames using Concat?

In pandas, you can use the concat() function to union DataFrames along a particular axis: vertically (stacking rows, axis=0) or horizontally (placing columns side by side, axis=1). In this article, I will explain how to union two pandas DataFrames by rows and columns with examples.

1. Quick Examples of Union of Pandas DataFrames

If you are in a hurry, below are some quick examples of how to union pandas DataFrames using concat.


# Quick examples of union of pandas DataFrames

# Example 1: Union pandas DataFrames 
# Using concat()
df2 = pd.concat([df, df1])

# Example 2: Reset the index using concat() with ignore_index
df2 = pd.concat([df, df1], ignore_index=True)

# Example 3: Concatenate pandas DataFrames along columns
df2 = pd.concat([df, df1], axis=1)

# Example 4: Concatenate with keys
result = pd.concat([df, df1], keys=['df', 'df1'])

# Example 5: Union with keys, dropping duplicate rows
result = pd.concat([df, df1], keys=['df', 'df1']).drop_duplicates()

2. Create DataFrames

Let's create two DataFrames with a few rows and columns, execute these examples, and validate the results. Both DataFrames have the columns Courses, Fee, and Duration.


# Create first DataFrame
import pandas as pd
technologies = [ ["Spark",20000,"30days"], 
                 ["Pandas",25000,"40days"], 
               ]
column_names=["Courses","Fee",'Duration']
df=pd.DataFrame(technologies, columns=column_names)
print("First DataFrame:\n",df)

# Create second DataFrame
technologies = [ ["Hadoop",25000,"50days"], 
                 ["Java",30000,"40days"], 
               ]
column_names=["Courses","Fee",'Duration']
df1=pd.DataFrame(technologies, columns=column_names)
print("Second DataFrame:\n",df1)

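Yields below output.


First DataFrame:
   Courses    Fee Duration
0   Spark  20000   30days
1  Pandas  25000   40days
Second DataFrame:
   Courses    Fee Duration
0  Hadoop  25000   50days
1    Java  30000   40days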

3. Union of Pandas DataFrames using concat()

To concatenate DataFrames, use the pd.concat() function. By default, it concatenates the given DataFrames vertically (i.e., row-wise) and returns a single DataFrame containing the rows of all the given DataFrames.

The below example concatenates the DataFrames df and df1 along the rows (axis=0, the default). The resulting DataFrame (df2) contains all the rows from both original DataFrames. The index is not reset, so each row keeps its original index label and the labels 0 and 1 each appear twice.


# Union pandas DataFrames using concat()
df2 = pd.concat([df, df1])
print("Union pandas dataframes using concat:\n", df2)

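Yields below output.


Union pandas dataframes using concat:
   Courses    Fee Duration
0   Spark  20000   30days
1  Pandas  25000   40days
0  Hadoop  25000   50days
1    Java  30000   40days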
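The two DataFrames above share the same columns, but concat() also handles inputs whose columns differ. The sketch below uses two hypothetical frames (left and right, not part of the article's data) to show how the join parameter controls which columns survive.

```python
import pandas as pd

# Hypothetical frames that share only the "Courses" column.
left = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [20000, 25000]})
right = pd.DataFrame({"Courses": ["Hadoop"], "Duration": ["50days"]})

# join="outer" (the default) keeps the union of all columns;
# cells with no source value are filled with NaN.
outer = pd.concat([left, right], join="outer")

# join="inner" keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner")

print("Outer join on columns:\n", outer)
print("Inner join on columns:\n", inner)
```

With the default outer join you may want to fill or drop the NaN values afterwards, depending on what the missing cells mean for your data.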

4. Reset the Index of the Union DataFrame

If you want the result to have a fresh index instead of the indexes of the input DataFrames, pass ignore_index=True to concat() along with the list of DataFrames. It returns a DataFrame containing the rows of both inputs with a new integer index starting at 0.


# Reset the index using concat() with ignore_index
df2 = pd.concat([df, df1], ignore_index=True)
print("Union pandas DataFrames and reset index:\n", df2)

Yields below output.


Union pandas DataFrames and reset index:
   Courses    Fee Duration
0   Spark  20000   30days
1  Pandas  25000   40days
2  Hadoop  25000   50days
3    Java  30000   40days

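As you can see, the resulting DataFrame contains all rows from both original DataFrames, with a new continuous index from 0 to 3. Note that concat() does not remove duplicates: if the same row appears in both DataFrames, it appears twice in the result. To get a set-style union of distinct rows, chain drop_duplicates() after concat().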
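To make the handling of duplicate rows concrete, here is a minimal sketch. It uses two hypothetical DataFrames, a and b, that share the "Pandas" row (the article's df and df1 have no rows in common), and compares plain concatenation with concatenation followed by drop_duplicates().

```python
import pandas as pd

cols = ["Courses", "Fee", "Duration"]

# Hypothetical inputs: the "Pandas" row appears in both frames.
a = pd.DataFrame([["Spark", 20000, "30days"],
                  ["Pandas", 25000, "40days"]], columns=cols)
b = pd.DataFrame([["Pandas", 25000, "40days"],
                  ["Java", 30000, "40days"]], columns=cols)

# Plain concatenation keeps every row, like SQL UNION ALL.
union_all = pd.concat([a, b], ignore_index=True)

# Dropping duplicates keeps the first copy of each distinct row, like SQL UNION.
union_distinct = union_all.drop_duplicates().reset_index(drop=True)

print("All rows:\n", union_all)
print("Distinct rows:\n", union_distinct)
```

Call drop_duplicates() after ignore_index=True has rebuilt the index, or reset the index afterwards as shown, so the final index has no gaps.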

5. Horizontal Union (Concatenation along Columns)

Alternatively, you can combine the DataFrames along columns by passing axis=1 to pd.concat(). This places the columns of the two DataFrames side by side and returns a new DataFrame. Rows are matched by index label, so the two DataFrames here line up because both use the index labels 0 and 1. Note that the result contains repeated column names.


# Concatenate pandas DataFrames along columns
df2 = pd.concat([df, df1], axis=1)
print("Concatenate pandas DataFrames along columns:\n", df2)

Yields below output.


Concatenate pandas DataFrames along columns:
   Courses    Fee Duration Courses    Fee Duration
0   Spark  20000   30days  Hadoop  25000   50days
1  Pandas  25000   40days    Java  30000   40days
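Column-wise concatenation matches rows by index label rather than by position. The following sketch, using hypothetical frames whose indexes only partly overlap, shows what that means in practice and one way to force positional pairing.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
left = pd.DataFrame({"Courses": ["Spark", "Pandas"]}, index=[0, 1])
right = pd.DataFrame({"Fee": [25000, 30000]}, index=[1, 2])

# Rows are aligned on index labels; labels missing from one side get NaN.
aligned = pd.concat([left, right], axis=1)

# To pair rows by position instead, reset both indexes first.
positional = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)

print("Aligned on index labels:\n", aligned)
print("Paired by position:\n", positional)
```

If you see unexpected NaN values after a horizontal concat, mismatched indexes are a likely cause.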

6. Concatenate with keys

The keys parameter of pd.concat() creates a hierarchical index (MultiIndex) that labels each row with the DataFrame it came from.

In the below example, the resulting DataFrame has a multi-level index where the first level corresponds to the keys ('df' and 'df1'), and the second level corresponds to the original index of the DataFrames.


# Concatenate with keys
result = pd.concat([df, df1], keys=['df', 'df1'])
print("Concatenate with keys:\n",result)

# Union with keys
result = pd.concat([df, df1], keys=['df', 'df1']).drop_duplicates()
print("Union with keys:\n",result)

# Output:
# Concatenate with keys:
#       Courses    Fee Duration
# df  0   Spark  20000   30days
#     1  Pandas  25000   40days
# df1 0  Hadoop  25000   50days
#     1    Java  30000   40days
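Once the keys are in place, they can be used to pull out the rows of one source or be turned into an ordinary column. The sketch below rebuilds the article's two DataFrames; the level names "source" and "row" are labels chosen here for illustration.

```python
import pandas as pd

cols = ["Courses", "Fee", "Duration"]
df = pd.DataFrame([["Spark", 20000, "30days"],
                   ["Pandas", 25000, "40days"]], columns=cols)
df1 = pd.DataFrame([["Hadoop", 25000, "50days"],
                    ["Java", 30000, "40days"]], columns=cols)

# names= labels the two index levels (the names are illustrative choices).
result = pd.concat([df, df1], keys=["df", "df1"], names=["source", "row"])

# Select only the rows that came from df1 using the outer key.
from_df1 = result.loc["df1"]

# Move the outer level into a regular column to keep track of the origin.
flat = result.reset_index(level="source")

print("Rows from df1:\n", from_df1)
print("Key as a column:\n", flat)
```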

Frequently Asked Questions on Union Pandas DataFrames using Concat

What does concatenation mean in the context of Pandas DataFrames?

Concatenation in the context of Pandas DataFrames refers to combining two or more DataFrames along a specified axis (either rows or columns). It is a way to vertically or horizontally stack DataFrames to create a larger DataFrame.

How do I perform a union of two Pandas DataFrames using pd.concat?

To perform a union of two Pandas DataFrames using pd.concat, you can concatenate them along the rows (axis=0). This operation stacks the rows of both DataFrames, and the resulting DataFrame includes every row from the originals, duplicates included. Chain drop_duplicates() if you need only distinct rows.

What is the purpose of the ignore_index parameter in pd.concat?

The ignore_index parameter in pd.concat is used to reset the index of the resulting DataFrame. When set to True, it creates a new integer index for the concatenated DataFrame, ensuring a continuous index without retaining the original indices from the input DataFrames.

Can I concatenate Pandas DataFrames along columns?

You can concatenate Pandas DataFrames along columns using the pd.concat function with the axis=1 parameter. Concatenating along columns means combining the columns of two or more DataFrames side by side.

What happens if there are duplicate indexes in the input DataFrames during concatenation?

If there are duplicate indexes in the input DataFrames and you do not use ignore_index=True, the resulting DataFrame will retain the duplicate indexes. If you use ignore_index=True, the index will be reset, and a new integer index will be created for the concatenated DataFrame.
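If overlapping index labels should be treated as an error rather than silently kept, pd.concat accepts verify_integrity=True. The sketch below, using small illustrative frames, shows both the default behavior and the check.

```python
import pandas as pd

# Both frames use the default index labels 0 and 1.
df = pd.DataFrame({"Courses": ["Spark", "Pandas"]})
df1 = pd.DataFrame({"Courses": ["Hadoop", "Java"]})

# Default behavior: duplicate labels are kept.
stacked = pd.concat([df, df1])
has_dupes = stacked.index.has_duplicates

# A label lookup now returns every row carrying that label.
rows_at_zero = stacked.loc[0]

# verify_integrity=True raises ValueError when index labels overlap.
try:
    pd.concat([df, df1], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print("concat refused overlapping indexes:", exc)
```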

Are there other methods in Pandas for combining DataFrames?

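Besides pd.concat, pandas provides merge() and join() for combining DataFrames on key columns or indexes. The older DataFrame.append() method was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat in its place. The choice of method depends on the specific requirements of the data manipulation task.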
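As a rough illustration of the difference, merge() matches rows on the values of a key column, while concat() simply stacks or aligns whole frames. The frames below are hypothetical, and the last lines show pd.concat as a replacement for the DataFrame.append() method that was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical frames describing the same courses from two angles.
fees = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [20000, 25000]})
durations = pd.DataFrame({"Courses": ["Pandas", "Java"],
                          "Duration": ["40days", "40days"]})

# merge() pairs rows whose "Courses" values match; how="outer" also keeps
# courses that appear in only one of the frames.
merged = pd.merge(fees, durations, on="Courses", how="outer")

# Adding a row: wrap it in a DataFrame and use pd.concat.
new_row = pd.DataFrame({"Courses": ["Hadoop"], "Fee": [25000]})
appended = pd.concat([fees, new_row], ignore_index=True)

print("Merged on Courses:\n", merged)
print("Row added with concat:\n", appended)
```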

Conclusion

In this article, I have explained how to union pandas DataFrames using the pd.concat() function, a versatile tool for concatenating DataFrames along either rows or columns. It provides flexibility through parameters like ignore_index, keys, and axis, and you can chain drop_duplicates() when you need a union of distinct rows.

Happy Learning !!

References

https://pandas.pydata.org/docs/user_guide/merging.html

Malli