  • Post category:Pandas
  • Post last modified:April 22, 2024
Pandas Concat Two DataFrames Explained

You can use the pandas.concat() function to concatenate two or more pandas DataFrames along either rows or columns. Concatenating along rows creates a new DataFrame containing all the rows from both DataFrames, essentially appending one DataFrame to the other. Concatenating along columns (axis=1) places the DataFrames side by side and aligns their rows on the index labels, using an outer join by default.


In this article, I will explain the syntax and parameters of the concat() function and show, with examples, how to concatenate two pandas DataFrames by rows and by columns.

Key Points –

  • By default, concat() performs an append operation: it stacks each DataFrame below the previous one and returns a single new DataFrame.
  • When you use concat() to join two DataFrames, it supports only inner and outer joins, and by default, it performs an outer join.
  • Using concat() you can join or append multiple pandas DataFrames in one call.
  • pd.concat() is used to concatenate pandas DataFrames along rows or columns.
  • The ignore_index=True parameter resets the index of the concatenated DataFrame.

Related: In Pandas, you can also concatenate Pandas DataFrame columns.

Quick Examples of Concat Two DataFrames

If you are in a hurry, below are some quick examples of how to concatenate two DataFrames using the concat() method.


# Quick examples of concat two dataframes

# Using pandas.concat() 
# To concat two DataFrame
data = [df, df1]
df2 = pd.concat(data)

# Use pandas.concat() method to ignore_index 
df2 = pd.concat([df, df1], ignore_index=True, sort=False)

# Using pandas.concat() method
data = [df, df1]
df2 = pd.concat(data, ignore_index=True, sort=False)

# Using pandas.concat() 
# To join two DataFrames along columns
data = pd.concat([df, df1], axis=1, join='inner')

# Using DataFrame.append() method
# (pandas versions before 2.0 only; append() was removed in pandas 2.0)
data = df.append(df1)

# Use DataFrame.append() with ignore_index
df2 = df.append(df1, ignore_index=True)

# Appending multiple DataFrames
data = df.append([df1, df2])

pandas concat() Syntax and Usage

Following is the syntax of the pandas.concat() method.


# Syntax of concat() function
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

Parameters of the concat()

Following are the parameters of the concat() function.

  • objs – List or dict of pandas objects to be concatenated.
  • axis – Axis along which the concatenation will be performed. By default, it’s 0 (rows).
  • join – Type of join to be performed. It can be ‘inner’ or ‘outer’. Defaults to ‘outer’.
  • ignore_index – If True, do not use the index values along the concatenation axis. Defaults to False.
  • keys – Values to associate with the concatenated objects along the concatenation axis. It’s useful for creating a hierarchical index.
  • levels – Specific levels (unique values) to use for constructing the MultiIndex. If not given, they are inferred from keys.
  • names – Names for the levels in the resulting hierarchical index.
  • verify_integrity – If True, check whether the new concatenated axis contains duplicates. Defaults to False.
  • sort – If True, sort the non-concatenation axis (for example, the columns when concatenating along rows) when it is not already aligned across the inputs. Defaults to False.
  • copy – If False, avoid copying data unnecessarily. Defaults to True.
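
The keys, names, and verify_integrity parameters are easier to follow with a small example. The sketch below is only an illustration: the DataFrames left and right and their contents are hypothetical and are not part of this article's examples.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
left = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
right = pd.DataFrame({'Courses': ["Hadoop", "Java"], 'Fee': [25200, 24900]})

# keys labels the rows coming from each input;
# names gives a name to each level of the resulting MultiIndex
labeled = pd.concat([left, right], keys=['first', 'second'],
                    names=['source', 'row'])
print(labeled)

# Select only the rows that came from the second DataFrame
print(labeled.loc['second'])

# verify_integrity=True raises ValueError here because both inputs
# use the same index labels (0 and 1)
try:
    pd.concat([left, right], verify_integrity=True)
    duplicate_error = None
except ValueError as err:
    duplicate_error = err
print("Duplicate index detected:", duplicate_error is not None)
```

Using keys is one way to keep track of which input each row came from without resetting the index.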

Return Value

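pd.concat() returns a new pandas object whose type depends on the inputs: concatenating only Series along rows (axis=0) returns a Series, while concatenating DataFrames, or concatenating Series along columns (axis=1), returns a DataFrame.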
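
As a quick check of the return type, the following sketch concatenates two hypothetical Series (fee and discount are illustrative names) along each axis and inspects the result.

```python
import pandas as pd

# Hypothetical Series, used only for illustration
fee = pd.Series([20000, 25000], name='Fee')
discount = pd.Series([1000, 2300], name='Discount')

# Series stacked along rows (axis=0) produce a Series
stacked = pd.concat([fee, discount], ignore_index=True)
print(type(stacked).__name__)

# Series placed side by side (axis=1) produce a DataFrame,
# with the Series names used as column names
side_by_side = pd.concat([fee, discount], axis=1)
print(type(side_by_side).__name__)
print(side_by_side)
```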

Use pandas.concat() to Concat Two DataFrames

First, let’s create two Pandas DataFrames with different content, and then, you can apply the concat() method to concat the given DataFrames.


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})
print("First DataFrame:\n", df)
print("Second DataFrame:\n", df1)

Running this prints both DataFrames, each with a default integer index from 0 to 3.

You can use the pandas.concat() method to concatenate two DataFrames by rows, that is, to append one to the other. By default, it behaves like a union that keeps duplicates (similar to SQL UNION ALL): it brings all rows from both DataFrames into a single DataFrame. The example below demonstrates an append using concat().


# Using pandas.concat() to concat two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print("After concatenating the two DataFrames:\n", df2)

The result contains all eight rows. Each DataFrame keeps its original index, so the labels 0 through 3 appear twice.

The ignore_index=True parameter in pd.concat() can be used to reset the index when concatenating DataFrames. With ignore_index=True, the index of the concatenated DataFrame will be reset to start from 0, regardless of the indices of the original DataFrames. This can be useful when you want to create a new DataFrame with a continuous index after concatenation.


# Use pandas.concat() method to ignore_index 
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900

Using pandas.concat() to Join Two DataFrames

You can use the pandas.concat() method to perform column-wise joins (concatenation) between two DataFrames. When you use axis=1 and join='inner', it performs an inner join on the row index and places the columns of both DataFrames side by side.


import pandas as pd
df = pd.DataFrame({'Courses':["Spark","PySpark","Python","pandas"],
                      'Fee' :[20000,25000,22000,24000]})
  
df1 = pd.DataFrame({'Duration':['30day','40days','35days','60days'],
                      'Discount':[1000,2300,2500,2000]}) 

# Using pandas.concat() to join two DataFrames along columns
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)

In the above example, pd.concat() joins df and df1 along columns (axis=1) with an inner join (join='inner'). With axis=1, the join is applied to the row index, so the resulting DataFrame (df2) contains all columns from both DataFrames but only the rows whose index labels exist in both df and df1. Here both DataFrames share the index 0 to 3, so all four rows are kept. This example yields the below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      2500
3   pandas  24000   60days      2000
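
The difference between inner and outer joins only becomes visible when the indexes do not match exactly. The sketch below uses two hypothetical DataFrames, fees and discounts (illustrative names), whose index labels overlap only partly.

```python
import pandas as pd

# Hypothetical data: only index labels 1 and 2 are shared
fees = pd.DataFrame({'Fee': [20000, 25000, 22000]}, index=[0, 1, 2])
discounts = pd.DataFrame({'Discount': [2300, 2500, 2000]}, index=[1, 2, 3])

# Inner join keeps only the index labels present in both DataFrames
inner = pd.concat([fees, discounts], axis=1, join='inner')
print(inner)

# Outer join (the default) keeps every index label and fills the gaps with NaN
outer = pd.concat([fees, discounts], axis=1, join='outer')
print(outer)
```

In both cases the result has the columns Fee and Discount; only the set of rows changes.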

Concatenate Multiple DataFrames Using pandas.concat()

You can also concatenate more than two DataFrames with pandas.concat() by passing all of them in a single list.


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                    'Fee' : ['20000', '25000', '22000', '24000']}) 
  
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
  
df2 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
                    'Discount':[1000,2300,2500,2000,3000]})
  
# Appending multiple DataFrames
df3 = pd.concat([df, df1, df2])
print(df3)

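In the above example, df, df1, and df2 are concatenated along rows (default behavior) to create a single DataFrame, df3. The DataFrames do not need to have the same columns: with the default outer join, the result contains the union of all columns, and cells that an input does not provide are filled with NaN. That is also why Discount is shown as a float (1000.0) in the result. For a clean row-wise stack without NaN values, all inputs should share the same columns. This example yields the below output.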


# Output:
    Courses    Fee Duration  Discount
0     Spark  20000      NaN       NaN
1   PySpark  25000      NaN       NaN
2    Python  22000      NaN       NaN
3    Pandas  24000      NaN       NaN
0      Unix  25000      NaN       NaN
1    Hadoop  25200      NaN       NaN
2  Hyperion  24500      NaN       NaN
3      Java  24900      NaN       NaN
0       NaN    NaN    30day    1000.0
1       NaN    NaN   40days    2300.0
2       NaN    NaN   35days    2500.0
3       NaN    NaN   60days    2000.0
4       NaN    NaN   55days    3000.0
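
If you only want the columns that every input has in common, join='inner' also works along rows. The following sketch uses two hypothetical DataFrames, courses and extra (illustrative names), that share only the Courses column.

```python
import pandas as pd

# Hypothetical data: only the 'Courses' column is shared
courses = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
extra = pd.DataFrame({'Courses': ["Hadoop", "Java"], 'Duration': ['30days', '40days']})

# Default outer join: union of the columns, missing cells become NaN
outer_rows = pd.concat([courses, extra], ignore_index=True)
print(outer_rows)

# Inner join: only the column present in both inputs is kept
inner_rows = pd.concat([courses, extra], ignore_index=True, join='inner')
print(inner_rows)
```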

Use DataFrame.append() to Concat Two DataFrames

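In pandas versions before 2.0, you can also use the DataFrame.append() method to concatenate two DataFrames along rows. For instance, df.append(df1) appends df1 to the df DataFrame. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions this code raises an AttributeError; use pd.concat() instead.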


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using DataFrame.append() to concat two DataFrames
# (requires pandas < 2.0; append() was removed in pandas 2.0)
df2 = df.append(df1)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

You can use the ignore_index=True parameter in the DataFrame.append() method to reset the index on the combined DataFrame.


# Use DataFrame.append() 
df2 = df.append(df1, ignore_index=True)
print(df2)

Yields below output.


# Output:
    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
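
On pandas 2.0 and later, where DataFrame.append() is no longer available, the same results can be produced with pd.concat(). The sketch below shows one possible translation of the common append() patterns; the sample data is shortened and the names new_row and with_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
df1 = pd.DataFrame({'Courses': ["Hadoop", "Java"], 'Fee': [25200, 24900]})

# Equivalent of df.append(df1): original index labels are kept
appended = pd.concat([df, df1])

# Equivalent of df.append(df1, ignore_index=True): index is renumbered
appended_reset = pd.concat([df, df1], ignore_index=True)

# Equivalent of appending a single row given as a dict:
# wrap the dict in a one-row DataFrame first
new_row = pd.DataFrame([{'Courses': "Python", 'Fee': 22000}])
with_row = pd.concat([df, new_row], ignore_index=True)
print(with_row)
```

If you need to add many rows in a loop, collecting them in a list and calling pd.concat() once at the end is usually cheaper than concatenating on every iteration.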

Frequently Asked Questions

What is the purpose of concatenating two DataFrames in Pandas?

Concatenating two DataFrames in Pandas is a common operation used to combine two or more DataFrames along a particular axis (either rows or columns) to create a single, larger DataFrame. It is often used to merge or stack data from different sources for further analysis.

How can I concatenate two DataFrames in Pandas?

You can use the pd.concat() function in Pandas to concatenate two or more DataFrames.

Can I concatenate DataFrames with different columns?

You can concatenate DataFrames with different columns. By default, pd.concat() aligns the columns based on their names. Missing columns will be filled with NaN values.

What is the difference between concatenating along the rows (axis=0) and along the columns (axis=1)?

Concatenating along axis=0 (the default) combines DataFrames vertically, stacking them on top of each other, while concatenating along axis=1 combines DataFrames horizontally, extending them side by side.
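
A simple way to see the difference is to compare the shapes of the results. This sketch uses two small hypothetical DataFrames, a and b, with the same columns.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
a = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
b = pd.DataFrame({'Courses': ["Hadoop", "Java"], 'Fee': [25200, 24900]})

# axis=0 stacks vertically: more rows, same columns
rows = pd.concat([a, b], axis=0)
print(rows.shape)  # (4, 2)

# axis=1 places side by side: same rows, more columns
# (the column names Courses and Fee each appear twice here)
cols = pd.concat([a, b], axis=1)
print(cols.shape)  # (2, 4)
```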

How do I specify the axis for concatenation?

Specify the axis explicitly with the axis parameter of pd.concat(). For instance, to concatenate along columns, use axis=1, and to concatenate along rows, use axis=0.

Conclusion

In this article, I have explained how to concatenate two pandas DataFrames using pandas.concat() and, for pandas versions before 2.0, the DataFrame.append() method, with examples. The concat() method can also concatenate multiple pandas DataFrames in one call.

Happy Learning !!


Naveen Nelamali

