Pandas Concat Two DataFrames Explained

  • Post last modified:January 16, 2022

Use pandas.concat() to concatenate two or more pandas DataFrames along rows or columns. When you concat() two DataFrames on rows, it creates a new DataFrame containing all rows of both; in effect, it appends one DataFrame to the other. When you use concat() on columns, it performs a join operation on the index.

In this article, I will explain how to concatenate two pandas DataFrames by rows and columns with examples.

pandas concat() Key Points

  • By default, concat() performs an append operation: it stacks the rows of each DataFrame below the previous one and returns a single new DataFrame.
  • When you use concat() to join two DataFrames, it supports only inner and outer joins, and by default it performs an outer join.
  • Using concat() you can join or append more than two pandas DataFrames in a single call.
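
The key points above can be checked with a small sketch. The frames `left` and `right` and their columns are made up for illustration and are not the DataFrames used later in this article.

```python
import pandas as pd

# Hypothetical frames that share only the 'Courses' column
left = pd.DataFrame({'Courses': ['Spark', 'Python'], 'Fee': [20000, 22000]})
right = pd.DataFrame({'Courses': ['Java'], 'Duration': ['60days']})

# Defaults: axis=0 (append rows) and join='outer' (keep the union of columns)
outer = pd.concat([left, right])

# join='inner' keeps only the columns present in every input DataFrame
inner = pd.concat([left, right], join='inner')

print(outer)
print(inner)
```

With the outer join, cells for columns that an input does not have come back as NaN; with the inner join, only `Courses` survives.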

1. Quick Examples of Concat Two pandas DataFrames

If you are in a hurry, below are some quick examples of how to concatenate two DataFrames using the concat() method.


# Below are quick examples
# Assumes: import pandas as pd, and df, df1 are existing DataFrames

# Using pandas.concat() to concat two DataFrames
data = [df, df1]
df2 = pd.concat(data)

# Use pandas.concat() with ignore_index=True to reset the index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)

# Same call, with the DataFrames collected in a list first
data = [df, df1]
df2 = pd.concat(data, ignore_index=True, sort=False)

# Using pandas.concat() to join two DataFrames on columns
data = pd.concat([df, df1], axis=1, join='inner')

# Using DataFrame.append() method
# (deprecated in pandas 1.4, removed in pandas 2.0)
data = df.append(df1)

# Use DataFrame.append() with ignore_index=True
df2 = df.append(df1, ignore_index=True)

# Appending multiple DataFrames
data = df.append([df1, df2])

2. pandas concat() Syntax and Usage

Below is the syntax of the pandas.concat() method.


pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
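
Two of the parameters in this signature, keys and verify_integrity, are not used elsewhere in this article. The following is a minimal sketch of what they do, using two small made-up frames `a` and `b`.

```python
import pandas as pd

# Hypothetical one-column frames, both with row labels 0 and 1
a = pd.DataFrame({'Fee': [20000, 25000]})
b = pd.DataFrame({'Fee': [22000, 24000]})

# keys= labels each input; the result gets a two-level row index
labeled = pd.concat([a, b], keys=['first', 'second'])
print(labeled)

# The outer level can then be used to pull one input back out
print(labeled.loc['second'])

# verify_integrity=True raises ValueError if row labels would be duplicated
try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Here `raised` ends up True because both inputs carry the labels 0 and 1.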

3. Use pandas.concat() to Concat Two DataFrames

First, let’s use pandas.concat() to concatenate two DataFrames by rows, that is, to append one to the other. By default, it behaves like a union: it brings all rows from both DataFrames into a single DataFrame. The example below demonstrates an append using concat().


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using pandas.concat() to concat two DataFrames
data = [df, df1]
df2 = pd.concat(data)
print(df2)

Yields below output.


    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

Notice that the above example keeps the row index of each DataFrame as-is, so the labels 0 to 3 repeat. To renumber the rows, use the ignore_index=True param.


# Use pandas.concat() with ignore_index=True to reset the index
df2 = pd.concat([df, df1], ignore_index=True, sort=False)
print(df2)

Yields below output.


    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
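
Why the repeated labels matter can be seen with label-based selection. This is a small sketch on shortened versions of the two DataFrames; the variable names `kept` and `reset` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop"], 'Fee': [25000, 25200]})

kept = pd.concat([df, df1])                      # row labels: 0, 1, 0, 1
reset = pd.concat([df, df1], ignore_index=True)  # row labels: 0, 1, 2, 3

# With the original labels kept, label 0 matches one row from each input
print(kept.loc[0])

# After ignore_index=True every label is unique, so .loc[0] is a single row
print(reset.loc[0])
```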

4. Using pandas.concat() to Join Two DataFrames

As mentioned above, pandas.concat() can also join two pandas DataFrames on columns. To do so, use axis=1, join='inner'; rows are matched on their index labels. By default, pd.concat() is a row-wise outer join.


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Duration': ['30day','40days','35days','60days'],
                    'Discount': [1000,2300,2500,2000]})

# Using pandas.concat() to join two DataFrames on columns
df2 = pd.concat([df, df1], axis=1, join='inner')
print(df2)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      2500
3   pandas  24000   60days      2000
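
In the example above both DataFrames have the same row labels, so an inner and an outer join give the same result. The difference shows up when the labels only partly overlap, as in this sketch with two made-up frames `fees` and `discounts`.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
fees = pd.DataFrame({'Fee': [20000, 25000, 22000]}, index=[0, 1, 2])
discounts = pd.DataFrame({'Discount': [2300, 2500, 2000]}, index=[1, 2, 3])

# join='inner' keeps only the row labels found in both frames (1 and 2)
inner = pd.concat([fees, discounts], axis=1, join='inner')

# join='outer' (the default) keeps every label and fills the gaps with NaN
outer = pd.concat([fees, discounts], axis=1)

print(inner)
print(outer)
```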

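5. Concatenate Multiple DataFrames Using pandas.concat()

You can also use pandas.concat() to concatenate more than two DataFrames by passing them all in one list. Columns that a DataFrame does not have are filled with NaN in its rows.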


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                    'Fee' : ['20000', '25000', '22000', '24000']}) 
  
df1 = pd.DataFrame({'Courses': ["Unix", "Hadoop", "Hyperion", "Java"],
                    'Fee': ['25000', '25200', '24500', '24900']})
  
df2 = pd.DataFrame({'Duration':['30day','40days','35days','60days','55days'],
                    'Discount':[1000,2300,2500,2000,3000]})
  
# Concatenating multiple DataFrames
df3 = pd.concat([df, df1, df2])
print(df3)

Yields below output.


    Courses    Fee Duration  Discount
0     Spark  20000      NaN       NaN
1   PySpark  25000      NaN       NaN
2    Python  22000      NaN       NaN
3    Pandas  24000      NaN       NaN
0      Unix  25000      NaN       NaN
1    Hadoop  25200      NaN       NaN
2  Hyperion  24500      NaN       NaN
3      Java  24900      NaN       NaN
0       NaN    NaN    30day    1000.0
1       NaN    NaN   40days    2300.0
2       NaN    NaN   35days    2500.0
3       NaN    NaN   60days    2000.0
4       NaN    NaN   55days    3000.0
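
The Discount values print as 1000.0 rather than 1000 because of the NaN fill. A short sketch of that effect, using two-row versions of df and df2 from above:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': ['20000', '25000']})
df2 = pd.DataFrame({'Duration': ['30day', '40days'], 'Discount': [1000, 2300]})

df3 = pd.concat([df, df2], ignore_index=True)

# Discount holds integers in df2, but NaN cannot be stored in a plain
# integer column, so the combined column is upcast to a float dtype
print(df2['Discount'].dtype)
print(df3['Discount'].dtype)

# Count the NaN cells introduced in each column by the outer join
print(df3.isna().sum())
```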

6. Use DataFrame.append() to Concat Two DataFrames

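Alternatively, you can use the pandas.DataFrame.append() method to concatenate DataFrames on rows. For example, df.append(df1) appends df1 to the df DataFrame. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this section run only on older versions; on current versions, use pandas.concat() instead.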


import pandas as pd
df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})

# Using DataFrame.append() to concat Two DataFrames
df2 = df.append(df1)
print(df2)

Yields below output.


    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
0    Pandas  25000
1    Hadoop  25200
2  Hyperion  24500
3      Java  24900

Use the ignore_index=True param to reset the index on the combined DataFrame.


# Use DataFrame.append() with ignore_index=True
df2 = df.append(df1, ignore_index=True)
print(df2)

Yields below output.


    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4    Pandas  25000
5    Hadoop  25200
6  Hyperion  24500
7      Java  24900
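
On pandas 2.0 and later, where DataFrame.append() no longer exists, the same results can be obtained with pandas.concat(). The following is a sketch of the equivalent calls; the extra row for 'Scala' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900]})

# Equivalent of df.append(df1)
appended = pd.concat([df, df1])

# Equivalent of df.append(df1, ignore_index=True)
appended_reset = pd.concat([df, df1], ignore_index=True)

# To add a single row given as a dict, wrap it in a one-row DataFrame first
new_row = pd.DataFrame([{'Courses': 'Scala', 'Fee': 23000}])
with_row = pd.concat([df, new_row], ignore_index=True)

print(appended_reset)
print(with_row)
```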

Conclusion

In this article, I have explained how to concatenate two pandas DataFrames using the pandas.concat() and DataFrame.append() methods with examples. The concat() method can also concatenate more than two pandas DataFrames.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

This Post Has One Comment

  1. Anonymous

    good for out standing khnowlgde

You are currently viewing Pandas Concat Two DataFrames Explained