Pandas append() Usage by Examples

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 16, 2022

The pandas.DataFrame.append() method appends the rows of another DataFrame (or a Series, a dict, or a list of these) to the end of the caller DataFrame, adding any columns that exist only in the other object. It can also append several (three or more) DataFrames at once. It takes other (the object you want to append), ignore_index, verify_integrity, and sort as parameters and returns a new DataFrame with the combined result; the caller itself is not modified.

In this article, I will explain how to append pandas DataFrames using its parameters, with examples such as appending rows and columns, ignoring the index while appending, and more.

1. pandas append() Syntax

Below is the syntax of the pandas.DataFrame.append() method.


# Syntax of append()
DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
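  • other: DataFrame or Series/dict-like object, or a list of these. The data to append.
  • ignore_index: bool, default False. When set to True, the result gets a new index 0, 1, ..., n - 1 instead of keeping the original index labels.
  • verify_integrity: bool, default False. When set to True, raises a ValueError if the result would contain duplicate index labels.
  • sort: bool, default False. When set to True, sorts the columns if the columns of the caller and other are not aligned.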
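
The top-level pd.concat() function accepts ignore_index, verify_integrity, and sort under the same names. As a minimal sketch, the code below uses pd.concat() (so it also runs on pandas 2.x, where append() no longer exists) to show what verify_integrity and sort do; the two small DataFrames are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"],
                   'Fee': [20000, 25000]})
df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop"],
                    'Fee': [25000, 25200],
                    'Duration': ['30days', '35days']})

# Both frames carry the default index 0, 1, so stacking them produces
# duplicate labels and verify_integrity=True rejects the result.
try:
    pd.concat([df, df1], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# ignore_index=True relabels the rows 0..3, so the integrity check passes.
checked = pd.concat([df, df1], ignore_index=True, verify_integrity=True)
print(checked.index.tolist())

# The column sets differ, so sort=True sorts the combined column labels.
sorted_cols = pd.concat([df, df1], ignore_index=True, sort=True)
print(sorted_cols.columns.tolist())
```

With sort left at its default of False, the columns instead keep their order of first appearance (Courses, Fee, Duration).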

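Alternatively, you can use the top-level pandas.concat() function to concatenate DataFrames, which can do everything append() does. Note that DataFrame.append() was deprecated in pandas 1.4.0 and removed in pandas 2.0, so the append() examples in this article require a pandas version older than 2.0; on newer versions, use pd.concat() instead.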

2. append() DataFrames Example

By default, the append() method adds the rows of the other DataFrame to the end of the caller DataFrame. For example, the snippet below appends the rows of df1 to the end of df and returns a new DataFrame.

When one DataFrame has a column that the other lacks, the result includes that column and fills it with NaN for the rows that came from the DataFrame without it. Let's create pandas DataFrames from dicts to explore this with an example.


import pandas as pd

df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900],
                    'Duration': ['30days','35days','40days','45days']})

# Using append() method
df2 = df.append(df1)
print(df2)

Yields the output below.


    Courses    Fee Duration
0     Spark  20000      NaN
1   PySpark  25000      NaN
2    Python  22000      NaN
3    pandas  24000      NaN
0    Pandas  25000   30days
1    Hadoop  25200   35days
2  Hyperion  24500   40days
3      Java  24900   45days
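
On pandas 2.0 and later, where DataFrame.append() has been removed, a sketch of the same operation with pd.concat() looks like this: list the caller first, followed by the DataFrame to append. It should print the same eight rows as the output above, including the duplicate index labels and the NaN values in Duration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})

df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900],
                    'Duration': ['30days', '35days', '40days', '45days']})

# Equivalent of df.append(df1): the caller comes first in the list.
# Rows from df get NaN in Duration because df has no such column.
df2 = pd.concat([df, df1])
print(df2)
```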

You can also use this method to append a list of rows, for example a list of dicts or Series, to the DataFrame.
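
A minimal sketch of appending a list of rows, assuming each row is a dict keyed by column name (the two rows here are hypothetical sample values). It converts the list to a DataFrame and concatenates it with pd.concat(), so it also runs on pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})

# Hypothetical rows to add; each dict maps column name -> value.
rows = [{'Courses': 'Hadoop', 'Fee': 25200},
        {'Courses': 'Java', 'Fee': 24900}]

# Build a DataFrame from the list of dicts, then stack it under df
# with a fresh 0..n-1 index.
df2 = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df2)
```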

3. Reindex the DataFrame while Appending

In the result above, the index has duplicate values because each DataFrame keeps its original index labels. You can give the result a new index starting from zero while appending by using the ignore_index=True param.


# Using append() with ignore_index
df2 = df.append(df1, ignore_index=True)
print(df2)

Yields the output below.


    Courses    Fee Duration
0     Spark  20000      NaN
1   PySpark  25000      NaN
2    Python  22000      NaN
3    pandas  24000      NaN
4    Pandas  25000   30days
5    Hadoop  25200   35days
6  Hyperion  24500   40days
7      Java  24900   45days
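
If you already have a combined result with duplicate labels, resetting the index afterwards should give the same DataFrame as ignore_index=True. This sketch uses pd.concat() in place of append() so it runs on pandas 2.x, and compares the two approaches with DataFrame.equals().

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})

df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900],
                    'Duration': ['30days', '35days', '40days', '45days']})

# Option 1: let concat() build a fresh 0..n-1 index directly.
with_ignore = pd.concat([df, df1], ignore_index=True)

# Option 2: keep the original labels, then discard them afterwards.
with_reset = pd.concat([df, df1]).reset_index(drop=True)

print(with_ignore.equals(with_reset))
```

Passing ignore_index=True avoids building the duplicate-labelled index in the first place, while reset_index(drop=True) is handy when the combined DataFrame comes from elsewhere.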

4. Append Dict as Row to DataFrame

Sometimes you need to append a dict as a row to a DataFrame. The example below first creates a dict and then appends it to the df object. Note that append() only accepts a dict when ignore_index=True is set; otherwise it raises a TypeError.


# Append Dict as row to DataFrame
new_row = {'Courses':'Hyperion', 'Fee':24000}
df2 = df.append(new_row, ignore_index=True)
print(df2)

Yields the output below.


    Courses    Fee
0     Spark  20000
1   PySpark  25000
2    Python  22000
3    pandas  24000
4  Hyperion  24000
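
Because append() does not exist on pandas 2.x, one common approach there, sketched below, is to wrap the dict in a list, turn it into a one-row DataFrame, and concatenate it with ignore_index=True.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})

new_row = {'Courses': 'Hyperion', 'Fee': 24000}

# A list of one dict becomes a one-row DataFrame with matching columns;
# ignore_index=True gives the new row the next label (4).
df2 = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df2)
```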

5. Append Multiple DataFrames

To append multiple pandas DataFrames, pass the DataFrames you want to append as a list to the append() method. Use the ignore_index=True param to reset the index of the result so that it starts from zero.


# Create third DataFrame  
df2 = pd.DataFrame({'Courses':['PHP','GO'],
                    'Duration':['30day','40days'],
                    'Fee':[10000,23000]})
  
# Appending multiple DataFrame
df3 = df.append([df1, df2], ignore_index=True)
print(df3)

Yields the output below.


    Courses    Fee Duration
0     Spark  20000      NaN
1   PySpark  25000      NaN
2    Python  22000      NaN
3    pandas  24000      NaN
4    Pandas  25000   30days
5    Hadoop  25200   35days
6  Hyperion  24500   40days
7      Java  24900   45days
8       PHP  10000    30day
9        GO  23000   40days
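
The pd.concat() equivalent of appending multiple DataFrames takes a single list containing the caller followed by every DataFrame to append. This sketch rebuilds the three sample DataFrames from this section and should produce the same ten rows and column order as the output above, on pandas 2.x as well.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"],
                   'Fee': [20000, 25000, 22000, 24000]})

df1 = pd.DataFrame({'Courses': ["Pandas", "Hadoop", "Hyperion", "Java"],
                    'Fee': [25000, 25200, 24500, 24900],
                    'Duration': ['30days', '35days', '40days', '45days']})

df2 = pd.DataFrame({'Courses': ['PHP', 'GO'],
                    'Duration': ['30day', '40days'],
                    'Fee': [10000, 23000]})

# Equivalent of df.append([df1, df2], ignore_index=True).
# Columns are matched by name, so df2's different column order is fine.
df3 = pd.concat([df, df1, df2], ignore_index=True)
print(df3)
```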

6. Complete Example of pandas append()


import pandas as pd

df = pd.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pd.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900],
                    'Duration': ['30days','35days','40days','45days']})

# Using append() method
df2 = df.append(df1)
print(df2)

# Using append() with ignore_index
df2 = df.append(df1, ignore_index=True)
print(df2)

# Create third DataFrame  
df2 = pd.DataFrame({'Courses':['PHP','GO'],
                    'Duration':['30day','40days'],
                    'Fee':[10000,23000]})
  
# Appending multiple DataFrame
df3 = df.append([df1, df2], ignore_index=True)
print(df3)

# Append Dict as row to DataFrame
new_row = {'Courses':'Hyperion', 'Fee':24000}
df2 = df.append(new_row, ignore_index=True)
print(df2)

Conclusion

By using the append() method, you can append the rows of one DataFrame to another, along with any columns that exist only in the other DataFrame. This method takes other (pass a list for multiple DataFrames), ignore_index, verify_integrity, and sort as parameters and returns a new DataFrame with the combined result. Note that when one DataFrame has a column the other lacks, the result fills that column with NaN for the rows where it does not exist.

NNK
