Pandas – Add New Column to Existing DataFrame

In pandas, you can add a new column to an existing DataFrame with the DataFrame.insert() method, which updates the DataFrame in place. DataFrame.assign() also adds a new column, but it returns a new DataFrame and leaves the original unchanged.

In this article, I will cover examples of how to add a single column, add multiple columns, derive new columns from existing columns, add a constant or empty column, and insert a column at a specific position in a pandas DataFrame.

1. Create a Sample Pandas DataFrame

Let’s create a pandas DataFrame with sample data that contains the columns Courses, Fee, and Discount.


import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee': [22000,25000,23000,24000,26000],
    'Discount': [1000,2300,1000,1200,2500]
}

df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500

2. Pandas Add New Column Using DataFrame.assign() Method

DataFrame.assign() adds a new column to a pandas DataFrame. It returns a new DataFrame that contains the added column and leaves the existing DataFrame unchanged.

Below is the syntax of the assign() method.


# Syntax of DataFrame.assign()
DataFrame.assign(**kwargs)

Now let’s add a new column 'TutorsAssigned' to the DataFrame. assign() cannot modify the existing DataFrame in place; instead, it returns a new DataFrame with the column added.


# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df2 = df.assign(TutorsAssigned=tutors)
print(df2)

Yields below output.


   Courses    Fee  Discount TutorsAssigned
0    Spark  22000      1000        William
1  PySpark  25000      2300          Henry
2   Hadoop  23000      1000        Michael
3   Python  24000      1200           John
4   Pandas  26000      2500          Messi
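
To make the "returns a new DataFrame" behavior concrete, here is a minimal sketch on a shortened version of the sample data (the three-row frame and the variable names original_columns and new_columns are my own, chosen only for illustration). It compares the column lists of the original and the returned DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000, 25000, 23000],
})

# assign() returns a new DataFrame; df itself is not modified
df2 = df.assign(TutorsAssigned=["William", "Henry", "Michael"])

original_columns = list(df.columns)
new_columns = list(df2.columns)

print(original_columns)  # ['Courses', 'Fee']
print(new_columns)       # ['Courses', 'Fee', 'TutorsAssigned']
```

If you want to keep the new column on the same variable, reassign the result, for example df = df.assign(...).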

3. Add Multiple Columns to the Pandas DataFrame

You can also use the assign() method to add multiple columns to a pandas DataFrame by passing one keyword argument per column.


# Add multiple columns to the DataFrame
MNCCompanies = ['TATA','HCL','Infosys','Google','Amazon']
df2 = df.assign(MNCComp=MNCCompanies, TutorsAssigned=tutors)

4. Adding a New Column From Existing Column of DataFrame

In practice, you often need to add a new column that is calculated from existing columns. The example below derives a Discount_Percent column from Fee and Discount, expressing the discount as a percentage of the fee.


# Derive New Column from Existing Column
df = pd.DataFrame(technologies)
df2 = df.assign(Discount_Percent=lambda x: x.Discount / x.Fee * 100)
print(df2)

Yields below output. Similarly, you can also derive multiple columns and add them to a DataFrame in a single statement.


   Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          4.545455
1  PySpark  25000      2300          9.200000
2   Hadoop  23000      1000          4.347826
3   Python  24000      1200          5.000000
4   Pandas  26000      2500          9.615385
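
As a sketch of deriving several columns in a single assign() call, the example below adds two columns at once. The column names Final_Fee and Half_Final_Fee are hypothetical and are not part of the sample data above; the point is only that a later keyword argument can refer to a column created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Discount": [1000, 2300],
})

df2 = df.assign(
    # Fee after subtracting the discount
    Final_Fee=lambda x: x["Fee"] - x["Discount"],
    # Later keyword arguments can use columns created earlier in the same call
    Half_Final_Fee=lambda x: x["Final_Fee"] / 2,
)

print(df2)
```

The keyword arguments are evaluated in the order they are written, so Half_Final_Fee must come after Final_Fee.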

5. Add a Constant or Empty Column to DataFrame

The example below adds three new columns to the DataFrame: one with None in every row, a second with the value 0, and a third with an empty string. A scalar passed to assign() is repeated for every row.


# Add a constant or empty value to the DataFrame.
df = pd.DataFrame(technologies)
df2 = df.assign(A=None, B=0, C="")
print(df2)
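
The three kinds of "empty" column behave differently when you later check for missing values. The sketch below is one way to see this on a small frame; column D, filled with np.nan, is my own addition for illustration and is not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})

# Each scalar is repeated for every row of the new column
df2 = df.assign(A=None, B=0, C="", D=np.nan)

all_missing_a = bool(df2["A"].isna().all())  # None counts as a missing value
all_zero_b = bool((df2["B"] == 0).all())     # constant numeric column
c_missing = bool(df2["C"].isna().any())      # an empty string is not a missing value
d_dtype = str(df2["D"].dtype)                # np.nan gives a float column

print(all_missing_a, all_zero_b, c_missing, d_dtype)
```

If the column will hold numbers later, starting from np.nan rather than None or "" is usually the more convenient choice, since the column is already numeric.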

6. Add New Column to Existing Pandas DataFrame

The examples above create a new DataFrame instead of modifying the existing one. The example in this section adds a new column to the existing DataFrame in place by assigning a list to df["MNCCompanies"].


# Add New column to the existing DataFrame
df = pd.DataFrame(technologies)
df["MNCCompanies"] = MNCCompanies
print(df)

Yields below output.


   Courses    Fee  Discount MNCCompanies
0    Spark  22000      1000         TATA
1  PySpark  25000      2300          HCL
2   Hadoop  23000      1000      Infosys
3   Python  24000      1200       Google
4   Pandas  26000      2500       Amazon

You can also use this approach to add a new column derived from existing columns.


# Derive a new column from existing columns
df['Discount_Percent'] = df['Discount'] / df['Fee'] * 100
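
Two details of direct assignment are worth a quick sketch: a single scalar is repeated for every row, while a list must have exactly as many items as the DataFrame has rows. The Duration column and the variable length_error below are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000, 25000, 23000],
})

# A scalar is broadcast to every row
df["Duration"] = "30days"

# A list with the wrong number of items raises ValueError
try:
    df["Tutors"] = ["William", "Henry"]  # 2 values for 3 rows
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```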

7. Adding a New Column at a Specific Position Using DataFrame.insert() Method

The DataFrame.insert() method adds a new column at any position of the existing DataFrame. Most of the examples above add the column at the end of the DataFrame, but this method gives you the flexibility to add it at the beginning, in the middle, or at any column index.

This example adds a Tutors column at the beginning of the DataFrame.


# Add new column at the specific position
df = pd.DataFrame(technologies)
df.insert(0, 'Tutors', tutors)
print(df)

Yields below output.


    Tutors  Courses    Fee  Discount
0  William    Spark  22000      1000
1    Henry  PySpark  25000      2300
2  Michael   Hadoop  23000      1000
3     John   Python  24000      1200
4    Messi   Pandas  26000      2500
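
The same method works for a position in the middle. The sketch below, on a shortened version of the sample data, inserts Tutors at column index 1 and also shows two behaviors that are easy to miss: insert() returns None because it works in place, and inserting a column name that already exists raises ValueError unless you pass allow_duplicates=True. The variable names are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000, 25000, 23000],
})

# insert() modifies df in place and returns None
result = df.insert(1, "Tutors", ["William", "Henry", "Michael"])
columns_after = list(df.columns)

# Inserting a column name that already exists raises ValueError by default
try:
    df.insert(0, "Tutors", ["A", "B", "C"])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(columns_after)    # ['Courses', 'Tutors', 'Fee']
print(duplicate_error)
```

Because insert() returns None, avoid writing df = df.insert(...), which would replace your DataFrame with None.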

8. Add a New Column From Dictionary Mapping

If you want to add a new column whose value in each row depends on the value of an existing column, you can do this with a dictionary and Series.map(). The dictionary keys must match the values in the existing column exactly, including case.


# Add new column by mapping values of an existing column
df = pd.DataFrame(technologies)
tutors = {"Spark":"William", "PySpark":"Henry", "Hadoop":"Michael", "Python":"John", "Pandas":"Messi"}
df['Tutors'] = df['Courses'].map(tutors)
print(df)

Yields below output.


   Courses    Fee  Discount   Tutors
0    Spark  22000      1000  William
1  PySpark  25000      2300    Henry
2   Hadoop  23000      1000  Michael
3   Python  24000      1200     John
4   Pandas  26000      2500    Messi
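
When you look up values with Series.map() and a dictionary, any value without a matching key becomes NaN. The sketch below shows this and one way to fill the gaps; the course "Scala", the dictionary name course_to_tutor, and the "Unassigned" placeholder are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "Hadoop", "Scala"]})

# Keys are course names, values are tutors; "Scala" has no entry
course_to_tutor = {"Spark": "William", "Hadoop": "Michael"}

# Look up each course in the dictionary; missing keys become NaN
df["Tutors"] = df["Courses"].map(course_to_tutor)

# Replace the missing tutors with a placeholder
df["Tutors_Filled"] = df["Tutors"].fillna("Unassigned")

print(df)
```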

Conclusion

In this article, I have explained how you can add a new column to an existing DataFrame by using DataFrame.assign(), DataFrame.insert(), and direct column assignment. You also learned that insert() adds a column at any position of the DataFrame.

NNK

SparkByExamples.com is a Big Data and Spark examples community page. All examples are simple, easy to understand, and well tested in our development environment.