  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Add Column to DataFrame

In pandas, you can add a new column to an existing DataFrame with the DataFrame.insert() method, which updates the DataFrame in place. DataFrame.assign() also adds a new column; however, it returns a new DataFrame and leaves the original unchanged.

In this article, I will cover examples of how to add multiple columns, add a constant value, and derive new columns from existing columns in a Pandas DataFrame.

1. Quick Examples of Add Column to DataFrame


# Below are quick examples

# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df2 = df.assign(TutorsAssigned=tutors)

# Add multiple columns to the DataFrame
MNCCompanies = ['TATA', 'HCL', 'Infosys', 'Google', 'Amazon']
df2 = df.assign(MNCComp=MNCCompanies, TutorsAssigned=tutors)

# Derive New Column from Existing Column
df = pd.DataFrame(technologies)
df2 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)

# Add a constant or empty value to the DataFrame.
df = pd.DataFrame(technologies)
df2 = df.assign(A=None, B=0, C="")

# Add New column to the existing DataFrame
df = pd.DataFrame(technologies)
df["MNCCompanies"] = MNCCompanies

# Add new column at the specific position
df = pd.DataFrame(technologies)
df.insert(0, 'Tutors', tutors)

# Add new column by mapping to the existing column
df = pd.DataFrame(technologies)
tutors = {"Spark":"William", "PySpark":"Henry", "Hadoop":"Michael","Python":"John", "pandas":"Messi"}
df['Tutors'] = df['Courses'].map(tutors)
print(df)

Let’s create a Pandas DataFrame with sample data and execute the above examples.


# Create DataFrame
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}

df = pd.DataFrame(technologies)
print("Create a DataFrame:\n", df)

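Yields the output below.


# Output:
# Create a DataFrame:
    Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500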

2. Pandas Add Column to DataFrame

DataFrame.assign() adds a column to a Pandas DataFrame. The method returns a new DataFrame that contains the added column and does not modify the existing one.

Below is the syntax of the assign() method.


# Syntax of DataFrame.assign()
DataFrame.assign(**kwargs)

Now let’s add a column 'TutorsAssigned' to the DataFrame. assign() cannot modify the existing DataFrame in place; instead, it returns a new DataFrame that includes the added column. The example below adds a list of values as a new column.


# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df2 = df.assign(TutorsAssigned=tutors)
print("Add column to DataFrame:\n", df2)

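Yields the output below.


# Output:
# Add column to DataFrame:
    Courses    Fee  Discount TutorsAssigned
0    Spark  22000      1000        William
1  PySpark  25000      2300          Henry
2   Hadoop  23000      1000        Michael
3   Python  24000      1200           John
4   Pandas  26000      2500          Messi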
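
Because assign() returns a new DataFrame, the original is only updated if you rebind the result. The following is a minimal sketch of that behaviour, reusing the sample data from this article; it is an illustration rather than the only way to write it.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']

df = pd.DataFrame(technologies)

# assign() returns a new DataFrame; df itself is left untouched
df2 = df.assign(TutorsAssigned=tutors)
print('TutorsAssigned' in df.columns)   # False
print('TutorsAssigned' in df2.columns)  # True

# To keep the new column under the original name, rebind the result
df = df.assign(TutorsAssigned=tutors)
print(list(df.columns))
```

Rebinding keeps the chained, expression-style usage that assign() is designed for while still giving you an updated df.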

3. Add Multiple Columns to the DataFrame

You can also use the assign() method to add multiple columns to a Pandas DataFrame by passing one keyword argument per column.


# Add multiple columns to the DataFrame
MNCCompanies = ['TATA','HCL','Infosys','Google','Amazon']
df2 = df.assign(MNCComp=MNCCompanies, TutorsAssigned=tutors)
print("Add multiple columns to DataFrame:\n", df2)

Yields the output below.


# Output:
# Add multiple columns to DataFrame:
    Courses    Fee  Discount  MNCComp TutorsAssigned
0    Spark  22000      1000     TATA        William
1  PySpark  25000      2300      HCL          Henry
2   Hadoop  23000      1000  Infosys        Michael
3   Python  24000      1200   Google           John
4   Pandas  26000      2500   Amazon          Messi

4. Adding a Column From Existing

In practice, you often need to add a column calculated from existing columns. The example below derives a Discount_Percent column from Fee and Discount by computing Fee * Discount / 100, using a lambda that receives the DataFrame. The formula is illustrative only; an actual discount percentage would be Discount / Fee * 100.


# Derive New Column from Existing Column
df = pd.DataFrame(technologies)
df2 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
print("Add column to DataFrame:\n", df2)

Yields the output below. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call by passing several keyword arguments.


# Output:
# Add column to DataFrame:
   Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          220000.0
1  PySpark  25000      2300          575000.0
2   Hadoop  23000      1000          230000.0
3   Python  24000      1200          288000.0
4   Pandas  26000      2500          650000.0
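
Deriving several columns in one statement could look like the sketch below. The column names Final_Fee, Discount_Pct, and Final_Fee_K are hypothetical and chosen only for illustration. Keyword arguments are evaluated in order, so a later lambda can refer to a column created earlier in the same call.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Final_Fee, Discount_Pct and Final_Fee_K are hypothetical column names
df2 = df.assign(
    Final_Fee=lambda x: x['Fee'] - x['Discount'],
    Discount_Pct=lambda x: x['Discount'] / x['Fee'] * 100,
    # This lambda uses Final_Fee, which was created two lines above
    Final_Fee_K=lambda x: x['Final_Fee'] / 1000,
)
print(df2)
```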

5. Add a Constant or Empty Column

The example below adds three new columns to the DataFrame: one holding None in every row, one holding the constant 0, and one holding an empty string.


# Add a constant or empty value to the DataFrame.
df = pd.DataFrame(technologies)
df2 = df.assign(A=None, B=0, C="")
print("Add column to DataFrame:\n", df2)

Yields the output below.


# Output:
# Add column to DataFrame:
    Courses    Fee  Discount     A  B C
0    Spark  22000      1000  None  0  
1  PySpark  25000      2300  None  0  
2   Hadoop  23000      1000  None  0  
3   Python  24000      1200  None  0  
4   Pandas  26000      2500  None  0  
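
If the empty column is meant to hold numbers later, a NaN placeholder may be a better fit than None, because it gives the column a float dtype from the start. The sketch below assumes the same sample data; the column name D is hypothetical.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# D is a hypothetical column name; np.nan is broadcast to every row
df2 = df.assign(D=np.nan)
print(df2['D'].isna().all())  # True
print(df2['D'].dtype)         # float64
```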

6. Append Column to Existing Pandas DataFrame

The examples above create a new DataFrame instead of modifying the existing one. The example in this section appends a new column to the existing DataFrame in place by assigning to a new column label.


# Add New column to the existing DataFrame
df = pd.DataFrame(technologies)
df["MNCCompanies"] = MNCCompanies
print("Add column to DataFrame:\n", df)

Yields the output below.


# Output:
# Add column to DataFrame:
   Courses    Fee  Discount MNCCompanies
0    Spark  22000      1000         TATA
1  PySpark  25000      2300          HCL
2   Hadoop  23000      1000      Infosys
3   Python  24000      1200       Google
4   Pandas  26000      2500       Amazon

You can also use this approach to add a new column derived from existing columns:


# Derive a new column from existing columns
df['Discount_Percent'] = df['Fee'] * df['Discount'] / 100
print("Add column to DataFrame:\n", df['Discount_Percent'])

# Output:
# Add column to DataFrame:
0    220000.0
1    575000.0
2    230000.0
3    288000.0
4    650000.0
Name: Discount_Percent, dtype: float64

7. Add Column to Specific Position of DataFrame

The DataFrame.insert() method adds a column at any position of the existing DataFrame. Most of the examples above add the column at the end, but this method gives you the flexibility to add it at the beginning, in the middle, or at any column index of the DataFrame.

This example adds a Tutors column at the beginning of the DataFrame.

# Add new column at the specific position
df = pd.DataFrame(technologies)
df.insert(0, 'Tutors', tutors)
print("Add column to DataFrame:\n", df)

Yields the output below.


# Output:
# Add column to DataFrame:
    Tutors  Courses    Fee  Discount
0  William    Spark  22000      1000
1    Henry  PySpark  25000      2300
2  Michael   Hadoop  23000      1000
3     John   Python  24000      1200
4    Messi   Pandas  26000      2500
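
To place the column in the middle instead, you can look up the position of an existing column and insert there. This is a small sketch under the same sample data; note that insert() raises a ValueError when the column name already exists, unless allow_duplicates=True is passed.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df = pd.DataFrame(technologies)

# Insert 'Tutors' immediately before the 'Fee' column
position = df.columns.get_loc('Fee')
df.insert(position, 'Tutors', tutors)
print(list(df.columns))  # ['Courses', 'Tutors', 'Fee', 'Discount']

# Inserting the same column name again fails by default
try:
    df.insert(0, 'Tutors', tutors)
except ValueError as err:
    print("Insert failed:", err)
```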

8. Add a Column From Dictionary Mapping

If you want to add a column whose value in each row depends on an existing value, you can use a dictionary. Here, each value in the 'Courses' column is looked up as a key in the dictionary, and the corresponding dictionary value is added to the Tutors column in df.


# Add new column by mapping to the existing column
df = pd.DataFrame(technologies)
tutors = {"Spark":"William", "PySpark":"Henry", "Hadoop":"Michael","Python":"John", "pandas":"Messi"}
df['Tutors'] = df['Courses'].map(tutors)
print("Add column to DataFrame:\n", df)

Yields the output below. Note that Pandas is not mapped: map() matches keys exactly and is case sensitive, and the dictionary key is pandas while the value in the Courses column is Pandas. Rows without a matching key get NaN.


# Output:
# Add column to DataFrame:
   Courses    Fee  Discount   Tutors
0    Spark  22000      1000  William
1  PySpark  25000      2300    Henry
2   Hadoop  23000      1000  Michael
3   Python  24000      1200     John
4   Pandas  26000      2500      NaN
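
If the case mismatch is unintended, one possible workaround is to normalise both the dictionary keys and the column values before mapping. The helper name tutors_lower below is hypothetical, and this is a sketch rather than the only approach.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
tutors = {"Spark": "William", "PySpark": "Henry", "Hadoop": "Michael",
          "Python": "John", "pandas": "Messi"}
df = pd.DataFrame(technologies)

# Lower-case both sides so "Pandas" matches the "pandas" key
tutors_lower = {course.lower(): tutor for course, tutor in tutors.items()}
df['Tutors'] = df['Courses'].str.lower().map(tutors_lower)
print(df)

# Alternative: keep exact matching and label the unmatched rows
# df['Tutors'] = df['Courses'].map(tutors).fillna('Unassigned')
```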

9. Add a Column Using loc[]

With pandas loc[] you can access rows and columns by label; you can also use it to add a new column to a DataFrame. loc[] takes the row selection as its first argument and the column selection as its second, so I will pass the new column name as the second argument.


# Assign the column to the DataFrame
df = pd.DataFrame(technologies)
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df.loc[:, 'Tutors'] = tutors
print("Add column to DataFrame:\n", df)

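The output has the same layout as in the previous section, with Tutors as the last column, except that every row has a tutor because the values come from a list rather than a dictionary lookup.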

Frequently Asked Questions on Adding a Column to a Pandas DataFrame

How do I add a new column to an existing DataFrame?

You can add a new column to a DataFrame by assigning values to a new column name. For example, df['new_column'] = ['col_value1', 'col_value2', 'col_value3']. The list must have one value per row of the DataFrame.

How do I add a column based on calculations from existing columns?

You can perform calculations on existing columns and assign the result to a new column. For example, df['new_col'] = df['existing_col'] * 2

How can I add a column at a specific position in the DataFrame?

You can use the insert() method to add a column at a specific position. For example, to add a column at the second position: df.insert(1, 'new_col', ['col_value1', 'col_value2', 'col_value3'])

How can I add a column using data from another DataFrame?

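You can add a column using data from another DataFrame. Pandas aligns the values by index, so the indices should match; rows whose index label is missing from the other DataFrame get NaN. For example, df2 = pd.DataFrame({'existing_column': [value1, value2, value3]}) followed by df['existing_column'] = df2['existing_column']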
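
The effect of index alignment can be seen in the short sketch below. The DataFrame named other and its index labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Hadoop"]})

# 'other' is a hypothetical second DataFrame that only covers index labels 1 and 2
other = pd.DataFrame({'Fee': [25000, 23000]}, index=[1, 2])

# Values are matched on index labels, not on position
df['Fee'] = other['Fee']
print(df)
```

Row 0 has no counterpart in other, so its Fee is NaN, while rows 1 and 2 receive 25000 and 23000.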

Conclusion

In this article, I have explained how to add a column to an existing DataFrame by using DataFrame.assign(), DataFrame.insert(), direct assignment with df[], and loc[]. You also learned that insert() can add a column at any position of the DataFrame.


Naveen Nelamali

