Pandas Add Column with Default Value

  • Post category: Pandas
  • Post last modified: March 27, 2024

In pandas, you can add a column with a default value to an existing DataFrame by using the df[] operator, DataFrame.assign(), or DataFrame.insert(). DataFrame.assign() returns a new DataFrame with the added column and leaves the original unchanged. DataFrame.insert() adds the column to the existing DataFrame in place, at the position you specify. In this article, I will explain how to add a column with a default value to a pandas DataFrame with examples.

Key Points –

  • Specify the default value directly within the DataFrame constructor or when using methods like DataFrame.insert() or DataFrame.assign().
  • Ensure the default value aligns with the data type of the column.
  • Use the assignment operator (=) to create a new column and assign the default value.
  • Utilize the .loc indexer to assign values to the new column based on conditions or criteria.
  • For large datasets, assign a single scalar (df['col'] = value) rather than building a Python list of repeated values; pandas broadcasts the scalar to every row.
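The first key point can be sketched as below. This is a minimal example, assuming a small DataFrame built only for illustration, in which the hypothetical Active column gets its default straight from the constructor. When a dict mixes lists and a scalar, pandas broadcasts the scalar to the length of the lists.

```python
import pandas as pd

# "Active" is a hypothetical column; its scalar value True is
# broadcast to every row because the list columns fix the length
df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
        'Active': True,
    },
    index=['r1', 'r2', 'r3', 'r4'],
)
print(df)
print(df.dtypes)
```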

1. Quick Examples of Adding a Column with a Default Value

If you are in a hurry, below are some quick examples of adding a column with a default value to a DataFrame.


# Quick examples of adding a column with a default value
import numpy as np

# Example 1: Use DataFrame.assign() function
df2 = df.assign(Tutors = ['William', 'Henry', 'Michael', 'John'])

# Example 2: Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John']
df2 = df.assign(Tutors=tutors)

# Example 3: Add new column with default value
# using DataFrame.assign() function
df2 = df.assign(Tutors=np.nan)

# Example 4: Use df[] operator
df['Percent'] = ['5%','10%','15%','20%']

# Example 5: Add new column with default value
# using df[] operator
df['Percent'] = np.nan

# Example 6: Add column with default value
# using DataFrame.insert() function
# (drop the existing "Percent" column first; insert()
# raises ValueError if the column name already exists)
df = df.drop(columns='Percent')
df.insert(4, "Percent", "10%", allow_duplicates=False)

Now, let’s create a pandas DataFrame from a Python dictionary with the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Add Column with Default Value Using DataFrame.assign()

DataFrame.assign() adds one or more columns to a pandas DataFrame. It does not change the existing DataFrame; instead, it returns a new DataFrame that contains the original columns plus the added ones.

Below is the syntax of the assign() function.


# Syntax of DataFrame.assign()
DataFrame.assign(**kwargs)

Let’s add a column "Tutors" with the default value NaN. Because assign() cannot modify the existing DataFrame in place, store the returned DataFrame in a new variable (df2).


# Add new column with default value
# using DataFrame.assign() function
df2 = df.assign(Tutors=np.nan)
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration  Discount  Tutors
r1    Spark  20000   30days      1000     NaN
r2  PySpark  25000   40days      2300     NaN
r3   Python  22000   35days      1200     NaN
r4   pandas  30000   50days      2000     NaN
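As a minimal sketch (the column names Tutors and Active are only illustrative, and the DataFrame is trimmed down for brevity), assign() can add several default-valued columns in one call, while the original DataFrame keeps only its original columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

# Each keyword argument becomes a new column; each scalar is
# broadcast to all rows. "Tutors" and "Active" are illustrative names.
df2 = df.assign(Tutors=np.nan, Active=True)
print(df2)

# The original DataFrame is left untouched
print(df.columns.tolist())
```

Because assign() always returns a new object, it fits naturally into method chains where you do not want to mutate the input.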

3. Add New Column with Default Value Using df[ ] Operator

Using the df[] operator, you can add a column with a default value to a pandas DataFrame. It is the simplest way to add a new column, and unlike assign(), it modifies the existing DataFrame in place.

Below is the syntax of the df[] operator.


# Syntax of df[] operator
df[col_name]=value

Let’s add a column "Percent" by passing a list to the df[] operator. A list supplies one value per row, so its length must match the number of rows in the DataFrame.


# Use df[] operator
df['Percent'] = ['5%','10%','15%','20%']
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount Percent
r1    Spark  20000   30days      1000      5%
r2  PySpark  25000   40days      2300     10%
r3   Python  22000   35days      1200     15%
r4   pandas  30000   50days      2000     20%
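One thing to watch with lists: pandas does not pad a short list with a default value. The sketch below, which uses a trimmed-down DataFrame for illustration, shows that a list with fewer values than rows raises a ValueError and the column is not added.

```python
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

error = None
try:
    # Only 2 values for 4 rows
    df['Percent'] = ['5%', '10%']
except ValueError as exc:
    error = exc

print(error)                    # the length-mismatch message
print('Percent' in df.columns)  # the column was not created
```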

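Similarly, you can use the df[] operator to add a column with the same value in every row of the existing DataFrame. When you assign a single scalar, pandas broadcasts it to all rows. Here, assigning a scalar to Percent overwrites the values created above.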


# Add new column with default value
# Using df[ ] operator
df['Percent'] = np.nan
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount  Percent
r1    Spark  20000   30days      1000      NaN
r2  PySpark  25000   40days      2300      NaN
r3   Python  22000   35days      1200      NaN
r4   pandas  30000   50days      2000      NaN

4. Add Column with Default Value Using DataFrame.insert()

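With DataFrame.insert(), you can add a column with a default value at any position in a pandas DataFrame. Its first argument, loc, is the zero-based position of the new column, so 4 places it right after Discount, the fourth column. insert() modifies the DataFrame in place and raises a ValueError if a column with the same name already exists, unless you pass allow_duplicates=True.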


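# insert() raises ValueError if "Percent" already exists,
# so drop the column added in the previous section first
df = df.drop(columns='Percent')

# Add column with default value
# using DataFrame.insert() function
df.insert(4, "Percent", "10%", allow_duplicates=False)
print(df)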

Yields below output.


# Output:
    Courses    Fee Duration  Discount Percent
r1    Spark  20000   30days      1000     10%
r2  PySpark  25000   40days      2300     10%
r3   Python  22000   35days      1200     10%
r4   pandas  30000   50days      2000     10%
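The sketch below, assuming a trimmed-down DataFrame built for illustration, shows two more insert() behaviors: loc=0 puts the new column first, and a second insert with the same name raises a ValueError because allow_duplicates defaults to False. Note that insert() returns None, so do not assign its result back to df.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

# loc=0 makes "Percent" the first column; insert() works in place
result = df.insert(0, "Percent", "10%")
print(result)   # None
print(df)

# A second "Percent" column is rejected (allow_duplicates is False by default)
duplicate_error = None
try:
    df.insert(len(df.columns), "Percent", "20%")
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error)
```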

5. Complete Example of Adding a Column with a Default Value


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# use DataFrame.assign() function
df2 = df.assign(Tutors = ['William', 'Henry', 'Michael', 'John'])
print(df2)

# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John']
df2 = df.assign(Tutors=tutors)
print(df2)

# Add new column with default value
# using DataFrame.assign() function
df2 = df.assign(Tutors=np.nan)
print(df2)

# Use df[] operator
df['Percent'] = ['5%','10%','15%','20%']
print(df)

# Add new column with default value
# Using df[ ] operator
df['Percent'] = np.nan
print(df)

# Drop "Percent" first; insert() raises ValueError
# if the column name already exists
df = df.drop(columns='Percent')

# Add column with default value
# using DataFrame.insert() function
df.insert(4, "Percent", "10%", allow_duplicates=False)
print(df)

Frequently Asked Questions on Adding a Column with a Default Value

Why should I add a column with a default value in Pandas?

Adding a default value allows you to initialize new columns with a predefined value, ensuring consistency and facilitating further data manipulation and analysis.

How can I add a column with a default value in Pandas?

Add a column with a default value by directly assigning the default value to the new column, either during DataFrame creation or by using methods like DataFrame.insert() or DataFrame.assign().

Can I specify different default values for different rows in Pandas?

Yes. Use a boolean condition with the .loc[] indexer, or compute values row by row with .apply(), to set different values for the rows that meet specific conditions or criteria.
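A minimal sketch of the .loc[] approach, assuming a hypothetical Tier column and an arbitrary Fee threshold of 22000: start every row with the same default, then overwrite only the rows that match the condition.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

# Start every row with the same default...
df['Tier'] = 'standard'

# ...then override only the rows that match a condition.
# "Tier" and the 22000 threshold are illustrative choices.
df.loc[df['Fee'] > 22000, 'Tier'] = 'premium'
print(df)
```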

Can I change the default value of a column after adding it to a DataFrame?

Yes. Reassign the whole column with standard DataFrame assignment, or update only selected values, for example by replacing the remaining defaults with fillna() or by applying a function based on specific conditions.
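For example, assuming a hypothetical Discount column that starts out as NaN, you can set real values for some rows and then replace the remaining defaults with fillna(). Using a float fill value keeps the column float64.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Default: missing value (the column becomes float64)
df['Discount'] = np.nan

# Set a real value for one row
df.loc['r1', 'Discount'] = 1000.0

# Replace the remaining NaN defaults with 0.0
df['Discount'] = df['Discount'].fillna(0.0)
print(df)
```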

What data types are suitable for default values in Pandas?

Default values should be compatible with the data type of the column. Common data types include integers, floats, strings, booleans, and datetime objects, among others. Ensure consistency between the default value and the expected data type of the column.
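The default value you assign determines the dtype pandas gives the new column. The sketch below uses illustrative column names and assumes pandas 1.0 or later, which introduced pd.NA and the nullable Int64 dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# The scalar default determines the new column's dtype
df['Tutors'] = np.nan                     # float64, because NaN is a float
df['Active'] = False                      # bool
df['Seats'] = 0                           # int64
df['Start'] = pd.Timestamp('2024-01-01')  # datetime64

# Nullable integer column whose default is a missing value
df['Enrolled'] = pd.Series(pd.NA, index=df.index, dtype='Int64')

print(df.dtypes)
```

If a numeric column needs missing defaults but should stay integer, the nullable Int64 dtype avoids the conversion to float64 that np.nan causes.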

Conclusion

In this article, I have explained how to add a column with a default value to an existing pandas DataFrame by using the df[] operator, DataFrame.assign(), and DataFrame.insert(). You also learned that insert() places the new column at any position you choose, while assign() returns a new DataFrame instead of modifying the original.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium