  • Post category:Pandas
  • Post last modified:August 6, 2024
Pandas DataFrame assign() Method

In pandas, the assign() method adds new columns to a DataFrame or overwrites existing ones. It returns a new DataFrame with the updated columns and leaves the original DataFrame unchanged. To keep the result, assign it to a variable, for example df = df.assign(...).


In this article, I will explain the Pandas DataFrame assign() function, covering its syntax, parameters, and usage, and show how it returns a new DataFrame with the added or modified columns.

Key Points –

  • The assign() method returns a new DataFrame with the updated columns. The original DataFrame stays unchanged, so you need to store the result in a variable to keep it.
  • You can use assign() to add new columns to a DataFrame by specifying the column names and values.
  • Existing columns can be modified by using assign() with the column name and the new values.
  • It is possible to add or modify multiple columns in a single call to assign() by providing multiple keyword arguments.
  • The columns to be added or modified are specified using keyword arguments, where the key is the column name and the value is the column data.

Quick Examples of Pandas DataFrame assign()

If you are in a hurry, below are some quick examples of the Pandas DataFrame assign() function.


# Quick examples of pandas DataFrame assign()

# Example 1: Add a new column 'Column3' 
# Which is the sum of 'Column1' and 'Column2'
df2 = df.assign(Column3 = df['Column1'] + df['Column2'])

# Example 2: Add a new column 'Column3' 
# Which is the square of column 'Column2'
df2 = df.assign(Column3 = lambda x: x['Column2'] ** 2)

# Example 3: Add a new column 'Column3' 
# With a constant value (e.g., 10)
df2 = df.assign(Column3 = 10)

# Example 4: Add a new column 'Column3' 
# Which is the product of 'Column1' and 'Column2'
df2 = df.assign(Column3=lambda x: x['Column1'] * x['Column2'])

Pandas DataFrame.assign() Introduction

Let’s look at the syntax of Pandas DataFrame.assign().


# Syntax of Pandas DataFrame.assign()
DataFrame.assign(**kwargs)

Parameters of the DataFrame.assign()

Following are the parameters of the DataFrame.assign() function.

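  • kwargs – keyword arguments where the key is the name of the new or existing column, and the value is the data for that column. The value can be a scalar, a Series, an array-like object, or a function. A function is called with the DataFrame, and its return value is assigned to the column.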
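As a minimal sketch of the value types described above, the following passes a scalar, a Series, and a function in a single call. The sample data and the column names constant, from_series, and from_callable are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 6]})

df2 = df.assign(
    constant=10,                                # scalar: repeated in every row
    from_series=df['Column1'] + df['Column2'],  # Series: aligned on the index
    from_callable=lambda x: x['Column2'] * 2,   # function: called with the DataFrame
)
print(df2)
```

The new columns are appended in the order in which the keyword arguments are written.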

Return Value

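It returns a new DataFrame that contains all the original columns plus the newly added ones. If a keyword matches an existing column name, that column is overwritten in the returned DataFrame.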
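To illustrate the return value, here is a short sketch (with made-up sample data) that overwrites one column and adds another, then checks that the original DataFrame was not touched.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({'Column1': [5, 15], 'Column2': [4, 12]})

# Overwrite 'Column1' and add 'Column3' in the returned DataFrame
df2 = df.assign(Column1=df['Column1'] * 2, Column3=0)

print(df.columns.tolist())    # the original still has only its two columns
print(df2.columns.tolist())   # the result has the extra 'Column3'
print(df['Column1'].tolist(), df2['Column1'].tolist())
```

Only df2 carries the doubled Column1 values and the new Column3. To update df itself, you would write df = df.assign(...).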

Usage of Pandas DataFrame assign()

The assign() method in pandas is used to add new columns to a DataFrame or modify existing columns by assigning computed results based on existing data. It returns a new DataFrame with the added columns, leaving the original DataFrame unchanged.

First, let’s create a Pandas DataFrame from a Python dictionary, with the columns Column1 and Column2.


# Create DataFrame
import pandas as pd

data = {'Column1': [5, 15, 8, 20, 25],
        'Column2': [4, 12, 6, 17, 20]}
df = pd.DataFrame(data)
print("Create DataFrame:\n", df)

This yields the output below.

# Output:
# Create DataFrame:
#     Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        6
# 3       20       17
# 4       25       20

Adding a New Column Based on Existing Columns

You can add a new column based on existing columns in a DataFrame using the assign() method.


# Add a new column 'Column3' 
# Which is the sum of 'Column1' and 'Column2'
df2 = df.assign(Column3 = df['Column1'] + df['Column2'])
print("Add a new column:\n",df2)

In the above example, a new column Column3 is added, where each value is the sum of the corresponding values in Column1 and Column2. The expression df['Column1'] + df['Column2'] is evaluated first, and its result is passed to assign() as the data for Column3. The resulting DataFrame df2 contains the original columns Column1 and Column2 along with the newly added column Column3, while df itself is unchanged.

# Output:
# Add a new column:
#     Column1  Column2  Column3
# 0        5        4        9
# 1       15       12       27
# 2        8        6       14
# 3       20       17       37
# 4       25       20       45

Adding a New Column Using a Lambda Function

Alternatively, you can pass a lambda function to assign(). pandas calls the function with the DataFrame and uses the returned values as the new column, so you can define the column in terms of existing data without referring to the DataFrame variable by name.


# Add a new column 'Column3' which is the square of column 'Column2'
df2 = df.assign(Column3 = lambda x: x['Column2'] ** 2)
print(df2)

# Output:
#    Column1  Column2  Column3
# 0        5        4       16
# 1       15       12      144
# 2        8        6       36
# 3       20       17      289
# 4       25       20      400

In the above example, a new column Column3 is added to the DataFrame df, where each value in Column3 is the square of the corresponding value in column Column2. This is achieved by using a lambda function within the assign() method. The lambda function takes the DataFrame x as input and returns the squared value of column Column2. The resulting DataFrame df2 contains the original columns Column1 and Column2 along with the newly added column Column3.
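One situation where the lambda form is handy is method chaining, because the function receives the DataFrame as it exists at that point in the chain rather than the original variable. The sketch below assumes an arbitrary filter (Column1 greater than 10) chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 6, 17, 20]})

# The lambda is called with the filtered rows, not with the full df
df2 = (
    df[df['Column1'] > 10]
    .assign(Column3=lambda x: x['Column2'] ** 2)
)
print(df2)
```

Here Column3 is computed only for the rows that survive the filter, and df keeps its original five rows and two columns.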

Adding a New Column with a Constant Value

You can also add a new column with a constant value using the assign() method. For instance, the example below adds a new column Column3 to the DataFrame df with the value 10 in every row. This creates a new DataFrame df2 with the added column.


# Add a new column 'Column3' 
# With a constant value (e.g., 10)
df2 = df.assign(Column3 = 10)
print(df2)

# Output:
#    Column1  Column2  Column3
# 0        5        4       10
# 1       15       12       10
# 2        8        6       10
# 3       20       17       10
# 4       25       20       10

Adding a New Column Based on Multiple Columns

Similarly, you can compute a new column from several existing columns. The function passed to assign() receives the whole DataFrame, so it can combine any of its columns in a single calculation.


# Add a new column 'Column3' 
# Which is the product of 'Column1' and 'Column2'
df2 = df.assign(Column3=lambda x: x['Column1'] * x['Column2'])
print("Add a new column 'Column3':\n", df2)

# Output:
# Add a new column 'Column3':
#     Column1  Column2  Column3
# 0        5        4       20
# 1       15       12      180
# 2        8        6       48
# 3       20       17      340
# 4       25       20      500

In the above example, a DataFrame df is created with two columns, Column1 and Column2. Subsequently, a new column, Column3, is added to df. The values in Column3 are calculated as the product of the corresponding values in Column1 and Column2. This is achieved using the assign() method with a lambda function, lambda x: x['Column1'] * x['Column2'].
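The same approach extends to adding several columns in one call by passing several keyword arguments. In the sketch below, the column names Total and Share are hypothetical. Because pandas evaluates the arguments in order, a later function can use a column created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 6, 17, 20]})

df2 = df.assign(
    Total=lambda x: x['Column1'] + x['Column2'],
    # 'Total' already exists by the time this function is called
    Share=lambda x: x['Column1'] / x['Total'],
)
print(df2)
```

Note that this only works with functions. An expression such as df['Total'] would fail here, because df itself never receives the Total column.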

Frequently Asked Questions on Pandas DataFrame assign()

What is the assign() method in pandas?

The assign() method in pandas is used to add new columns to a DataFrame or modify existing ones. It returns a new DataFrame with the changes, leaving the original DataFrame unchanged.

How do you use the assign() method?

You can use the assign() method by passing new column names as keyword arguments, with their values set to the desired expressions or functions.

Does the assign() method modify the original DataFrame?

The assign() method does not modify the original DataFrame. Instead, it returns a new DataFrame with the changes.

Are there any limitations to the assign() method?

The main limitation of the assign() method is that it creates a new DataFrame rather than modifying the existing one. This can be less efficient in terms of memory usage for very large DataFrames.
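A related practical constraint follows from the keyword-argument syntax: a column name that is not a valid Python identifier, such as one containing a space, cannot be written directly as a keyword. One common workaround, sketched here with a made-up column name, is to unpack a dictionary.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15], 'Column2': [4, 12]})

# 'Column 3' contains a space, so it cannot be typed as a keyword;
# unpacking a dict passes it to assign() all the same
df2 = df.assign(**{'Column 3': df['Column1'] + df['Column2']})
print(df2.columns.tolist())
```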

Can you add multiple columns at once using assign()?

You can add multiple columns at once to a DataFrame using the assign() method in pandas. The assign() method allows you to specify multiple new columns as keyword arguments, each with its own calculation or transformation based on existing columns or other criteria.

Conclusion

In this article, you have learned the Pandas DataFrame assign() function: its syntax, parameters, and usage, and how it returns a new DataFrame with the added or modified columns. This function does not change the original DataFrame.

Happy Learning!!
