In pandas, the assign() method adds new columns to a DataFrame or modifies existing ones. It returns a new DataFrame with the updated columns and leaves the original DataFrame unchanged, so you need to reassign the result if you want to keep it.
In this article, I will explain the Pandas DataFrame assign() function, covering its syntax, parameters, and usage, and show how it returns a new DataFrame with the added or modified columns.
Key Points –
- The assign() method returns a new DataFrame with the updated columns, leaving the original DataFrame unchanged unless reassigned.
- You can use assign() to add new columns to a DataFrame by specifying the column names and values.
- Existing columns can be modified by using assign() with the column name and the new values.
- It is possible to add or modify multiple columns in a single call to assign() by providing multiple keyword arguments.
- The columns to be added or modified are specified using keyword arguments, where the key is the column name and the value is the column data.
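To illustrate the key point about modifying existing columns, here is a minimal sketch. The small DataFrame and its values are made up for this illustration only; the point is that reusing an existing column name as the keyword overwrites that column in the returned DataFrame.

```python
import pandas as pd

# Small illustrative frame (values chosen for this sketch only)
df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 6]})

# Reusing an existing column name as the keyword overwrites that column
# in the DataFrame that assign() returns
df2 = df.assign(Column2=df['Column2'] * 2)

print(df2['Column2'].tolist())  # [8, 24, 12]
print(df['Column2'].tolist())   # [4, 12, 6], the original is untouched
```

No new column is created here, so df2 still has exactly two columns.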
Quick Examples of Pandas DataFrame assign()
If you are in a hurry, below are some quick examples of the Pandas DataFrame assign() function. They assume a DataFrame df with numeric columns Column1 and Column2.
# Quick examples of pandas DataFrame assign()
# Example 1: Add a new column 'Column3'
# Which is the sum of 'Column1' and 'Column2'
df2 = df.assign(Column3 = df['Column1'] + df['Column2'])
# Example 2: Add a new column 'Column3'
# Which is the square of column 'Column2'
df2 = df.assign(Column3 = lambda x: x['Column2'] ** 2)
# Example 3: Add a new column 'Column3'
# With a constant value (e.g., 10)
df2 = df.assign(Column3 = 10)
# Example 4: Add a new column 'Column3'
# Which is the product of 'Column1' and 'Column2'
df2 = df.assign(Column3=lambda x: x['Column1'] * x['Column2'])
Pandas DataFrame.assign() Introduction
Following is the syntax of the Pandas DataFrame.assign() method.
# Syntax of Pandas DataFrame.assign()
DataFrame.assign(**kwargs)
Parameters of the DataFrame.assign()
Following are the parameters of the DataFrame.assign() function.
kwargs – keyword arguments where the key is the name of the new or existing column, and the value is the data for that column. The value can be a scalar, a Series or other array-like, or a callable. A callable is called with the DataFrame, and its result is used as the column.
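As a rough sketch of the kinds of values kwargs accepts, the snippet below passes a scalar, a Series, and a callable in one call. The column names Flag, Total, Ratio, and 'Column 3' are hypothetical names picked for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 6]})

df2 = df.assign(
    Flag=1,                                       # scalar, repeated on every row
    Total=df['Column1'] + df['Column2'],          # Series, aligned on the index
    Ratio=lambda x: x['Column1'] / x['Column2'],  # callable, called with the DataFrame
)
print(df2)

# A name that is not a valid Python identifier can be passed
# by unpacking a dictionary into keyword arguments
df3 = df.assign(**{'Column 3': 0})
print(list(df3.columns))  # ['Column1', 'Column2', 'Column 3']
```

Dictionary unpacking is plain Python rather than a special pandas feature, so it works anywhere keyword arguments are accepted.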
Return Value
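It returns a new DataFrame that contains all the original columns plus the newly added ones. Existing columns that are reassigned are overwritten in the returned DataFrame.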
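Because the result comes back as a new object, a call whose return value is discarded has no visible effect. A small sketch, with illustrative data and the helper names cols_before and cols_after introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 6]})

# The result is discarded here, so df keeps its two columns
df.assign(Column3=0)
cols_before = list(df.columns)
print(cols_before)  # ['Column1', 'Column2']

# Binding the result to a name (here the same name) keeps the new column
df = df.assign(Column3=0)
cols_after = list(df.columns)
print(cols_after)   # ['Column1', 'Column2', 'Column3']
```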
Usage of Pandas DataFrame assign()
The assign() method in pandas is used to add new columns to a DataFrame or modify existing columns by assigning computed results based on existing data. It returns a new DataFrame with the added columns, leaving the original DataFrame unchanged.
First, let's create a Pandas DataFrame from a Python dictionary, with the columns Column1 and Column2.
# Create DataFrame
import pandas as pd
import numpy as np
data = {'Column1': [5, 15, 8, 20, 25],
'Column2': [4, 12, 6, 17, 20]}
df = pd.DataFrame(data)
print("Create DataFrame:\n",df)
This creates a DataFrame with five rows and two integer columns, Column1 and Column2.
Adding a New Column Based on Existing Columns
You can add a new column based on existing columns in a DataFrame using the assign() method.
# Add a new column 'Column3'
# Which is the sum of 'Column1' and 'Column2'
df2 = df.assign(Column3 = df['Column1'] + df['Column2'])
print("Add a new column:\n",df2)
In the above example, a new column Column3 is added, where each value in Column3 is the sum of the corresponding values in columns Column1 and Column2. This is achieved by passing the result of the addition df['Column1'] + df['Column2'] to assign() under the keyword Column3. The resulting DataFrame df2 contains the original columns Column1 and Column2 along with the newly added column Column3, while df itself is unchanged.
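One way to picture what assign() does here is to compare it with copying the DataFrame and adding the column by hand. The sketch below is an illustration of that equivalence for this example, not a description of how pandas implements it; the names via_assign and manual are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 6, 17, 20]})

# Version 1: a single assign() call
via_assign = df.assign(Column3=df['Column1'] + df['Column2'])

# Version 2: copy first, then add the column with bracket assignment
manual = df.copy()
manual['Column3'] = manual['Column1'] + manual['Column2']

print(via_assign.equals(manual))       # True
print(via_assign['Column3'].tolist())  # [9, 27, 14, 37, 45]
```

The assign() form is an expression, so it can be used inline or in a chain of method calls without an intermediate variable.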
Adding a New Column Using a Lambda Function
Alternatively, you can pass a lambda function to the assign() method. pandas calls the function with the DataFrame and uses the result as the new column, which gives you flexibility in defining column values based on existing data.
# Add a new column 'Column3' which is the square of column 'Column2'
df2 = df.assign(Column3 = lambda x: x['Column2'] ** 2)
print(df2)
# Output:
#    Column1  Column2  Column3
# 0        5        4       16
# 1       15       12      144
# 2        8        6       36
# 3       20       17      289
# 4       25       20      400
In the above example, a new column Column3 is added, where each value in Column3 is the square of the corresponding value in column Column2. This is achieved by using a lambda function within the assign() method. The lambda function takes the DataFrame x as input and returns the squared values of column Column2. The resulting DataFrame df2 contains the original columns Column1 and Column2 along with the newly added column Column3.
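A callable is especially handy in a chain of method calls, because it receives the DataFrame as it exists at that step rather than the original df. A minimal sketch, where the filter Column1 > 10 is an arbitrary condition chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 6, 17, 20]})

df2 = (
    df[df['Column1'] > 10]                        # keeps rows 1, 3 and 4
    .assign(Column3=lambda x: x['Column2'] ** 2)  # x is the filtered frame
)

print(df2.index.tolist())       # [1, 3, 4]
print(df2['Column3'].tolist())  # [144, 289, 400]
```

Writing df['Column2'] ** 2 instead of the lambda would compute the squares on the unfiltered frame and rely on index alignment to drop the extra rows, which is easier to get wrong.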
Adding a New Column with a Constant Value
You can also add a new column with a constant value using the assign() method. For instance, the example below adds a new column Column3 to the DataFrame df with a constant value of 10. The scalar is repeated on every row, and the result is a new DataFrame df2 with the added column.
# Add a new column 'Column3'
# With a constant value (e.g., 10)
df2 = df.assign(Column3 = 10)
print(df2)
# Output:
#    Column1  Column2  Column3
# 0        5        4       10
# 1       15       12       10
# 2        8        6       10
# 3       20       17       10
# 4       25       20       10
Adding a New Column Based on Multiple Columns
Similarly, you can use the assign() method to create a new column from a calculation that involves more than one existing column.
# Add a new column 'Column3'
# Which is the product of 'Column1' and 'Column2'
df2 = df.assign(Column3=lambda x: x['Column1'] * x['Column2'])
print("Add a new column 'Column3':\n", df2)
# Output:
# Add a new column 'Column3':
#     Column1  Column2  Column3
# 0        5        4       20
# 1       15       12      180
# 2        8        6       48
# 3       20       17      340
# 4       25       20      500
In the above example, the DataFrame df has two columns, Column1 and Column2, and a new column, Column3, is added to the returned DataFrame df2. The values in Column3 are calculated as the product of the corresponding values in Column1 and Column2. This is achieved using the assign() method with a lambda function, lambda x: x['Column1'] * x['Column2'].
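The same call can also create several columns at once by passing more than one keyword argument. In recent pandas versions the keywords are evaluated in order, so a later callable can refer to a column created earlier in the same call. The sketch below assumes such a version; Column4 is a hypothetical column added only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 6, 17, 20]})

df2 = df.assign(
    Column3=lambda x: x['Column1'] * x['Column2'],
    # Column3 already exists by the time this callable runs
    Column4=lambda x: x['Column3'] - x['Column1'],
)

print(df2['Column3'].tolist())  # [20, 180, 48, 340, 500]
print(df2['Column4'].tolist())  # [15, 165, 40, 320, 475]
```

The second column has to be a callable for this to work: an expression such as df['Column3'] would be evaluated before the call and fail, because df has no Column3.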
Frequently Asked Questions on Pandas DataFrame assign()
What is the assign() method used for?
The assign() method in pandas is used to add new columns to a DataFrame or modify existing ones. It returns a new DataFrame with the changes, leaving the original DataFrame unchanged.
How do you use the assign() method?
You can use the assign() method by passing new column names as keyword arguments, with their values set to the desired expressions or functions.
Does assign() modify the original DataFrame?
The assign() method does not modify the original DataFrame. Instead, it returns a new DataFrame with the changes.
What is the main limitation of assign()?
The main limitation of the assign() method is that it creates a new DataFrame rather than modifying the existing one. This can be less efficient in terms of memory usage for very large DataFrames.
Can you add multiple columns at once?
You can add multiple columns at once to a DataFrame using the assign() method in pandas. It allows you to specify multiple new columns as keyword arguments, each with its own calculation or transformation based on existing columns or other criteria.
Conclusion
In this article, you have learned the Pandas DataFrame assign() function, including its syntax, parameters, and usage, and how it returns a new DataFrame with the added or modified columns without changing the original DataFrame.
Happy Learning!!