In Polars, the with_columns() function is used to add new columns, modify existing ones, or transform columns within a DataFrame. It provides a fast, vectorized way to apply multiple column operations simultaneously. You can pass either a list of expressions or keyword arguments, where each expression defines how to compute the new or updated columns based on existing data or constant values.
In this article, I will explain the Polars DataFrame with_columns() function, covering its syntax, parameters, and usage to create a new DataFrame with added columns while keeping the original DataFrame unchanged.
Key Points –
- with_columns() is used to add new columns or modify existing columns in a Polars DataFrame.
- Expressions typically use Polars syntax such as pl.col(), pl.lit(), and arithmetic operations.
- You can pass expressions either as positional arguments (a list of expressions) or as keyword arguments for named columns.
- Using keyword arguments lets you name new or replaced columns directly without needing alias().
- You can add constant columns by using pl.lit() within with_columns().
- Each expression can reference existing columns using pl.col() and perform computations or transformations.
Polars DataFrame with_columns() Introduction
Let’s look at the syntax of the DataFrame with_columns() function.
# Syntax of DataFrame with_columns()
DataFrame.with_columns(
*exprs: IntoExpr | Iterable[IntoExpr],
**named_exprs: IntoExpr,
) → DataFrame
Parameters of the DataFrame with_columns()
Following are the parameters of the DataFrame with_columns() function.
- *exprs – One or more expressions (or a list of expressions) that create or modify columns.
- **named_exprs – Optional keyword arguments to define new column names directly with expressions.
Return Value
This function returns a new Polars DataFrame with the specified added or modified columns.
Usage of Polars DataFrame with_columns() Function
The with_columns() method in Polars allows you to add, update, or replace multiple columns in a DataFrame at once, in a fast and efficient way. You pass a list of expressions that specify the columns to be created or changed, and it returns a new DataFrame reflecting those updates, leaving the original DataFrame unchanged.
Now, let’s create a Polars DataFrame.
import polars as pl
# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Hyperion", "Pandas"],
    'Fees': [20000, 25000, 30000, 40000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2500]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This creates a DataFrame with four rows and four columns: Courses, Fees, Duration, and Discount.
You can use the with_columns() method in Polars to add a new column to a DataFrame. For instance, here’s how to add a new column called "Trainer" to an existing DataFrame.
# Add new column
df2 = df.with_columns([
    pl.Series("Trainer", ["John", "Steve", "Jeff", "Ravi"])
])
print("DataFrame after adding Trainer column:\n", df2)
This returns a new DataFrame in which the Trainer column is appended after the existing four columns.
Modify an Existing Column
You can also modify an existing column in Polars using the with_columns() method by providing a new expression for that column’s name. If you add a column with the same name as an existing one, Polars will overwrite the original column with the new values from your expression.
# Apply a 10% discount on Fees
df2 = df.with_columns(
    (pl.col("Fees") * 0.9).alias("Fees")  # Multiply Fees by 0.9 and overwrite 'Fees'
)
print("DataFrame after modifying Fees column:\n", df2)
# Output:
# DataFrame after modifying Fees column:
# shape: (4, 4)
┌──────────┬─────────┬──────────┬──────────┐
│ Courses  ┆ Fees    ┆ Duration ┆ Discount │
│ ---      ┆ ---     ┆ ---      ┆ ---      │
│ str      ┆ f64     ┆ str      ┆ i64      │
╞══════════╪═════════╪══════════╪══════════╡
│ Spark    ┆ 18000.0 ┆ 30days   ┆ 1000     │
│ Hadoop   ┆ 22500.0 ┆ 50days   ┆ 1500     │
│ Hyperion ┆ 27000.0 ┆ 40days   ┆ 1200     │
│ Pandas   ┆ 36000.0 ┆ 60days   ┆ 2500     │
└──────────┴─────────┴──────────┴──────────┘
Here,
- Using with_columns() with an existing column name updates (modifies) that column.
- This returns a new DataFrame with the modification.
- The original DataFrame stays unchanged unless you assign back to the same variable.
Use Keyword Arguments for New Columns
Using keyword arguments in with_columns() is a neat and readable way to add or modify columns by specifying the new column names directly as parameter names.
df2 = df.with_columns(
    Final_Fee = pl.col("Fees") - pl.col("Discount"),
    Fees_Doubled = pl.col("Fees") * 2
)
print("DataFrame after modifications:\n", df2)
# Output:
# DataFrame after modifications:
# shape: (4, 6)
┌──────────┬───────┬──────────┬──────────┬───────────┬──────────────┐
│ Courses  ┆ Fees  ┆ Duration ┆ Discount ┆ Final_Fee ┆ Fees_Doubled │
│ ---      ┆ ---   ┆ ---      ┆ ---      ┆ ---       ┆ ---          │
│ str      ┆ i64   ┆ str      ┆ i64      ┆ i64       ┆ i64          │
╞══════════╪═══════╪══════════╪══════════╪═══════════╪══════════════╡
│ Spark    ┆ 20000 ┆ 30days   ┆ 1000     ┆ 19000     ┆ 40000        │
│ Hadoop   ┆ 25000 ┆ 50days   ┆ 1500     ┆ 23500     ┆ 50000        │
│ Hyperion ┆ 30000 ┆ 40days   ┆ 1200     ┆ 28800     ┆ 60000        │
│ Pandas   ┆ 40000 ┆ 60days   ┆ 2500     ┆ 37500     ┆ 80000        │
└──────────┴───────┴──────────┴──────────┴───────────┴──────────────┘
Here,
- Use keyword arguments to directly name new or modified columns in with_columns().
- No need for alias() because the name comes from the argument name.
You can also add or modify columns in Polars using with_columns() by passing keyword arguments directly, where the key is the column name and the value is the expression or Series for the new column.
# Use keyword Arguments for New Columns
df2 = df.with_columns(
    Fees = pl.col("Fees") * 0.9,  # Modify Fees applying 10% discount
    Trainer = pl.Series(["John", "Steve", "Jeff", "Ravi"])  # Add Trainer column
)
print("DataFrame after modifications:\n", df2)
# Output:
# DataFrame after modifications:
# shape: (4, 5)
┌──────────┬─────────┬──────────┬──────────┬─────────┐
│ Courses  ┆ Fees    ┆ Duration ┆ Discount ┆ Trainer │
│ ---      ┆ ---     ┆ ---      ┆ ---      ┆ ---     │
│ str      ┆ f64     ┆ str      ┆ i64      ┆ str     │
╞══════════╪═════════╪══════════╪══════════╪═════════╡
│ Spark    ┆ 18000.0 ┆ 30days   ┆ 1000     ┆ John    │
│ Hadoop   ┆ 22500.0 ┆ 50days   ┆ 1500     ┆ Steve   │
│ Hyperion ┆ 27000.0 ┆ 40days   ┆ 1200     ┆ Jeff    │
│ Pandas   ┆ 36000.0 ┆ 60days   ┆ 2500     ┆ Ravi    │
└──────────┴─────────┴──────────┴──────────┴─────────┘
Add a Constant Column
To add a constant column in Polars, use the with_columns() method along with pl.lit() to assign the constant value. This allows you to easily insert a column with the same value across all rows in the DataFrame.
# Add constant column
df2 = df.with_columns(
    Status = pl.lit("Active")
)
print("DataFrame after adding constant column:\n", df2)
# Equivalent: add the constant column 'Status' using alias()
df2 = df.with_columns(
    pl.lit("Active").alias("Status")
)
print("DataFrame with constant column:\n", df2)
# Output:
# DataFrame with constant column:
# shape: (4, 5)
┌──────────┬───────┬──────────┬──────────┬────────┐
│ Courses  ┆ Fees  ┆ Duration ┆ Discount ┆ Status │
│ ---      ┆ ---   ┆ ---      ┆ ---      ┆ ---    │
│ str      ┆ i64   ┆ str      ┆ i64      ┆ str    │
╞══════════╪═══════╪══════════╪══════════╪════════╡
│ Spark    ┆ 20000 ┆ 30days   ┆ 1000     ┆ Active │
│ Hadoop   ┆ 25000 ┆ 50days   ┆ 1500     ┆ Active │
│ Hyperion ┆ 30000 ┆ 40days   ┆ 1200     ┆ Active │
│ Pandas   ┆ 40000 ┆ 60days   ┆ 2500     ┆ Active │
└──────────┴───────┴──────────┴──────────┴────────┘
Here,
- pl.lit("Active") creates a literal (constant) value.
- alias("Status") names the new column.
Add Multiple New Columns
You can add multiple new columns to a Polars DataFrame using the with_columns() method by either passing a list of expressions or using keyword arguments. Both approaches allow you to define several columns at once in a clear and efficient way.
# Add multiple new columns
df2 = df.with_columns([
    pl.lit("Online").alias("Platform"),
    pl.Series(["USA", "India", "Canada", "UK"]).alias("Country"),
    (pl.col("Fees") - pl.col("Discount")).alias("Net_Fees")
])
print("DataFrame with multiple new columns:\n", df2)
# Output:
# DataFrame with multiple new columns:
# shape: (4, 7)
┌──────────┬───────┬──────────┬──────────┬──────────┬─────────┬──────────┐
│ Courses  ┆ Fees  ┆ Duration ┆ Discount ┆ Platform ┆ Country ┆ Net_Fees │
│ ---      ┆ ---   ┆ ---      ┆ ---      ┆ ---      ┆ ---     ┆ ---      │
│ str      ┆ i64   ┆ str      ┆ i64      ┆ str      ┆ str     ┆ i64      │
╞══════════╪═══════╪══════════╪══════════╪══════════╪═════════╪══════════╡
│ Spark    ┆ 20000 ┆ 30days   ┆ 1000     ┆ Online   ┆ USA     ┆ 19000    │
│ Hadoop   ┆ 25000 ┆ 50days   ┆ 1500     ┆ Online   ┆ India   ┆ 23500    │
│ Hyperion ┆ 30000 ┆ 40days   ┆ 1200     ┆ Online   ┆ Canada  ┆ 28800    │
│ Pandas   ┆ 40000 ┆ 60days   ┆ 2500     ┆ Online   ┆ UK      ┆ 37500    │
└──────────┴───────┴──────────┴──────────┴──────────┴─────────┴──────────┘
Conclusion
In conclusion, the with_columns() method in Polars is a powerful and flexible way to add, modify, or replace multiple columns in a DataFrame efficiently. Whether you use expressions in a list or keyword arguments, it enables clear and concise transformations while keeping your code readable and performant.
Happy Learning!!