Conditional assignment in a Polars DataFrame means assigning values to a column based on conditions applied to one or more columns. Instead of applying a single value across the entire column, you define rules that decide which value each row receives.
In Polars, conditional assignment is done with the when(), then(), and otherwise() methods, which evaluate a condition for each row and assign a value accordingly. In this article, I will explain how to perform conditional assignment in a Polars DataFrame.
Key Points –
- Conditional Assignment in Polars allows you to assign values to columns based on specified conditions, enabling dynamic and rule-based transformations.
- Conditional assignment in Polars uses the when(), then(), and otherwise() methods to apply logic based on conditions.
- You can chain multiple when() and then() clauses to create nested conditional logic.
- Logical operators like & (AND), | (OR), and ~ (NOT) can be used to combine multiple conditions.
- when() defines the condition to be checked for each row or element in the DataFrame.
- then() specifies the value to assign when the condition defined in when() is true.
- otherwise() provides an alternative value to assign when the condition in when() is false.
- Conditional assignment can be applied to create new columns or modify existing ones.
Usage of Conditional Assignment in Polars DataFrame
Conditional assignment in a Polars DataFrame allows you to modify or create new columns based on conditions applied to existing columns. This is particularly useful for data transformation, cleaning, and deriving new insights from the data. Polars provides a when(), then(), and otherwise() API to make conditional assignments easy to implement.
To run some examples of conditional assignment in Polars DataFrame, let’s create a Polars DataFrame.
import polars as pl

technologies = {
    'Courses': ["Spark", "PySpark", "Polars", "Pandas"],
    'Fees': [22000, 25000, 30000, 35000],
    'Discount': [1000, 1500, 2500, 2000],
    'Duration': ['30days', '40days', '50days', '60days']
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
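Yields below output.
# Output:
# Original DataFrame:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 30days   │
│ PySpark ┆ 25000 ┆ 1500     ┆ 40days   │
│ Polars  ┆ 30000 ┆ 2500     ┆ 50days   │
│ Pandas  ┆ 35000 ┆ 2000     ┆ 60days   │
└─────────┴───────┴──────────┴──────────┘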
You can use the when(), then(), and otherwise() methods for conditional assignment, either to create a new column or to update an existing one.
Here’s an example where we use these methods to conditionally assign values in a new column based on the Fees column in your DataFrame.
# Conditional assignment for Discounted Fees
df2 = df.with_columns(
    pl.when(pl.col("Fees") > 30000)
    .then(pl.col("Fees") * 0.9)   # Apply 10% discount
    .when(pl.col("Fees") >= 25000)
    .then(pl.col("Fees") * 0.95)  # Apply 5% discount
    .otherwise(pl.col("Fees"))    # No discount
    .alias("Discounted_Fees")
)
print("DataFrame after Conditional Assignment:\n", df2)
Here,
- For “Spark” (Fees = 22000): Since the fee is less than 25000, no discount is applied, so Discounted_Fees is 22000.
- For “PySpark” (Fees = 25000): A 5% discount is applied (25000 * 0.95 = 23750).
- For “Polars” (Fees = 30000): A 5% discount is applied (30000 * 0.95 = 28500).
- For “Pandas” (Fees = 35000): A 10% discount is applied (35000 * 0.9 = 31500).

Using Multiple Conditions with when() and then()
You can handle multiple conditions in Polars by chaining when() and then() calls and ending with otherwise(). The conditions are checked in order, and the first one that matches determines the value. Within a single when(), you can combine conditions using logical operators like & (AND) and | (OR).
# Conditional assignment with multiple conditions
df2 = df.with_columns(
    pl.when((pl.col("Fees") > 30000) & (pl.col("Duration") > '40days'))
    .then(pl.col("Fees") * 0.85)  # 15% discount
    .when((pl.col("Fees").is_between(25000, 30000)) & (pl.col("Duration") == '40days'))
    .then(pl.col("Fees") * 0.90)  # 10% discount
    .when((pl.col("Fees") < 25000) & (pl.col("Duration") == '30days'))
    .then(pl.col("Fees") * 0.95)  # 5% discount
    .otherwise(pl.col("Fees"))    # No discount
    .alias("Discounted_Fees")
)
print("DataFrame after Multiple Conditions:\n", df2)
# Output:
# DataFrame after Multiple Conditions:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬─────────────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Duration ┆ Discounted_Fees │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---             │
│ str     ┆ i64   ┆ i64      ┆ str      ┆ f64             │
╞═════════╪═══════╪══════════╪══════════╪═════════════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 30days   ┆ 20900.0         │
│ PySpark ┆ 25000 ┆ 1500     ┆ 40days   ┆ 22500.0         │
│ Polars  ┆ 30000 ┆ 2500     ┆ 50days   ┆ 30000.0         │
│ Pandas  ┆ 35000 ┆ 2000     ┆ 60days   ┆ 29750.0         │
└─────────┴───────┴──────────┴──────────┴─────────────────┘
Here,
- For “Spark” (Fees = 22000, Duration = “30days”): Fees < 25000 and Duration == "30days", so a 5% discount is applied (22000 * 0.95 = 20900).
- For “PySpark” (Fees = 25000, Duration = “40days”): Fees is between 25000 and 30000 and Duration == "40days", so a 10% discount is applied (25000 * 0.90 = 22500).
- For “Polars” (Fees = 30000, Duration = “50days”): No discount condition matches, so no discount is applied (30000 remains unchanged).
- For “Pandas” (Fees = 35000, Duration = “60days”): Fees > 30000 and Duration > "40days", so a 15% discount is applied (35000 * 0.85 = 29750).
Conditional Assignment with Boolean Logic
Conditional assignment with Boolean logic builds each when() condition from several boolean expressions. You can use logical operators such as & (AND), | (OR), and ~ (NOT) to combine multiple conditions into a single expression; wrap each comparison in parentheses when combining them.
# Conditional assignment using boolean logic
df = df.with_columns(
    pl.when((pl.col("Fees") > 30000) & (pl.col("Discount") > 2000))
    .then(pl.lit("Premium"))     # Premium if both conditions are true
    .when((pl.col("Fees").is_between(25000, 30000)) & (pl.col("Discount") <= 2000))
    .then(pl.lit("Standard"))    # Standard if conditions match
    .otherwise(pl.lit("Basic"))  # Basic for all other cases
    .alias("Category")
)
print("DataFrame after Conditional Assignment with Boolean Logic:\n", df)
# Output:
# DataFrame after Conditional Assignment with Boolean Logic:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Duration ┆ Category │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 30days   ┆ Basic    │
│ PySpark ┆ 25000 ┆ 1500     ┆ 40days   ┆ Standard │
│ Polars  ┆ 30000 ┆ 2500     ┆ 50days   ┆ Basic    │
│ Pandas  ┆ 35000 ┆ 2000     ┆ 60days   ┆ Basic    │
└─────────┴───────┴──────────┴──────────┴──────────┘
Here,
- For “Spark” (Fees = 22000, Discount = 1000): Fees is below 25000, so neither condition matches and it is categorized as "Basic".
- For “PySpark” (Fees = 25000, Discount = 1500): Fees is between 25000 and 30000 and Discount is less than or equal to 2000, so it is categorized as "Standard".
- For “Polars” (Fees = 30000, Discount = 2500): Fees is between 25000 and 30000, but Discount is greater than 2000, so the "Standard" condition fails; Fees is not greater than 30000 either, so it is categorized as "Basic".
- For “Pandas” (Fees = 35000, Discount = 2000): Fees > 30000, but Discount is exactly 2000 and therefore not greater than 2000, so the "Premium" condition fails and it is categorized as "Basic".
Nested Conditional Assignment
You can perform nested conditional assignments by placing a complete when(), then(), otherwise() expression inside the then() of another one. This lets the result of one condition lead to additional conditions, so you can express logic that depends on multiple criteria.
# Nested conditional assignment for "Category"
df2 = df.with_columns(
    pl.when(pl.col("Fees") > 30000)
    .then(
        pl.when(pl.col("Discount") > 2000)
        .then(pl.lit("Premium"))         # If Discount > 2000, assign "Premium"
        .otherwise(pl.lit("High-End"))   # Otherwise, assign "High-End"
    )
    .when(pl.col("Fees").is_between(25000, 30000))
    .then(
        pl.when(pl.col("Discount") > 1500)
        .then(pl.lit("Standard"))        # If Discount > 1500, assign "Standard"
        .otherwise(pl.lit("Mid-Range"))  # Otherwise, assign "Mid-Range"
    )
    .otherwise(pl.lit("Budget"))         # If Fees < 25000, assign "Budget"
    .alias("Category")
)
print("DataFrame after Nested Conditional Assignment:\n", df2)
# Output:
# DataFrame after Nested Conditional Assignment:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Duration ┆ Category  │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---       │
│ str     ┆ i64   ┆ i64      ┆ str      ┆ str       │
╞═════════╪═══════╪══════════╪══════════╪═══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 30days   ┆ Budget    │
│ PySpark ┆ 25000 ┆ 1500     ┆ 40days   ┆ Mid-Range │
│ Polars  ┆ 30000 ┆ 2500     ┆ 50days   ┆ Standard  │
│ Pandas  ┆ 35000 ┆ 2000     ┆ 60days   ┆ High-End  │
└─────────┴───────┴──────────┴──────────┴───────────┘
Here,
- For “Spark” (Fees = 22000, Discount = 1000): Fees < 25000, so the category is "Budget".
- For “PySpark” (Fees = 25000, Discount = 1500): Fees is between 25000 and 30000, and Discount is less than or equal to 1500, so the category is "Mid-Range".
- For “Polars” (Fees = 30000, Discount = 2500): Fees is between 25000 and 30000, and Discount is greater than 1500, so the category is "Standard".
- For “Pandas” (Fees = 35000, Discount = 2000): Fees is greater than 30000, and Discount is not greater than 2000, so the category is "High-End".
Replacing Missing Values Conditionally
You can conditionally replace missing values using the when(), then(), and otherwise() methods. This lets you define custom replacement values for nulls based on specific conditions within your DataFrame.
import polars as pl

# Sample DataFrame with missing values
technologies = {
    'Courses': ["Spark", "PySpark", "Polars", "Pandas"],
    'Fees': [22000, 25000, 30000, 35000],
    'Discount': [None, 1500, None, None],
    'Duration': ['30days', '40days', '50days', '60days']
}
df = pl.DataFrame(technologies)

# Replace missing values conditionally
df = df.with_columns(
    pl.when(pl.col("Discount").is_null())
    .then(
        pl.when(pl.col("Fees") > 30000)
        .then(pl.lit(2500))       # If Fees > 30000, replace null with 2500
        .when(pl.col("Fees").is_between(25000, 30000))
        .then(pl.lit(1500))       # If Fees between 25000 and 30000, replace null with 1500
        .otherwise(pl.lit(1000))  # Otherwise, replace null with 1000
    )
    .otherwise(pl.col("Discount"))  # Keep original value if not null
    .alias("Discount")              # Replace the original Discount column
)
print("DataFrame after Replacing Missing Values Conditionally:\n", df)
# Output:
# DataFrame after Replacing Missing Values Conditionally:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 30days   │
│ PySpark ┆ 25000 ┆ 1500     ┆ 40days   │
│ Polars  ┆ 30000 ┆ 1500     ┆ 50days   │
│ Pandas  ┆ 35000 ┆ 2500     ┆ 60days   │
└─────────┴───────┴──────────┴──────────┘
Here,
- For “Spark” (Fees = 22000, Discount = None): Fees is less than 25000, so the null value in "Discount" is replaced with 1000.
- For “PySpark” (Fees = 25000, Discount = 1500): There is no missing value in “Discount”, so it stays as 1500.
- For “Polars” (Fees = 30000, Discount = None): Fees is between 25000 and 30000, so the null value in "Discount" is replaced with 1500.
- For “Pandas” (Fees = 35000, Discount = None): Fees is greater than 30000, so the null value in "Discount" is replaced with 2500.
Conclusion
In conclusion, Polars provides a powerful and flexible framework for performing conditional assignments, replacing missing values, and implementing complex logic on DataFrames. Key features like when, then, otherwise, and Boolean operators (&, |, ~) enable you to construct sophisticated conditions for manipulating data efficiently.
Happy Learning!!