  • Post category: Polars
  • Post last modified: February 18, 2025
Add New Columns to Polars DataFrame

You can add new columns to a Polars DataFrame with the with_columns() method, which adds one or more columns in a single call. New columns can be defined directly or derived by applying expressions to existing ones. Whether you're inserting constant values, deriving columns from existing data, or applying conditional logic, the same method covers each case. In this article, I will explain how to add new columns to a Polars DataFrame.


Key Points –

  • with_columns() is the primary method for adding one or more new columns to an existing Polars DataFrame.
  • A new column can be derived from existing columns by applying expressions to them.
  • Operations can include arithmetic computations, string manipulations, or custom functions.
  • Adding columns does not modify the original DataFrame; with_columns() returns a new DataFrame with the added columns.
  • New columns are appended in the order specified; an expression whose name matches an existing column replaces that column instead.
  • pl.lit() adds a constant value, and pl.Series() adds values from a Python list.
  • Conditional logic can be applied with pl.when() to create new columns based on specific criteria.

Basic Usage of with_columns()

The with_columns() method accepts one or more expressions, each of which produces one column. An expression can reference existing columns with pl.col(), hold a constant with pl.lit(), or wrap existing data in a pl.Series().

Let's create a Polars DataFrame.


import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2000]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

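This yields the output below.

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘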

To create and add a new column to a Polars DataFrame, you can use the with_columns() method. Let’s walk through an example where we add a new column based on some calculation.


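# Adding a new column "Total_Fee" by subtracting Discount from Fees
df2 = df.with_columns(
    (pl.col("Fees") - pl.col("Discount")).alias("Total_Fee"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Total_Fee │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---       │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64       │
╞═════════╪═══════╪══════════╪══════════╪═══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 21000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ 23500     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ 18800     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ 24000     │
└─────────┴───────┴──────────┴──────────┴───────────┘

Here,

  • The Total_Fee column is created by subtracting the Discount column from the Fees column.
  • The alias() method gives the new column a name (Total_Fee).
  • The with_columns() method adds the new column to the DataFrame.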

You can also derive a new column from a single existing column. Use the pl.col() function to reference the column and apply arithmetic or logical operations to it; the example below adds 500 to every value in Fees.


# Adding a new column "Fees_plus_500" by adding 500 to Fees
df2 = df.with_columns(
    (pl.col("Fees") + 500).alias("Fees_plus_500"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Fees_plus_500 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---           │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64           │
╞═════════╪═══════╪══════════╪══════════╪═══════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 22500         │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ 25500         │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ 20500         │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ 26500         │
└─────────┴───────┴──────────┴──────────┴───────────────┘

Adding a New Column from a List

To add a new column to a Polars DataFrame from a list, wrap the list in pl.Series(), giving it the column name, and pass the Series to the with_columns() method. The list should contain one value for each row of the DataFrame.


# List of values to be added as a new column
new_column_values = ["Advanced", "Intermediate", "Beginner", "Advanced"]
# Adding a new column "Course_Level" from the list
df2 = df.with_columns(
    pl.Series("Course_Level", new_column_values))
print(df2)

# The same approach with the list passed inline; here the column is named "new_column_values"
df2 = df.with_columns(pl.Series("new_column_values", ["Advanced", "Intermediate", "Beginner", "Advanced"]))
print(df2)

# Output (second snippet):
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ new_column_values │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---               │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str               │
╞═════════╪═══════╪══════════╪══════════╪═══════════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ Advanced          │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ Intermediate      │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ Beginner          │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ Advanced          │
└─────────┴───────┴──────────┴──────────┴───────────────────┘

Here,

  • The list new_column_values holds one value per row ("Advanced", "Intermediate", and so on).
  • pl.Series("Course_Level", new_column_values) creates a new column named Course_Level from the list.
  • The second snippet passes the same values inline and names the column new_column_values; the output above shows that result.
  • The with_columns() method adds the new column to the DataFrame.

Adding a New Column with a Constant Value

To add a new column with a constant value to a Polars DataFrame, you can use the pl.lit() function, which represents a literal (constant) value. You can use this to add a column with the same value for every row.


# Adding a new column "Status" with a constant value "Active"
df2 = df.with_columns(
    pl.lit("Active").alias("Status")
)
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Status │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---    │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str    │
╞═════════╪═══════╪══════════╪══════════╪════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ Active │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ Active │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ Active │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ Active │
└─────────┴───────┴──────────┴──────────┴────────┘

Here,

  • pl.lit("Active") creates a column where every row will have the constant value "Active".
  • alias("Status") assigns the new column the name Status.
  • The with_columns() method is used to add the new column to the DataFrame.

Adding a New Column Based on Existing Columns

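To create a new column based on existing columns in a Polars DataFrame, you can apply operations to the existing columns and use the with_columns() method to add the resulting column. The example below uses pl.when() to choose between two calculations depending on the value of Fees.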


# Adding a new column "Discounted_Fee" based on a condition
df2 = df.with_columns(
    pl.when(pl.col("Fees") > 23000)
      .then(pl.col("Fees") - pl.col("Discount") - 1000)  # extra discount for fees > 23000
      .otherwise(pl.col("Fees") - pl.col("Discount"))
      .alias("Discounted_Fee")
)
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬────────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Discounted_Fee │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---            │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64            │
╞═════════╪═══════╪══════════╪══════════╪════════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 21000          │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ 22500          │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ 18800          │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ 23000          │
└─────────┴───────┴──────────┴──────────┴────────────────┘

Here,

  • Unlike Total_Fee, which uses basic arithmetic (subtracting Discount from Fees), Discounted_Fee is calculated conditionally with pl.when(), .then(), and .otherwise().
  • pl.when(pl.col("Fees") > 23000) tests each row; where the condition holds, .then() subtracts the Discount plus an additional 1000 from Fees.
  • For all other rows, .otherwise() subtracts only the Discount from Fees.

Adding Multiple Columns at Once

To add multiple columns at once to a Polars DataFrame, you can use the with_columns() method and provide all the new columns as a list of expressions. This allows you to perform multiple operations and add several columns in one step.


# Adding multiple new columns
df2 = df.with_columns([
    # New column "Total_Fee"
    (pl.col("Fees") - pl.col("Discount")).alias("Total_Fee"),

    # New column "Discount_Status"
    pl.when(pl.col("Discount") >= 1500)
      .then(pl.lit("Expensive"))
      .otherwise(pl.lit("Affordable"))
      .alias("Discount_Status")])
print(df2)

# Output:
# shape: (4, 6)
┌─────────┬───────┬──────────┬──────────┬───────────┬─────────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Total_Fee ┆ Discount_Status │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---       ┆ ---             │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64       ┆ str             │
╞═════════╪═══════╪══════════╪══════════╪═══════════╪═════════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 21000     ┆ Affordable      │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     ┆ 23500     ┆ Expensive       │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     ┆ 18800     ┆ Affordable      │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     ┆ 24000     ┆ Expensive       │
└─────────┴───────┴──────────┴──────────┴───────────┴─────────────────┘

Here,

  • Total_Fee: The new column is created by subtracting the Discount column from the Fees column.
  • Discount_Status: The new column is created using a conditional expression (pl.when()). If the Discount is greater than or equal to 1500, it labels the course as "Expensive", otherwise as "Affordable".
  • The with_columns() method accepts a list of column expressions, allowing us to add both columns at once.

Conclusion

In conclusion, adding new columns to a Polars DataFrame is a straightforward and efficient process. Whether you’re adding constant values, creating derived data from existing columns, or performing complex transformations, Polars offers a user-friendly and powerful syntax to simplify these tasks.

Happy Learning!!
