You can add new columns to a Polars DataFrame using the with_columns() method, which adds one or more columns in a single call. New columns can be created by applying transformations to existing ones or by defining them directly. Whether you’re inserting constant values, deriving columns from existing data, or performing more complex operations, Polars’ expression syntax keeps the code short. In this article, I will explain how to add new columns to a Polars DataFrame.
Key Points –
- The with_columns() method is the primary way to add one or more new columns to an existing DataFrame.
- A new column can be derived from existing columns by applying operations to them.
- Operations can include arithmetic computations, string manipulations, or custom functions.
- Adding columns does not modify the original DataFrame; with_columns() returns a new DataFrame with the added columns.
- Columns added using with_columns() are appended to the DataFrame in the order specified.
- Conditional logic can be applied with pl.when() to create new columns based on specific criteria.
Usage of Adding New Columns
Adding new columns to a Polars DataFrame is a versatile operation that enables data enhancement and transformation. Polars offers several ways to do this, so you can manipulate data, perform calculations, and enrich the DataFrame as needed.
Let’s create a Polars DataFrame.
import polars as pl
# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2000]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
To create and add a new column to a Polars DataFrame, you can use the with_columns() method. Let’s walk through an example where we add a new column based on some calculation.
# Adding a new column "Total_Fee" by subtracting Discount from Fees
df2 = df.with_columns(
    (pl.col("Fees") - pl.col("Discount")).alias("Total_Fee"))
print(df2)
Here,
- The Total_Fee column is created by subtracting the Discount column from the Fees column.
- The alias() method gives the new column a name (Total_Fee).
- The with_columns() method adds the new column to the DataFrame.
Alternatively, you can combine an existing column with a scalar value to create derived data. Use the pl.col() function to reference existing columns and apply arithmetic or logical operations to create the new column.
# Add a new column
df2 = df.with_columns(
(pl.col("Fees") + 500).alias("Fees_plus_500"))
print(df2)
# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount ┆ Fees_plus_500 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╪═══════════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 ┆ 22500 │
│ Hadoop ┆ 25000 ┆ 50days ┆ 1500 ┆ 25500 │
│ Python ┆ 20000 ┆ 40days ┆ 1200 ┆ 20500 │
│ Pandas ┆ 26000 ┆ 60days ┆ 2000 ┆ 26500 │
└─────────┴───────┴──────────┴──────────┴───────────────┘
Adding a New Column from a List
To add a new column to a Polars DataFrame from a list, create a named column from the list with pl.Series() and pass it to the with_columns() method. The list should contain one value for each row of the DataFrame.
# List of values to be added as a new column
new_column_values = ["Advanced", "Intermediate", "Beginner", "Advanced"]
# Adding a new column "Course_Level" from the list
df2 = df.with_columns(
pl.Series("Course_Level", new_column_values))
print(df2)
# Add a new column 'new_column_values' with values from a list
df2 = df.with_columns(pl.Series("new_column_values", ["Advanced", "Intermediate", "Beginner", "Advanced"]))
print(df2)
# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount ┆ new_column_values │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╪══════════╪═══════════════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 ┆ Advanced │
│ Hadoop ┆ 25000 ┆ 50days ┆ 1500 ┆ Intermediate │
│ Python ┆ 20000 ┆ 40days ┆ 1200 ┆ Beginner │
│ Pandas ┆ 26000 ┆ 60days ┆ 2000 ┆ Advanced │
└─────────┴───────┴──────────┴──────────┴───────────────────┘
Here,
- A new list new_column_values containing values like "Advanced", "Intermediate", etc., is created.
- We use pl.Series("Course_Level", new_column_values) to create a new column Course_Level from the list.
- The with_columns() method adds the new column to the DataFrame.
Adding a New Column with a Constant Value
To add a new column with a constant value to a Polars DataFrame, you can use the pl.lit() function, which represents a literal (constant) value. You can use this to add a column with the same value for every row.
# Adding a new column "Status" with a constant value "Active"
df2 = df.with_columns(
pl.lit("Active").alias("Status")
)
print(df2)
# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount ┆ Status │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╪══════════╪════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 ┆ Active │
│ Hadoop ┆ 25000 ┆ 50days ┆ 1500 ┆ Active │
│ Python ┆ 20000 ┆ 40days ┆ 1200 ┆ Active │
│ Pandas ┆ 26000 ┆ 60days ┆ 2000 ┆ Active │
└─────────┴───────┴──────────┴──────────┴────────┘
Here,
- pl.lit("Active") creates a column where every row has the constant value "Active".
- alias("Status") assigns the new column the name Status.
- The with_columns() method is used to add the new column to the DataFrame.
Adding a New Column Based on Existing Columns
To create a new column based on existing columns in a Polars DataFrame, you can apply operations to the existing columns and use the with_columns() method to add the resulting column. When the calculation depends on a condition, build the expression with pl.when().
# Adding a new column "Discounted_Fee" based on a condition
df2 = df.with_columns(
pl.when(pl.col("Fees") > 23000)
.then(pl.col("Fees") - pl.col("Discount") - 1000) # extra discount for fees > 23000
.otherwise(pl.col("Fees") - pl.col("Discount"))
.alias("Discounted_Fee")
)
print(df2)
# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬────────────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount ┆ Discounted_Fee │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞═════════╪═══════╪══════════╪══════════╪════════════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 ┆ 21000 │
│ Hadoop ┆ 25000 ┆ 50days ┆ 1500 ┆ 22500 │
│ Python ┆ 20000 ┆ 40days ┆ 1200 ┆ 18800 │
│ Pandas ┆ 26000 ┆ 60days ┆ 2000 ┆ 23000 │
└─────────┴───────┴──────────┴──────────┴────────────────┘
Here,
- Both branches start from a basic arithmetic operation: subtracting Discount from Fees.
- The new column Discounted_Fee is calculated based on a conditional check using pl.when(), followed by the chained .then() and .otherwise() methods. If the Fees are greater than 23000, an additional discount of 1000 is applied; otherwise only Discount is subtracted.
Adding Multiple Columns at Once
To add multiple columns at once to a Polars DataFrame, you can use the with_columns() method and provide all the new columns as a list of expressions. This allows you to perform multiple operations and add several columns in one step.
# Adding multiple new columns
df2 = df.with_columns([
# New column "Total_Fee"
(pl.col("Fees") - pl.col("Discount")).alias("Total_Fee"),
# New column "Discount_Status"
pl.when(pl.col("Discount") >= 1500)
.then(pl.lit("Expensive"))
.otherwise(pl.lit("Affordable"))
.alias("Discount_Status")])
print(df2)
# Output:
# shape: (4, 6)
┌─────────┬───────┬──────────┬──────────┬───────────┬─────────────────┐
│ Courses ┆ Fees ┆ Duration ┆ Discount ┆ Total_Fee ┆ Discount_Status │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╪══════════╪═══════════╪═════════════════╡
│ Spark ┆ 22000 ┆ 30days ┆ 1000 ┆ 21000 ┆ Affordable │
│ Hadoop ┆ 25000 ┆ 50days ┆ 1500 ┆ 23500 ┆ Expensive │
│ Python ┆ 20000 ┆ 40days ┆ 1200 ┆ 18800 ┆ Affordable │
│ Pandas ┆ 26000 ┆ 60days ┆ 2000 ┆ 24000 ┆ Expensive │
└─────────┴───────┴──────────┴──────────┴───────────┴─────────────────┘
Here,
- Total_Fee: The new column is created by subtracting the Discount column from the Fees column.
- Discount_Status: The new column is created using a conditional expression (pl.when()). If the Discount is 1500 or more, it labels the course as "Expensive", otherwise as "Affordable".
- The with_columns() method accepts a list of column expressions, allowing us to add both columns at once.
Conclusion
In conclusion, adding new columns to a Polars DataFrame is a straightforward and efficient process. Whether you’re adding constant values, creating derived data from existing columns, or performing complex transformations, Polars offers a user-friendly and powerful syntax to simplify these tasks.
Happy Learning!!