  • Post category:Polars
  • Post last modified:May 18, 2025
Polars String Manipulation of Cell Contents

In Polars, string manipulation of cell contents is done through the str namespace, which is available on columns of the String data type (named Utf8 in older Polars releases). These operations modify or analyze the text stored in the cells of a Polars DataFrame or Series.


In this article, I will explain how to inspect, transform, and analyze the string values inside string-typed columns using Polars string methods, with examples.

Key Points –

  • Polars provides a str namespace for string operations on DataFrame columns and Series.
  • Polars string methods are vectorized, so they run quickly and efficiently over entire columns.
  • You can convert strings to lowercase using str.to_lowercase().
  • Strings can be converted to uppercase with str.to_uppercase().
  • Leading and trailing whitespace, or a specified set of characters, can be removed using str.strip_chars().
  • str.replace() replaces the first match in each string; str.replace_all() replaces every match.
  • You can check for a substring or regex pattern using str.contains(), which returns a Boolean mask.
  • Substrings can be extracted by position using str.slice().

Usage of Polars String Manipulation of Cell Contents

Polars string methods let you clean, transform, and analyze text stored in DataFrame columns. They are accessed through the str namespace on string columns and operate on the whole column at once rather than row by row, which keeps them fast.

First, let’s create a Polars DataFrame.


import polars as pl

technologies = {
    'Courses': ["Spark", "Python", "Spark", "Python", "Pandas"],
    'Fees': [22000, 25000, 22000, 25000, 24000],
    'Duration': ['30days', '40days', '60days', '45days', '50days'],
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

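This yields the output below.

# Output:
# Original DataFrame:
# shape: (5, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   │
│ Python  ┆ 25000 ┆ 40days   │
│ Spark   ┆ 22000 ┆ 60days   │
│ Python  ┆ 25000 ┆ 45days   │
│ Pandas  ┆ 24000 ┆ 50days   │
└─────────┴───────┴──────────┘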

Convert Strings to Lowercase

To convert string columns in a Polars DataFrame to lowercase, apply the str.to_lowercase() method to them. For instance, the following converts the values in the 'Courses' and 'Duration' columns to lowercase.


# Convert 'Courses' and 'Duration' columns to lowercase
df2 = df.with_columns([
    pl.col("Courses").str.to_lowercase(),
    pl.col("Duration").str.to_lowercase()
])
print("DataFrame with lowercase strings:\n", df2)

Here,

  • pl.col("Courses") selects the column Courses.
  • str.to_lowercase() changes all the string values in that column to lowercase. The same is done for Duration.
  • with_columns() returns a new DataFrame in which those columns are replaced with the lowercase versions; df itself is unchanged.
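
# Output:
# DataFrame with lowercase strings:
# shape: (5, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ spark   ┆ 22000 ┆ 30days   │
│ python  ┆ 25000 ┆ 40days   │
│ spark   ┆ 22000 ┆ 60days   │
│ python  ┆ 25000 ┆ 45days   │
│ pandas  ┆ 24000 ┆ 50days   │
└─────────┴───────┴──────────┘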

Convert Strings to Uppercase

To convert string columns in a Polars DataFrame to uppercase, use the str.to_uppercase() method on the relevant columns. For instance, you can apply this method to the 'Courses' and 'Duration' columns to transform their values to uppercase.


# Convert 'Courses' and 'Duration' columns to uppercase
df2 = df.with_columns([
    pl.col("Courses").str.to_uppercase(),
    pl.col("Duration").str.to_uppercase()
])
print("DataFrame with uppercase strings:\n", df2)

# Output:
# DataFrame with uppercase strings:
# shape: (5, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ SPARK   ┆ 22000 ┆ 30DAYS   │
│ PYTHON  ┆ 25000 ┆ 40DAYS   │
│ SPARK   ┆ 22000 ┆ 60DAYS   │
│ PYTHON  ┆ 25000 ┆ 45DAYS   │
│ PANDAS  ┆ 24000 ┆ 50DAYS   │
└─────────┴───────┴──────────┘

Here,

  • pl.col("Courses") selects the Courses column.
  • str.to_uppercase() converts all string values in the column to uppercase.
  • with_columns() returns a new DataFrame with these columns replaced by their uppercase versions.

Check if a String Contains a Substring

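To check if a string contains a substring in Polars, you can use the str.contains() method on a string column. This method returns a Boolean column (a Boolean Series when called on a Series) indicating whether each string contains the given pattern. By default the pattern is interpreted as a regular expression; pass literal=True to match it as plain text.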


# Check if 'Courses' contains 'Spark' (case sensitive)
df2 = df.select(
    pl.col("Courses").str.contains("Spark").alias("Contains_Spark")
)
print(df2)

# Output:
# shape: (5, 1)
┌────────────────┐
│ Contains_Spark │
│ ---            │
│ bool           │
╞════════════════╡
│ true           │
│ false          │
│ true           │
│ false          │
│ false          │
└────────────────┘

Replace Substring

To replace a substring within string columns in Polars, use the str.replace() method on the target column. For example, the following replaces "Spark" with "Flink" in the 'Courses' column. Note that str.replace() replaces only the first match in each string; use str.replace_all() to replace every match.


# Replace 'Spark' with 'Flink' in the 'Courses' column
# Assign to df2 so the original df stays available for the later examples
df2 = df.with_columns([
    pl.col("Courses").str.replace("Spark", "Flink")
])
print("DataFrame after replacing part of strings:\n", df2)

# Output:
# DataFrame after replacing part of strings:
# shape: (5, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Flink   ┆ 22000 ┆ 30days   │
│ Python  ┆ 25000 ┆ 40days   │
│ Flink   ┆ 22000 ┆ 60days   │
│ Python  ┆ 25000 ┆ 45days   │
│ Pandas  ┆ 24000 ┆ 50days   │
└─────────┴───────┴──────────┘

Extract Substring (slice)

To extract a substring (slice) from strings in a Polars DataFrame column, you can use the str.slice() method. It takes the start offset and, optionally, the length of the slice; if the length is omitted, the slice runs to the end of the string.


# Extract first 2 characters from 'Duration'
df2 = df.with_columns(
    pl.col("Duration").str.slice(0, 2).alias("Duration_slice")
)
print(df2)

# Output:
# shape: (5, 4)
┌─────────┬───────┬──────────┬────────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Duration_slice │
│ ---     ┆ ---   ┆ ---      ┆ ---            │
│ str     ┆ i64   ┆ str      ┆ str            │
╞═════════╪═══════╪══════════╪════════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 30             │
│ Python  ┆ 25000 ┆ 40days   ┆ 40             │
│ Spark   ┆ 22000 ┆ 60days   ┆ 60             │
│ Python  ┆ 25000 ┆ 45days   ┆ 45             │
│ Pandas  ┆ 24000 ┆ 50days   ┆ 50             │
└─────────┴───────┴──────────┴────────────────┘

Strip Whitespace from Both Ends

To remove whitespace from both ends of string columns in Polars, you can use the str.strip_chars() method without any arguments, as it defaults to stripping whitespace. This method can be applied to a Polars DataFrame column to trim spaces from the start and end of the string values.


import polars as pl

technologies = {
    'Courses': ["  Spark  ", " Python ", " Spark", "Python  ", " Pandas "],
    'Fees': [22000, 25000, 22000, 25000, 24000],
    'Duration': ['30days', '40days', '60days', '45days', '50days'],
}

df = pl.DataFrame(technologies)

# Strip whitespace from both ends of 'Courses' and 'Duration' columns
df2 = df.with_columns([
    pl.col("Courses").str.strip_chars().alias("Courses_stripped"),
    pl.col("Duration").str.strip_chars().alias("Duration_stripped")
])
print("DataFrame after stripping whitespace:\n", df2)

# Output:
# DataFrame after stripping whitespace:
# shape: (5, 5)
┌───────────┬───────┬──────────┬──────────────────┬───────────────────┐
│ Courses   ┆ Fees  ┆ Duration ┆ Courses_stripped ┆ Duration_stripped │
│ ---       ┆ ---   ┆ ---      ┆ ---              ┆ ---               │
│ str       ┆ i64   ┆ str      ┆ str              ┆ str               │
╞═══════════╪═══════╪══════════╪══════════════════╪═══════════════════╡
│   Spark   ┆ 22000 ┆ 30days   ┆ Spark            ┆ 30days            │
│  Python   ┆ 25000 ┆ 40days   ┆ Python           ┆ 40days            │
│  Spark    ┆ 22000 ┆ 60days   ┆ Spark            ┆ 60days            │
│ Python    ┆ 25000 ┆ 45days   ┆ Python           ┆ 45days            │
│  Pandas   ┆ 24000 ┆ 50days   ┆ Pandas           ┆ 50days            │
└───────────┴───────┴──────────┴──────────────────┴───────────────────┘

Here,

  • str.strip_chars() removes all leading and trailing whitespace characters (spaces, tabs, newlines) by default.
  • It is applied here to both the Courses and Duration columns; the Duration values have no surrounding whitespace, so they come out unchanged.

Conclusion

In summary, Polars simplifies text processing by providing str methods that allow you to transform, clean, and analyze string columns efficiently. With functions like to_lowercase(), to_uppercase(), strip_chars(), replace(), contains(), and slice(), you can handle most everyday string cleanup directly on DataFrame columns.

Happy Learning!!
