  • Post category: Polars
  • Post last modified: May 4, 2025
Strip Entire Polars DataFrame

To strip a Polars DataFrame means to remove leading and trailing whitespace characters (spaces, tabs, and newlines) from every column that contains string data, i.e., columns of type Utf8 (named String in recent Polars releases). This cleans up the text values so that stray whitespace at the beginning or end of a string does not break comparisons, joins, or filters later in the analysis. In this article, I will explain how to strip an entire Polars DataFrame.


Key Points –

  • Stripping whitespace removes leading and trailing whitespace (spaces, tabs, newlines) from string columns; whitespace inside a value is kept.
  • Polars provides the str.strip_chars() method for this; called without arguments, it strips whitespace.
  • The with_columns() method applies the stripping operation and returns a new DataFrame without altering the original one.
  • To cover an entire DataFrame, either select columns by data type with pl.col(pl.Utf8) or check each column’s data type before applying the operation.
  • Stripping whitespace helps in data cleaning, especially for user-entered or imported data.
  • select() can apply the same operation to all columns or to a subset, but it returns only the columns you pass to it.
  • The str.strip_chars() method does not modify or remove null entries; a null stays null.

Usage of Strip Entire Polars DataFrame

Stripping an entire Polars DataFrame involves removing leading and trailing whitespace from all string columns, while leaving non-string columns (e.g., integers, floats) unaffected. This is especially helpful when dealing with raw data, as extra spaces can cause issues during analysis, filtering, or transformations. To accomplish this, you can iterate over the columns, check for string data types, and apply str.strip_chars() only to those string columns.

To run some examples of stripping an entire Polars DataFrame, let’s create a Polars DataFrame.


import polars as pl

technologies= {
    'Courses':[" Spark ", " PySpark ", " Polars ", " Pandas "],
    'Fees' :[22000, 25000, 30000, 35000],
    'Duration':[' 30days ',' 40days ',' 50days ',' 60days ']
          }

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

Yields below output.

[Output image: the original DataFrame, with leading and trailing spaces in the Courses and Duration values]

To strip whitespace from all string columns in the DataFrame, select them by data type with pl.col(pl.Utf8) and call str.strip_chars(), which removes leading and trailing whitespace from each value.


# Stripping whitespace from all string columns
df2 = df.with_columns([
    pl.col(pl.Utf8).str.strip_chars()  # Strips whitespace from all string columns
])
print("Cleaned DataFrame (Stripped Whitespace):\n", df2)

Yields below output.

[Output image: the cleaned DataFrame, with the surrounding spaces removed from the Courses and Duration values]

Alternatively, you can loop over the column names, check each column’s data type, and apply str.strip_chars() only where the type is Utf8.


# Stripping whitespace from all string columns
df2 = df.with_columns([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Cleaned DataFrame (Stripped Whitespace):\n", df2)

Here,

  • The str.strip_chars() method removes any leading and trailing spaces from string columns.
  • pl.col(col).str.strip_chars() is applied only to Utf8 columns (i.e., string columns), leaving other types like integers untouched.

Stripping Whitespace from Specific String Columns

If you want to strip whitespace from specific string columns in a Polars DataFrame, you can apply the str.strip_chars() method to those selected columns, leaving the others unchanged. To target specific columns, use pl.col() for the columns you wish to modify.


# Stripping whitespace only from specific columns (Courses and Duration)
df2 = df.with_columns([
    pl.col("Courses").str.strip_chars(),  # Strip whitespace from 'Courses' column
    pl.col("Duration").str.strip_chars(), # Strip whitespace from 'Duration' column
    pl.col("Fees")                        # Optional: with_columns() keeps 'Fees' even if this line is omitted
])
print("Cleaned DataFrame (Stripped Whitespace from Specific Columns):\n", df2)

# Output:
# Cleaned DataFrame (Stripped Whitespace from Specific Columns):
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   │
│ PySpark ┆ 25000 ┆ 40days   │
│ Polars  ┆ 30000 ┆ 50days   │
│ Pandas  ┆ 35000 ┆ 60days   │
└─────────┴───────┴──────────┘

Here,

  • Only the Courses and Duration columns are being stripped of whitespace.
  • The Fees column remains unchanged because str.strip_chars() is not applied to it.
  • An expression such as pl.col("Courses").str.strip_chars() targets a column by name, so you can choose which columns to strip based on your needs.

Stripping Whitespace from String Columns Using select()

To strip whitespace from all string columns using select() in Polars, apply str.strip_chars() to the string columns inside the select() call. Keep in mind that select() returns a new DataFrame containing only the expressions you pass to it, so any column you leave out is dropped. Like with_columns(), it does not modify the original DataFrame in place.


# Use select() to strip whitespace from string columns
df2 = df.select([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Stripped DataFrame using select():\n", df2)

# Output:
# Stripped DataFrame using select():
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   │
│ PySpark ┆ 25000 ┆ 40days   │
│ Polars  ┆ 30000 ┆ 50days   │
│ Pandas  ┆ 35000 ┆ 60days   │
└─────────┴───────┴──────────┘

Here,

  • select() builds a new DataFrame with only the specified columns.
  • The list comprehension applies str.strip_chars() to "Courses" and "Duration" and passes "Fees" through unchanged, so all three columns appear in the result.

Stripping Whitespace from Columns with Mixed Data Types

If you have a Polars DataFrame with a mix of data types (such as strings, integers, and floats) and want to strip whitespace only from the string columns, you should apply the str.strip_chars() method selectively, only to columns where the data type is string.


import polars as pl

# Sample DataFrame with mixed types
data = {
    'Course': [' Python ', ' Spark ', ' Polars ', ' Pandas '],
    'Fees': [25000, 24000, 32000, 30000],
    'Available': [' Yes ', ' No ', ' Yes ', ' No '],
    'Rating': [4.8, 4.5, 4.9, 4.7]
}

df = pl.DataFrame(data)

# Strip whitespace only from string (Utf8) columns
df2 = df.with_columns([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Cleaned DataFrame (Whitespace stripped from strings only):\n", df2)

# Output:
# Cleaned DataFrame (Whitespace stripped from strings only):
# shape: (4, 4)
┌────────┬───────┬───────────┬────────┐
│ Course ┆ Fees  ┆ Available ┆ Rating │
│ ---    ┆ ---   ┆ ---       ┆ ---    │
│ str    ┆ i64   ┆ str       ┆ f64    │
╞════════╪═══════╪═══════════╪════════╡
│ Python ┆ 25000 ┆ Yes       ┆ 4.8    │
│ Spark  ┆ 24000 ┆ No        ┆ 4.5    │
│ Polars ┆ 32000 ┆ Yes       ┆ 4.9    │
│ Pandas ┆ 30000 ┆ No        ┆ 4.7    │
└────────┴───────┴───────────┴────────┘

Here,

  • We loop over all columns in the DataFrame.
  • If a column’s data type is pl.Utf8 (string), we apply .str.strip_chars().
  • Otherwise, we leave the column as is.

Stripping Whitespace and Creating a New Column

If you want to keep the original values and store the cleaned data in new columns, apply str.strip_chars() to the desired column(s) inside with_columns() and give each result a new name with alias().


# Strip whitespace from 'Course' and 'Available' columns, then create new columns
df2 = df.with_columns([
    pl.col('Course').str.strip_chars().alias('Cleaned_Course'),
    pl.col('Available').str.strip_chars().alias('Cleaned_Available')
])
print("DataFrame with New Columns (Stripped Whitespace):\n", df2)

# Output:
# DataFrame with New Columns (Stripped Whitespace):
# shape: (4, 6)
┌──────────┬───────┬───────────┬────────┬────────────────┬───────────────────┐
│ Course   ┆ Fees  ┆ Available ┆ Rating ┆ Cleaned_Course ┆ Cleaned_Available │
│ ---      ┆ ---   ┆ ---       ┆ ---    ┆ ---            ┆ ---               │
│ str      ┆ i64   ┆ str       ┆ f64    ┆ str            ┆ str               │
╞══════════╪═══════╪═══════════╪════════╪════════════════╪═══════════════════╡
│  Python  ┆ 25000 ┆  Yes      ┆ 4.8    ┆ Python         ┆ Yes               │
│  Spark   ┆ 24000 ┆  No       ┆ 4.5    ┆ Spark          ┆ No                │
│  Polars  ┆ 32000 ┆  Yes      ┆ 4.9    ┆ Polars         ┆ Yes               │
│  Pandas  ┆ 30000 ┆  No       ┆ 4.7    ┆ Pandas         ┆ No                │
└──────────┴───────┴───────────┴────────┴────────────────┴───────────────────┘

Here,

  • str.strip_chars() is applied to the columns you want to clean (here, ‘Course’ and ‘Available’).
  • .alias() gives each stripped result a new column name, so the original columns are not overwritten.
  • with_columns() applies the transformations and returns a new DataFrame with the additional columns.

Conclusion

In conclusion, stripping whitespace from string columns in a Polars DataFrame is a straightforward process that can be accomplished using the str.strip_chars() method. Whether you’re applying it to all string columns, targeting specific ones, or creating new columns with cleaned data, Polars provides flexible ways to handle whitespace removal efficiently. By leveraging with_columns(), select(), and data type checks, you can ensure your DataFrame is properly cleaned and ready for further analysis.

Happy Learning!!
