To strip a Polars DataFrame means to remove leading and trailing whitespace characters, such as spaces, tabs, or newlines, from every column that holds string data (columns of type Utf8, which recent Polars versions also call String). This cleans up the text values so that no unwanted whitespace remains at the beginning or end of a string, where it can otherwise break comparisons, joins, and filters. In this article, I will explain how to strip an entire Polars DataFrame.
Key Points –
- Stripping whitespace involves removing leading and trailing spaces from string columns in the DataFrame.
- Polars provides the str.strip_chars() method to strip whitespace from string columns.
- The with_columns() method can be used to apply the stripping operation and create a new DataFrame without altering the original one.
- You can apply str.strip_chars() to an entire DataFrame by checking each column’s data type before applying the operation.
- Stripping whitespace helps in data cleaning, especially for user-entered or imported data.
- Polars allows stripping across the entire DataFrame using with_columns() or select().
- The str.strip_chars() method does not modify or remove null entries.
- You can use select() to apply the stripping operation across all columns or a subset of columns in a DataFrame.
Usage of Strip Entire Polars DataFrame
Stripping an entire Polars DataFrame involves removing leading and trailing whitespace from all string columns, while leaving non-string columns (e.g., integers, floats) unaffected. This is especially helpful when dealing with raw data, as extra spaces can cause issues during analysis, filtering, or transformations. To accomplish this, you can iterate over the columns, check for string data types, and apply str.strip_chars() only to those string columns.
To run some examples of stripping an entire Polars DataFrame, let’s create a Polars DataFrame.
import polars as pl
technologies = {
    'Courses': [" Spark ", " PySpark ", " Polars ", " Pandas "],
    'Fees': [22000, 25000, 30000, 35000],
    'Duration': [' 30days ', ' 40days ', ' 50days ', ' 60days ']
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
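This prints the original DataFrame; the values in the Courses and Duration columns still carry their leading and trailing spaces.
To strip whitespace from all string columns in the DataFrame, select them by data type with pl.col(pl.Utf8) and apply str.strip_chars(), which removes leading and trailing whitespace when called without an argument.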
# Stripping whitespace from all string columns
df2 = df.with_columns([
    pl.col(pl.Utf8).str.strip_chars()  # Strips whitespace from all string columns
])
print("Cleaned DataFrame (Stripped Whitespace):\n", df2)
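This prints the cleaned DataFrame: the Courses and Duration values no longer have surrounding whitespace, and the Fees column is unchanged.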

Alternatively, you can loop over df.columns, check each column’s data type, and apply str.strip_chars() only when the column is a string column.
# Stripping whitespace from all string columns
df2 = df.with_columns([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Cleaned DataFrame (Stripped Whitespace):\n", df2)
Here,
- The str.strip_chars() method removes any leading and trailing spaces from string columns.
- pl.col(col).str.strip_chars() is applied only to Utf8 columns (i.e., string columns), leaving other types like integers untouched.
Stripping Whitespace from Specific String Columns
If you want to strip whitespace from specific string columns in a Polars DataFrame, you can apply the str.strip_chars() method to those selected columns, leaving the others unchanged. To target specific columns, use pl.col() for the columns you wish to modify.
# Stripping whitespace only from specific columns (Courses and Duration)
df2 = df.with_columns([
    pl.col("Courses").str.strip_chars(),   # Strip whitespace from 'Courses' column
    pl.col("Duration").str.strip_chars(),  # Strip whitespace from 'Duration' column
    pl.col("Fees")                         # Optional: with_columns() keeps 'Fees' even without this line
])
print("Cleaned DataFrame (Stripped Whitespace from Specific Columns):\n", df2)
# Output:
# Cleaned DataFrame (Stripped Whitespace from Specific Columns):
# shape: (4, 3)
# ┌─────────┬───────┬──────────┐
# │ Courses ┆ Fees  ┆ Duration │
# │ ---     ┆ ---   ┆ ---      │
# │ str     ┆ i64   ┆ str      │
# ╞═════════╪═══════╪══════════╡
# │ Spark   ┆ 22000 ┆ 30days   │
# │ PySpark ┆ 25000 ┆ 40days   │
# │ Polars  ┆ 30000 ┆ 50days   │
# │ Pandas  ┆ 35000 ┆ 60days   │
# └─────────┴───────┴──────────┘
Here,
- Only the Courses and Duration columns are being stripped of whitespace.
- The Fees column remains unchanged because str.strip_chars() is not applied to it.
- The expression pl.col("column_name").str.strip_chars() targets a column by name, so you can choose which ones to strip based on your needs.
Stripping Whitespace from String Columns Using select()
To strip whitespace from all string columns using select() in Polars, pass the str.strip_chars() expressions to select(). Keep in mind that select() returns a new DataFrame containing only the columns produced by the expressions you pass, so non-string columns must be listed too if you want to keep them. The original DataFrame is not modified.
# Use select() to strip whitespace from string columns
df2 = df.select([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Stripped DataFrame using select():\n", df2)
# Output:
# Stripped DataFrame using select():
# shape: (4, 3)
# ┌─────────┬───────┬──────────┐
# │ Courses ┆ Fees  ┆ Duration │
# │ ---     ┆ ---   ┆ ---      │
# │ str     ┆ i64   ┆ str      │
# ╞═════════╪═══════╪══════════╡
# │ Spark   ┆ 22000 ┆ 30days   │
# │ PySpark ┆ 25000 ┆ 40days   │
# │ Polars  ┆ 30000 ┆ 50days   │
# │ Pandas  ┆ 35000 ┆ 60days   │
# └─────────┴───────┴──────────┘
Here,
- select() builds a new DataFrame from the expressions it receives.
- str.strip_chars() is applied to the string columns "Courses" and "Duration".
- "Fees" is included without changes because the list comprehension passes it through as pl.col(col).
Stripping Whitespace from Columns with Mixed Data Types
If you have a Polars DataFrame with a mix of data types (such as strings, integers, and floats) and want to strip whitespace only from the string columns, you should apply the str.strip_chars() method selectively, only to columns where the data type is string.
import polars as pl
# Sample DataFrame with mixed types
data = {
    'Course': [' Python ', ' Spark ', ' Polars ', ' Pandas '],
    'Fees': [25000, 24000, 32000, 30000],
    'Available': [' Yes ', ' No ', ' Yes ', ' No '],
    'Rating': [4.8, 4.5, 4.9, 4.7]
}
df = pl.DataFrame(data)
# Strip whitespace only from string (Utf8) columns
df2 = df.with_columns([
    pl.col(col).str.strip_chars() if df[col].dtype == pl.Utf8 else pl.col(col)
    for col in df.columns
])
print("Cleaned DataFrame (Whitespace stripped from strings only):\n", df2)
# Output:
# Cleaned DataFrame (Whitespace stripped from strings only):
# shape: (4, 4)
# ┌────────┬───────┬───────────┬────────┐
# │ Course ┆ Fees  ┆ Available ┆ Rating │
# │ ---    ┆ ---   ┆ ---       ┆ ---    │
# │ str    ┆ i64   ┆ str       ┆ f64    │
# ╞════════╪═══════╪═══════════╪════════╡
# │ Python ┆ 25000 ┆ Yes       ┆ 4.8    │
# │ Spark  ┆ 24000 ┆ No        ┆ 4.5    │
# │ Polars ┆ 32000 ┆ Yes       ┆ 4.9    │
# │ Pandas ┆ 30000 ┆ No        ┆ 4.7    │
# └────────┴───────┴───────────┴────────┘
Here,
- We loop over all columns in the DataFrame.
- If a column’s data type is pl.Utf8 (string), we apply .str.strip_chars().
- Otherwise, we leave the column as is.
Stripping Whitespace and Creating a New Column
If you want to strip whitespace from one or more string columns and create a new column with the cleaned data in Polars, you can easily do this using with_columns() and applying string operations like str.strip_chars() to your desired column(s).
# Strip whitespace from 'Course' and 'Available' columns, then create new columns
df2 = df.with_columns([
    pl.col('Course').str.strip_chars().alias('Cleaned_Course'),
    pl.col('Available').str.strip_chars().alias('Cleaned_Available')
])
print("DataFrame with New Columns (Stripped Whitespace):\n", df2)
# Output:
# DataFrame with New Columns (Stripped Whitespace):
# shape: (4, 6)
# ┌──────────┬───────┬───────────┬────────┬────────────────┬───────────────────┐
# │ Course   ┆ Fees  ┆ Available ┆ Rating ┆ Cleaned_Course ┆ Cleaned_Available │
# │ ---      ┆ ---   ┆ ---       ┆ ---    ┆ ---            ┆ ---               │
# │ str      ┆ i64   ┆ str       ┆ f64    ┆ str            ┆ str               │
# ╞══════════╪═══════╪═══════════╪════════╪════════════════╪═══════════════════╡
# │  Python  ┆ 25000 ┆  Yes      ┆ 4.8    ┆ Python         ┆ Yes               │
# │  Spark   ┆ 24000 ┆  No       ┆ 4.5    ┆ Spark          ┆ No                │
# │  Polars  ┆ 32000 ┆  Yes      ┆ 4.9    ┆ Polars         ┆ Yes               │
# │  Pandas  ┆ 30000 ┆  No       ┆ 4.7    ┆ Pandas         ┆ No                │
# └──────────┴───────┴───────────┴────────┴────────────────┴───────────────────┘
Here,
- str.strip_chars() is applied to the columns you want to clean (here, ‘Course’ and ‘Available’).
- .alias() gives each result a new column name, so the original columns are not overwritten.
- with_columns() applies the transformations and returns a new DataFrame with the additional columns.
Conclusion
In conclusion, stripping whitespace from string columns in a Polars DataFrame is a straightforward process that can be accomplished using the str.strip_chars() method. Whether you’re applying it to all string columns, targeting specific ones, or creating new columns with cleaned data, Polars provides flexible ways to handle whitespace removal efficiently. By leveraging with_columns(), select(), and data type checks, you can ensure your DataFrame is properly cleaned and ready for further analysis.
Happy Learning!!