In Polars, you can convert a string column to an integer using either the str.to_integer() or cast() method. This is useful when numeric values are stored as text, for example after importing a CSV file, and need to be processed mathematically. In this article, I will explain how to convert a string column to an integer type in Polars.
Key Points –
- Use str.to_integer() to convert a column of string values to integers in Polars.
- Apply with_columns() to modify the DataFrame and update the column with converted values.
- Convert multiple columns using list comprehension inside with_columns().
- Handle errors with strict=False, which replaces non-numeric values with null instead of raising an error.
- Missing values (None) remain null after conversion.
- Non-numeric strings ("abc", "NaN", "unknown") turn into null when using strict=False.
- Combine with fill_null() to replace null values with a default integer after conversion.
- str.to_integer() is only available on string columns. Calling it on a column that is already numeric raises an error rather than doing nothing, so check the column type before applying it.
- Consider cast(pl.Int64) if the column is expected to contain only valid integers. By default it raises an error when a value cannot be converted.
Usage of Polars String to Integer
Polars provides the str.to_integer()
function to convert string values into integers. This is useful when dealing with numeric data stored as text, such as dataset exports from CSV files.
To run some examples of converting a string column to an integer type, let’s create a Polars DataFrame.
import polars as pl
# Creating a DataFrame with string values
technologies = {
'Courses': ["Spark", "PySpark", "Pandas"],
'Fee': ['22000', '25000', '26000'],
'Discount': ['1000', '2500', '1400']
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This creates a DataFrame with three columns, Courses, Fee, and Discount, all of which are stored as strings (str).
You can use the str.to_integer() method to convert a string column to an integer type. Here’s an example where the Fee column, currently in string format, is converted to an integer.
# Convert 'Fee' column to integer
df2 = df.with_columns(df["Fee"].str.to_integer().alias("Fee"))
print("Updated DataFrame:\n", df2)
Here,
- str.to_integer() converts a string column to an integer.
- We used with_columns() to modify the 'Fee' column and store the updated values.
- The Discount column is still a string because we only transformed the Fee column.
Converting Multiple Columns using List Comprehension
You can use list comprehension within with_columns() to convert multiple string columns to integers using the str.to_integer() function. This approach is both clean and efficient when dealing with multiple columns containing numeric strings.
# Convert multiple columns using list comprehension
cols_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([df[col].str.to_integer().alias(col) for col in cols_to_convert])
print("Updated DataFrame:\n", df2)
# Equivalent version using pl.col() expressions instead of indexing df directly
columns_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([pl.col(col).str.to_integer() for col in columns_to_convert])
print("Updated DataFrame:\n", df2)
# Output:
# Updated DataFrame:
# shape: (3, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee ┆ Discount │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═══════╪══════════╡
│ Spark ┆ 22000 ┆ 1000 │
│ PySpark ┆ 25000 ┆ 2500 │
│ Pandas ┆ 26000 ┆ 1400 │
└─────────┴───────┴──────────┘
Here,
- We define cols_to_convert=["Fee", "Discount"], which holds the column names we want to convert.
- We use list comprehension inside with_columns() to apply str.to_integer() to each column.
- The updated columns replace the original ones in the DataFrame.
Using cast(pl.Int64) for Numeric Strings
If you want to convert multiple columns of numeric strings to integers, another approach in Polars is cast(pl.Int64). This method suits columns that you are sure contain only numeric strings: it converts the column’s data type directly and may be faster than the str.to_integer() function, but with its default strict=True it raises an error if any value cannot be converted.
# Convert multiple columns using .cast(pl.Int64)
cols_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([df[col].cast(pl.Int64) for col in cols_to_convert])
print("Updated DataFrame:\n", df2)
# Output:
# Updated DataFrame:
# shape: (3, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee ┆ Discount │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═══════╪══════════╡
│ Spark ┆ 22000 ┆ 1000 │
│ PySpark ┆ 25000 ┆ 2500 │
│ Pandas ┆ 26000 ┆ 1400 │
└─────────┴───────┴──────────┘
Here,
- Use cast(pl.Int64) when all values are valid numbers.
- It may be faster than str.to_integer() because it directly converts the column’s data type.
- The resulting integer columns are more memory-efficient than the original strings.
- Works best for purely numeric string columns without any non-numeric values.
Convert a String Index Column to Integer
If your index column (stored as a regular column) contains string values representing numbers in Polars, you can convert it to an integer type using the str.to_integer() function.
import polars as pl
# Creating a DataFrame with an index-like column stored as strings
data = {
"Index": ["1", "2", "3", "4"], # Stored as strings
"Courses": ["Spark", "PySpark", "Pandas", "Java"],
"Fee": ["22000", "25000", "26000", "27000"]
}
df = pl.DataFrame(data)
# Convert the 'Index' column to an integer
df2 = df.with_columns(df["Index"].str.to_integer().alias("Index"))
print("Updated DataFrame:\n", df2)
# Output:
# Updated DataFrame:
# shape: (4, 3)
┌───────┬─────────┬───────┐
│ Index ┆ Courses ┆ Fee │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═══════╪═════════╪═══════╡
│ 1 ┆ Spark ┆ 22000 │
│ 2 ┆ PySpark ┆ 25000 │
│ 3 ┆ Pandas ┆ 26000 │
│ 4 ┆ Java ┆ 27000 │
└───────┴─────────┴───────┘
Here,
- The "Index" column was originally stored as string (str).
- We used str.to_integer() to convert it into an integer (i64).
- The column remains part of the DataFrame, not set as an actual index (Polars does not have built-in index support like Pandas).
Converting a Column with Mixed Data (Handling Errors with null)
When a column contains mixed data types (e.g., numbers, text, or missing values), attempting to convert it to integers may cause errors. Polars provides str.to_integer(strict=False), which replaces non-numeric values with null instead of throwing an error.
import polars as pl
# Creating a DataFrame with mixed data types
technologies = {
'Courses': ["Spark", "PySpark", "Pandas", "Java"],
'Fee': ['22000', '25000', 'unknown', '26000'], # 'unknown' is a non-numeric value
'Discount': ['1000', '2500', None, '1400'] # Includes a None (null) value
}
df = pl.DataFrame(technologies)
# Convert 'Fee' column to integer,
# Handling errors by setting non-numeric values to null
df2 = df.with_columns(df["Fee"].str.to_integer(strict=False).alias("Fee"))
print("Updated DataFrame (with null for errors):\n", df2)
# Output:
# Updated DataFrame (with null for errors):
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee ┆ Discount │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╡
│ Spark ┆ 22000 ┆ 1000 │
│ PySpark ┆ 25000 ┆ 2500 │
│ Pandas ┆ null ┆ null │
│ Java ┆ 26000 ┆ 1400 │
└─────────┴───────┴──────────┘
Here,
- str.to_integer(strict=False): Converts numeric strings to integers, replacing invalid values (such as "unknown") with null instead of raising an error.
- Missing values (None) remain as null in Polars.
Using str.to_integer() with fill_null() to Replace Null Values
When converting a string column with mixed data to integers using str.to_integer(strict=False), non-numeric values get replaced with null (None). If you want to replace these null values with a default value, you can use fill_null().
import polars as pl
# Creating a DataFrame with mixed data in the 'Fee' column
data = {
"Courses": ["Spark", "PySpark", "Pandas", "Java"],
"Fee": ["22000", "25000", "unknown", "26000"] # 'unknown' is a non-numeric value
}
df = pl.DataFrame(data)
# Convert 'Fee' column to integer and replace null with 0
df2 = df.with_columns(
df["Fee"].str.to_integer(strict=False).fill_null(0).alias("Fee"))
print("Updated DataFrame (Replacing null with 0):\n", df2)
# Output:
# Updated DataFrame (Replacing null with 0):
# shape: (4, 2)
┌─────────┬───────┐
│ Courses ┆ Fee │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═══════╡
│ Spark ┆ 22000 │
│ PySpark ┆ 25000 │
│ Pandas ┆ 0 │
│ Java ┆ 26000 │
└─────────┴───────┘
Here,
- str.to_integer(strict=False): Converts numeric strings to integers, replacing non-numeric values with null instead of throwing an error.
- fill_null(0): Replaces all null values with 0.
Conclusion
In this article, I have explained how to convert single and multiple columns from string to integer type in a Polars DataFrame using the str.to_integer() method, cast(pl.Int64), and the with_columns() function, and how to handle non-numeric values with strict=False and fill_null().
Happy Learning!!