Convert Polars String to Integer

Post last modified: March 14, 2025

In Polars, you can convert a string column to an integer column with either the str.to_integer() method or the cast() method. This is needed whenever numeric values are stored as text, for example after importing a CSV file, and must be used in arithmetic. In this article, I will explain how to convert a string column to an integer type in Polars.


Key Points –

  • Use str.to_integer() to convert a column of string values to integers in Polars.
  • Apply with_columns() to modify the DataFrame and update the column with converted values.
  • Convert multiple columns using list comprehension inside with_columns().
  • Handle errors with strict=False, which replaces non-numeric values with null instead of raising an error.
  • Missing values (None) remain null after conversion, ensuring data integrity.
  • Non-numeric strings ("abc", "NaN", "unknown") turn into null when using strict=False.
  • Can be combined with fill_null() to replace null values with a default integer after conversion.
  • Calling str.to_integer() on a column that is already numeric raises an error, because the str namespace only works on string columns, so check the column types before applying it.
  • If the column is expected to contain only valid integers, cast(pl.Int64) is an alternative that converts the column's data type directly.

Usage of Polars String to Integer

Polars provides the str.to_integer() function to convert string values into integers. This is useful when dealing with numeric data stored as text, such as dataset exports from CSV files.

To run some examples of converting a string column to an integer type, let’s create a Polars DataFrame.


import polars as pl

# Creating a DataFrame with string values
technologies = {
   'Courses': ["Spark", "PySpark", "Pandas"],
   'Fee': ['22000', '25000', '26000'], 
   'Discount': ['1000', '2500', '1400'] 
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This yields a DataFrame with three rows in which the Courses, Fee, and Discount columns are all of type str.

You can use the str.to_integer() method to convert a string column to an integer type. Here’s an example where the Fee column, currently in string format, is converted to an integer.


# Convert 'Fee' column to integer
df2 = df.with_columns(df["Fee"].str.to_integer().alias("Fee"))
print("Updated DataFrame:\n", df2)

Here,

  • str.to_integer() converts a string column to an integer.
  • We used with_columns() to modify the 'Fee' column and store the updated values.
  • The Discount column is still a string because we only transformed the Fee column.

In the resulting DataFrame, Fee is of type i64 while Discount remains str.

Converting Multiple Columns using List Comprehension

You can use list comprehension within with_columns() to convert multiple string columns to integers using the str.to_integer() function. This approach is both clean and efficient when dealing with multiple columns containing numeric strings.


# Convert multiple columns using list comprehension
cols_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([df[col].str.to_integer().alias(col) for col in cols_to_convert])
print("Updated DataFrame:\n", df2)

# Equivalent version using the pl.col() expression API
columns_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([pl.col(col).str.to_integer() for col in columns_to_convert])
print("Updated DataFrame:\n", df2)

# Output:
# Updated DataFrame:
# shape: (3, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee   ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘

Here,

  • We define cols_to_convert=["Fee", "Discount"], which holds the column names we want to convert.
  • We use list comprehension inside with_columns() to apply str.to_integer() to each column.
  • The updated columns replace the original ones in the DataFrame.

Using cast(pl.Int64) for Numeric Strings

If you want to convert multiple columns of numeric strings to integers, you can also use cast(pl.Int64). This method is a good fit when you are sure that the column contains only numeric strings, because it changes the data type directly and may be faster than str.to_integer().


# Convert multiple columns using .cast(pl.Int64)
cols_to_convert = ["Fee", "Discount"]
df2 = df.with_columns([df[col].cast(pl.Int64) for col in cols_to_convert])
print("Updated DataFrame:\n", df2)

# Output:
# Updated DataFrame:
# shape: (3, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee   ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘

Here,

  • Use cast(pl.Int64) when all values are valid numeric strings.
  • It converts the column's data type directly, which may make it faster than str.to_integer(); benchmark on your own data if speed matters.
  • The resulting integer column typically uses less memory than the original string column.
  • By default the cast is strict, so a non-numeric value raises an error.

Convert a String Index Column to Integer

If your index column (stored as a regular column) contains string values representing numbers in Polars, you can convert it to an integer type using the str.to_integer() function.


import polars as pl

# Creating a DataFrame with an index-like column stored as strings
data = {
    "Index": ["1", "2", "3", "4"],  # Stored as strings
    "Courses": ["Spark", "PySpark", "Pandas", "Java"],
    "Fee": ["22000", "25000", "26000", "27000"]
}

df = pl.DataFrame(data)

# Convert the 'Index' column to an integer
df2 = df.with_columns(df["Index"].str.to_integer().alias("Index"))
print("Updated DataFrame:\n", df2)

# Output:
# Updated DataFrame:
# shape: (4, 3)
┌───────┬─────────┬───────┐
│ Index ┆ Courses ┆ Fee   │
│ ---   ┆ ---     ┆ ---   │
│ i64   ┆ str     ┆ str   │
╞═══════╪═════════╪═══════╡
│ 1     ┆ Spark   ┆ 22000 │
│ 2     ┆ PySpark ┆ 25000 │
│ 3     ┆ Pandas  ┆ 26000 │
│ 4     ┆ Java    ┆ 27000 │
└───────┴─────────┴───────┘

Here,

  • The "Index" column was originally stored as string (str).
  • We used str.to_integer() to convert it into an integer (i64).
  • The column remains part of the DataFrame, not set as an actual index (Polars does not have built-in index support like Pandas).

Converting a Column with Mixed Data (Handling Errors with null)

When a string column mixes numeric text with non-numeric text or missing values, converting it to integers raises an error by default. Passing strict=False to str.to_integer() replaces values that cannot be parsed with null instead of raising an error.


import polars as pl

# Creating a DataFrame with mixed data types
technologies = {
   'Courses': ["Spark", "PySpark", "Pandas", "Java"],
   'Fee': ['22000', '25000', 'unknown', '26000'],  # 'unknown' is a non-numeric value
   'Discount': ['1000', '2500', None, '1400']  # Includes a None (null) value
}

df = pl.DataFrame(technologies)

# Convert 'Fee' column to integer, 
# Handling errors by setting non-numeric values to null
df2 = df.with_columns(df["Fee"].str.to_integer(strict=False).alias("Fee"))
print("Updated DataFrame (with null for errors):\n", df2)

# Output:
# Updated DataFrame (with null for errors):
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee   ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2500     │
│ Pandas  ┆ null  ┆ null     │
│ Java    ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘

Here,

  • str.to_integer(strict=False): Converts numeric strings to integers, replacing invalid values (such as "unknown") with null instead of raising an error.
  • Missing values (None) remain as null in Polars.

Using str.to_integer() with fill_null() to Replace Null Values

When converting a string column with mixed data to integers using str.to_integer(strict=False), non-numeric values get replaced with null (None). If you want to replace these null values with a default value, you can use fill_null().


import polars as pl

# Creating a DataFrame with mixed data in the 'Fee' column
data = {
    "Courses": ["Spark", "PySpark", "Pandas", "Java"],
    "Fee": ["22000", "25000", "unknown", "26000"]  # 'unknown' is a non-numeric value
}

df = pl.DataFrame(data)

# Convert 'Fee' column to integer and replace null with 0
df2 = df.with_columns(
    df["Fee"].str.to_integer(strict=False).fill_null(0).alias("Fee"))
print("Updated DataFrame (Replacing null with 0):\n", df2)

# Output:
# Updated DataFrame (Replacing null with 0):
# shape: (4, 2)
┌─────────┬───────┐
│ Courses ┆ Fee   │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 22000 │
│ PySpark ┆ 25000 │
│ Pandas  ┆ 0     │
│ Java    ┆ 26000 │
└─────────┴───────┘

Here,

  • str.to_integer(strict=False): converts numeric strings to integers, replacing non-numeric values with null instead of throwing an error.
  • fill_null(0): Replaces all null values with 0.

Conclusion

In this article, I have explained how to convert single and multiple columns from string to integer type in a Polars DataFrame using the str.to_integer() and cast() methods together with with_columns(), and how to handle non-numeric values with strict=False and fill_null().

Happy Learning!!
