  • Post category:Polars
  • Post last modified:February 14, 2025
Polars Cast String to Integer

To convert a string column to an integer type in a Polars DataFrame, use the cast() method on the column expression. For example, a column of type Utf8 (string; named String in recent Polars releases) can be cast to Int32 or Int64. This is particularly useful when a column contains numeric values stored as text and you want to perform mathematical or comparison operations on them. In this article, I will explain how to cast a string column to an integer.

Key Points –

  • The cast() method converts a column from one data type to another, including from string to integer.
  • Use pl.Int32, pl.Int64, or another integer type depending on the range of values in the data.
  • Select the column to cast with pl.col('column_name') before applying cast().
  • You can cast multiple columns to integer types in a single operation using the with_columns() method.
  • Use alias() to give the cast result a new name; without it, with_columns() replaces the original column.
  • The cast is written as a Polars expression, so it runs inside the Polars engine rather than in a Python loop.
  • The cast() method supports multiple integer types, including Int8, Int16, Int32, Int64, and more.

Usage of Polars cast string to integer

In Polars, the cast() method is used to convert a column from one data type to another. When you need to cast a string column to an integer, you can use pl.Int32, pl.Int64, or other integer types, depending on the size of the values you’re working with.

To run some examples of casting a string to an integer, let’s first create a Polars DataFrame.


import polars as pl

# All columns, including the numeric-looking ones, are stored as strings
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': ['1000', '2300', '2500', '1400']}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This yields the output below.

[Image: Polars cast string integer]

To convert a single column from a string to an integer in Polars, you can use the cast() method on that specific column. Below is an example where the Fee column, which is currently in string format, is cast to an integer.


# Cast 'Fee' column from string to integer (Int32)
df2 = df.with_columns(pl.col('Fee').cast(pl.Int32).alias('Fee_int'))
print("DataFrame with Fee column cast to integer:\n", df2)

Here,

  • pl.col('Fee') – Selects the Fee column.
  • cast(pl.Int32) – Converts the Fee column from a string type to an integer type (Int32).
  • alias('Fee_int') – Creates a new column named Fee_int to store the cast integer values.

[Image: Polars cast string integer]

Cast Multiple String Columns to Integer

To cast multiple string columns to integers, apply cast() to each column and pass all the expressions to with_columns(), which evaluates them in a single step. Here’s how you can convert the Fee and Discount columns from strings to integers.


# Cast 'Fee' and 'Discount' columns from string to integer
df2 = df.with_columns([
        pl.col("Fee").cast(pl.Int32).alias("Fee"),
        pl.col("Discount").cast(pl.Int32).alias("Discount")])
print("Updated DataFrame with Fee and Discount as Integer:\n", df2)

# Output:
# Updated DataFrame with Fee and Discount as Integer:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i32   ┆ str      ┆ i32      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Fee").cast(pl.Int32) – Casts the Fee column from string to integer (Int32).
  • pl.col("Discount").cast(pl.Int32) – Casts the Discount column from string to integer (Int32).
  • with_columns() – Applies both transformations in one step.

Cast String to Integer with pl.Int64

To cast a string column to an integer using pl.Int64 in Polars, you can modify the code to cast the "Fee" and "Discount" columns to Int64 instead of Int32.


# Casting 'Fee' and 'Discount' columns to Int64
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Int64),
    pl.col("Discount").cast(pl.Int64)])
print("DataFrame after casting to Int64:\n", df2)

# Output:
# DataFrame after casting to Int64:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • Using pl.Int64 ensures that your numeric columns (like "Fee" and "Discount") are cast to 64-bit integers, which is suitable for large numeric values.
  • The "Courses" and "Duration" columns remain strings because they contain non-numeric data.

Cast and Apply Mathematical Operations

To cast a string column to an integer and then apply various mathematical operations in Polars, you can use the cast() function along with arithmetic operators (+, -, *, /, %, **). Here are a few examples demonstrating how to cast a string column to an integer and apply mathematical operations on it.

Cast String to Integer and Add a Value

To cast a string column to an integer and then add a specific value in Polars, you can use the cast() function along with the addition operator (+).


# Cast to Int64 and add 1000 to each value
df2 = df.with_columns((pl.col("Fee").cast(pl.Int64) + 1000).alias("Fee_plus_1000"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount ┆ Fee_plus_1000 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---           │
│ str     ┆ str   ┆ str      ┆ str      ┆ i64           │
╞═════════╪═══════╪══════════╪══════════╪═══════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 23000         │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 26000         │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     ┆ 25000         │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     ┆ 27000         │
└─────────┴───────┴──────────┴──────────┴───────────────┘

Here,

  • pl.col("Fee").cast(pl.Int64) – This casts the 'Fee' column from string to integers (Int64).
  • + 1000 – Adds 1000 to each value in the 'Fee' column.
  • alias("Fee_plus_1000") – Names the new column 'Fee_plus_1000'.

Cast String to Integer and Apply Exponentiation

To cast a string column to an integer and then apply exponentiation in Polars, you can use the cast() function along with the ** operator (or pow() function).


# Cast 'Fee' column to Int64 and apply exponentiation (square each value)
df2 = df.with_columns(
    (pl.col("Fee").cast(pl.Int64) ** 2).alias("Squared_Value"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬───────────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount ┆ Squared_Value │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---           │
│ str     ┆ str   ┆ str      ┆ str      ┆ i64           │
╞═════════╪═══════╪══════════╪══════════╪═══════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 484000000     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 625000000     │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     ┆ 576000000     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     ┆ 676000000     │
└─────────┴───────┴──────────┴──────────┴───────────────┘

Here,

  • pl.col("Fee").cast(pl.Int64) – Casts the 'Fee' column from string to integer (Int64).
  • ** 2 – Applies exponentiation (squares each value in this case).
  • alias("Squared_Value") – Renames the new column to 'Squared_Value'.

Cast String to Integer and Find the Modulus

To cast a string column to an integer and then find the modulus (remainder after division) in Polars, you can use the cast() function along with the modulus operator (%).


# Cast 'Fee' column to Int64 and find modulus when divided by 7000
df2 = df.with_columns((pl.col("Fee").cast(pl.Int64) % 7000).alias("Fee_Modulus_7000"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬──────────────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount ┆ Fee_Modulus_7000 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---              │
│ str     ┆ str   ┆ str      ┆ str      ┆ i64              │
╞═════════╪═══════╪══════════╪══════════╪══════════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 1000             │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 4000             │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     ┆ 3000             │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     ┆ 5000             │
└─────────┴───────┴──────────┴──────────┴──────────────────┘

Here,

  • pl.col("Fee").cast(pl.Int64) – Converts the 'Fee' column from string to integer (Int64).
  • % 7000 – Finds the remainder when each value is divided by 7000.
  • alias("Fee_Modulus_7000") – Renames the resulting column to 'Fee_Modulus_7000'.

Cast String Column and Rename

You can use the cast() function to change the column type and the alias() method to assign a new name to the converted column. This is how you cast a string column to an integer and rename it in Polars.


# Cast 'Fee' column to Int64 and rename it to 'Fee_in_int'
df2 = df.with_columns(pl.col("Fee").cast(pl.Int64).alias("Fee_in_int"))
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬──────────┬────────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount ┆ Fee_in_int │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---        │
│ str     ┆ str   ┆ str      ┆ str      ┆ i64        │
╞═════════╪═══════╪══════════╪══════════╪════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 22000      │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 25000      │
│ Hadoop  ┆ 24000 ┆ 40days   ┆ 2500     ┆ 24000      │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 1400     ┆ 26000      │
└─────────┴───────┴──────────┴──────────┴────────────┘

Here,

  • pl.col("Fee").cast(pl.Int64) – Casts the 'Fee' column from string to integers (Int64).
  • alias("Fee_in_int") – Names the resulting column 'Fee_in_int'; the original 'Fee' column is kept as a string.

Cast String to Integer and Handle Missing Values

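When casting a string column to an integer in Polars, missing values (null) simply stay null, but invalid values (such as empty strings or non-numeric text) raise an error because cast() is strict by default. To handle such cases, pass strict=False to cast() so that values that cannot be parsed become null (pl.col().str.to_integer() accepts the same option), then apply fill_null() to replace the nulls.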


import polars as pl

# Sample DataFrame with missing and invalid string values
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', None, '26000'],  # Contains a missing value (None)
    'Discount': ['1000', '2300', 'invalid', '1400']  # Contains an invalid string ("invalid")
}
df = pl.DataFrame(technologies)

# Cast 'Fee' and 'Discount' columns to integer, handling missing/invalid values
df2 = df.with_columns([
    pl.col('Fee').cast(pl.Int32, strict=False).fill_null(0).alias('Fee_int'),
    pl.col('Discount').cast(pl.Int32, strict=False).fill_null(0).alias('Discount_int')])
print(df2)

# Output:
# shape: (4, 5)
┌─────────┬───────┬──────────┬─────────┬──────────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Fee_int ┆ Discount_int │
│ ---     ┆ ---   ┆ ---      ┆ ---     ┆ ---          │
│ str     ┆ str   ┆ str      ┆ i32     ┆ i32          │
╞═════════╪═══════╪══════════╪═════════╪══════════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 22000   ┆ 1000         │
│ PySpark ┆ 25000 ┆ 2300     ┆ 25000   ┆ 2300         │
│ Hadoop  ┆ null  ┆ invalid  ┆ 0       ┆ 0            │
│ Pandas  ┆ 26000 ┆ 1400     ┆ 26000   ┆ 1400         │
└─────────┴───────┴──────────┴─────────┴──────────────┘

Conclusion

In summary, converting a string column to an integer type in Polars is easy with the cast() method. This transformation is essential for data preparation, particularly when numbers are stored as strings. By using cast(pl.Int32) or other integer types like Int64, you ensure the column is properly formatted for numerical computations or further data manipulation.

Happy Learning!!
