  • Post category:Polars
  • Post last modified:January 10, 2025

In Polars, you can cast a column from a string type to a float type using the cast() function. This is useful when numbers are stored as strings and you need to perform numeric operations on them. Converting these values to a numeric type such as Float64 makes calculations and further analysis possible. In this article, I will explain how to cast a string column to float in Polars.


Key Points –

  • The cast() function is used to convert a string column to a float column in Polars.
  • You can cast to Float32 or Float64 depending on the precision needed.
  • Polars does not automatically convert strings to numeric types; explicit casting is required.
  • On large datasets the float type matters: Float32 uses half the memory of Float64 but keeps only about 7 significant decimal digits, versus about 15 to 16 for Float64.
  • You can use .alias() to rename the column after casting, or keep the same name.
  • If casting a date string, you must first parse the date and then convert to a float (e.g., epoch seconds).
  • You can chain the cast() function with other operations like filtering or selecting columns in a single step.

Usage of Polars Cast String to Float

To convert a column from string to float in Polars, you can use the cast() method to change the data type of the column to Float64. This is especially useful when you have numeric values stored as strings and want to perform numerical operations.

To run some examples of casting a string column to float, let’s create a Polars DataFrame.


import polars as pl

technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :["20000","25000","22000", "30000"],
    'Discount':["1000","2300","1200","2000"]
              }
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This creates a DataFrame in which the Fee and Discount columns contain numbers stored as strings (str).

To perform a basic conversion of string values to float in the Fee and Discount columns of the DataFrame, you can use the cast() function to convert those columns from string to float.


# Convert Fee and Discount columns to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee"),
    pl.col("Discount").cast(pl.Float64).alias("Discount")
])
print("DataFrame after conversion:\n", df2)

Here,

  • cast(pl.Float64): This converts the column data type to 64-bit float.
  • alias(): Sets the name of the output column. cast() already keeps the original column name, so alias() is optional here and is shown only for clarity.

Casting a Single Column

To cast a single column in a Polars DataFrame, you can use the cast() function specifically on that column and then update the DataFrame using with_columns.


# Cast the 'Fee' column to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee")
])
print("DataFrame after casting a single column:\n", df2)

# Output:
# DataFrame after casting a single column:
# shape: (4, 3)
┌─────────┬─────────┬──────────┐
│ Courses ┆ Fee     ┆ Discount │
│ ---     ┆ ---     ┆ ---      │
│ str     ┆ f64     ┆ str      │
╞═════════╪═════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 1000     │
│ PySpark ┆ 25000.0 ┆ 2300     │
│ Python  ┆ 22000.0 ┆ 1200     │
│ pandas  ┆ 30000.0 ┆ 2000     │
└─────────┴─────────┴──────────┘

Here,

  • pl.col("Fee").cast(pl.Float64): The cast() function is applied only to the Fee column, converting it from string to Float64 type.
  • .alias("Fee"): This sets the output column name to "Fee". Because cast() keeps the original column name, this call is optional and can be omitted.

Casting Multiple Columns

To cast multiple columns in a Polars DataFrame, apply the cast() function to each column and use with_columns() to update the DataFrame.


# Cast 'Fee' and 'Discount' columns to Float64
df2 = df.with_columns([
    pl.col('Fee').cast(pl.Float64).alias('Fee'),
    pl.col('Discount').cast(pl.Float64).alias('Discount')
])
print("DataFrame after casting 'Fee' and 'Discount' to Float64:\n", df2)

# Output:
# DataFrame after casting 'Fee' and 'Discount' to Float64:
# shape: (4, 3)
┌─────────┬─────────┬──────────┐
│ Courses ┆ Fee     ┆ Discount │
│ ---     ┆ ---     ┆ ---      │
│ str     ┆ f64     ┆ f64      │
╞═════════╪═════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 1000.0   │
│ PySpark ┆ 25000.0 ┆ 2300.0   │
│ Python  ┆ 22000.0 ┆ 1200.0   │
│ pandas  ┆ 30000.0 ┆ 2000.0   │
└─────────┴─────────┴──────────┘

In the above example, both the Fee and Discount columns are cast from strings (str) to floats (f64). The with_columns() method updates both columns in one step.

Converting a Subset of Columns

To convert a subset of columns from string to float in a Polars DataFrame, you can specify the columns you want to convert and apply the cast() method only to those columns.


# Convert only the 'Fee' and 'Discount' columns to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee_float"),
    pl.col("Discount").cast(pl.Float64).alias("Discount_float")
])

print("DataFrame After Conversion (Subset of Columns):\n", df2)

# Output:
# DataFrame After Conversion (Subset of Columns):
# shape: (4, 5)
┌─────────┬───────┬──────────┬───────────┬────────────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Fee_float ┆ Discount_float │
│ ---     ┆ ---   ┆ ---      ┆ ---       ┆ ---            │
│ str     ┆ str   ┆ str      ┆ f64       ┆ f64            │
╞═════════╪═══════╪══════════╪═══════════╪════════════════╡
│ Spark   ┆ 20000 ┆ 1000     ┆ 20000.0   ┆ 1000.0         │
│ PySpark ┆ 25000 ┆ 2300     ┆ 25000.0   ┆ 2300.0         │
│ Python  ┆ 22000 ┆ 1200     ┆ 22000.0   ┆ 1200.0         │
│ pandas  ┆ 30000 ┆ 2000     ┆ 30000.0   ┆ 2000.0         │
└─────────┴───────┴──────────┴───────────┴────────────────┘

Here,

  • Only the Fee and Discount columns are converted to float, while the Courses column remains unchanged.
  • Use pl.col(column_name) to specify the columns you want to convert. alias() is used to create new columns for the converted data.
  • Only the selected columns are cast. Because alias() gives the results new names, the original string columns are kept alongside the float versions.

Conversion with None Values

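When converting a string column to a float in Polars, missing values (None) stay null in the resulting column. Invalid or non-numeric strings behave differently: by default cast() is strict and raises an error for them, while passing strict=False converts them to null instead.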


import polars as pl

# Sample DataFrame with strings and None values
data = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': ["20000", "25000", None, "30000"],   # None in 'Fee'
    'Discount': ["1000", "2300", None, "2000"]  # None in 'Discount'
}

df = pl.DataFrame(data)

# Cast 'Fee' and 'Discount' to float; None values stay null
df = df.with_columns([
    pl.col("Fee").cast(pl.Float64),
    pl.col("Discount").cast(pl.Float64)
])
print("DataFrame After Casting:\n", df)

Here,

  • When casting a column with None values to float, Polars automatically converts None to null in the resulting DataFrame. No error is raised during this process.
  • After casting, the column’s type changes from str to f64.
  • Valid numeric strings are converted to floats, and missing values (None) are retained as null. A non-numeric string would raise an error unless strict=False is passed to cast().

Conclusion

In conclusion, the cast() function in Polars provides a simple way to convert string columns to float so that you can run numerical operations on them. It covers scenarios such as scientific notation, missing values, and multiple-column conversions. Invalid strings raise an error by default and become null only when you pass strict=False.

Happy Learning!!
