In Polars, you can cast a column from a string type to a float type using the cast() function. This is particularly useful when your data is stored as strings (e.g., numbers in string format) and you need to perform numeric operations on it. Converting these string values to a numeric type like Float64 enables you to carry out calculations and further analysis. In this article, I will explain how to cast a string column to float in Polars.
Key Points –
- The cast() function is used to convert a string column to a float column in Polars.
- You can cast to Float32 or Float64 depending on the precision needed.
- Polars does not automatically convert strings to numeric types; explicit casting is required.
- Casting large datasets from string to float may have performance implications; choose the appropriate float type (Float32 vs. Float64).
- You can use .alias() to rename the column after casting, or keep the same name.
- If casting a date string, you must first parse the date and then convert to a float (e.g., epoch seconds).
- You can chain the cast() function with other operations like filtering or selecting columns in a single step.
Usage of Polars Cast String to Float
To convert a column from string to float in Polars, you can use the cast() method to change the data type of the column to Float64. This is especially useful when you have numeric values stored as strings and want to perform numerical operations.
To run some examples of casting string columns to float, let's create a Polars DataFrame.
import polars as pl

technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': ["20000", "25000", "22000", "30000"],
    'Discount': ["1000", "2300", "1200", "2000"]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
Yields below output.
To perform a basic conversion of string values to float in the Fee and Discount columns of the DataFrame, you can use the cast() function to convert those columns from string to float.
# Convert Fee and Discount columns to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee"),
    pl.col("Discount").cast(pl.Float64).alias("Discount")
])
print("DataFrame after conversion:\n", df2)
Here,
- cast(pl.Float64): This converts the column data type to 64-bit float.
- alias(): Renames the column after conversion, though in this case, it's keeping the original column names.
Casting a Single Column
To cast a single column in a Polars DataFrame, you can use the cast() function specifically on that column and then update the DataFrame using with_columns().
# Cast the 'Fee' column to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee")
])
print("DataFrame after casting a single column:\n", df2)
# Output:
# DataFrame after casting a single column:
# shape: (4, 3)
┌─────────┬─────────┬──────────┐
│ Courses ┆ Fee ┆ Discount │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ str │
╞═════════╪═════════╪══════════╡
│ Spark ┆ 20000.0 ┆ 1000 │
│ PySpark ┆ 25000.0 ┆ 2300 │
│ Python ┆ 22000.0 ┆ 1200 │
│ pandas ┆ 30000.0 ┆ 2000 │
└─────────┴─────────┴──────────┘
Here,
- pl.col("Fee").cast(pl.Float64): The cast() function is applied only to the Fee column, converting it from string to Float64 type.
- .alias("Fee"): This names the result "Fee" after the conversion. It's optional here, because cast() already keeps the original column name.
Casting Multiple Columns
To cast multiple columns in a Polars DataFrame, you can apply the cast() function to each column individually and then use with_columns() to update the DataFrame.
# Cast 'Fee' and 'Discount' columns to Float64
df2 = df.with_columns([
    pl.col('Fee').cast(pl.Float64).alias('Fee'),
    pl.col('Discount').cast(pl.Float64).alias('Discount')
])
print("DataFrame after casting 'Fee' and 'Discount' to Float64:\n", df2)
# Output:
# DataFrame after casting 'Fee' and 'Discount' to Float64:
# shape: (4, 3)
┌─────────┬─────────┬──────────┐
│ Courses ┆ Fee ┆ Discount │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞═════════╪═════════╪══════════╡
│ Spark ┆ 20000.0 ┆ 1000.0 │
│ PySpark ┆ 25000.0 ┆ 2300.0 │
│ Python ┆ 22000.0 ┆ 1200.0 │
│ pandas ┆ 30000.0 ┆ 2000.0 │
└─────────┴─────────┴──────────┘
In the above example, both the Fee and Discount columns are cast from strings (str) to floats (f64). The with_columns() method updates both columns in one step.
Converting a Subset of Columns
To convert a subset of columns from string to float in a Polars DataFrame, you can specify the columns you want to convert and apply the cast() method only to those columns.
# Convert only the 'Fee' and 'Discount' columns to float
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float64).alias("Fee_float"),
    pl.col("Discount").cast(pl.Float64).alias("Discount_float")
])
print("DataFrame After Conversion (Subset of Columns):\n", df2)
# Output:
# DataFrame After Conversion (Subset of Columns):
# shape: (4, 5)
┌─────────┬───────┬──────────┬───────────┬────────────────┐
│ Courses ┆ Fee ┆ Discount ┆ Fee_float ┆ Discount_float │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ f64 ┆ f64 │
╞═════════╪═══════╪══════════╪═══════════╪════════════════╡
│ Spark ┆ 20000 ┆ 1000 ┆ 20000.0 ┆ 1000.0 │
│ PySpark ┆ 25000 ┆ 2300 ┆ 25000.0 ┆ 2300.0 │
│ Python ┆ 22000 ┆ 1200 ┆ 22000.0 ┆ 1200.0 │
│ pandas ┆ 30000 ┆ 2000 ┆ 30000.0 ┆ 2000.0 │
└─────────┴───────┴──────────┴───────────┴────────────────┘
Here,
- The Fee and Discount values are converted to float and stored in the new Fee_float and Discount_float columns, while the original Fee, Discount, and Courses columns remain unchanged.
- Use pl.col(column_name) to specify the columns you want to convert.
- alias() is used to create new columns for the converted data.
- This method avoids unnecessary type conversion for other columns, making the transformation more efficient.
Conversion with None Values
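When converting a string column to float in Polars, missing values (None) are kept as null in the resulting column. Invalid or non-numeric strings are a different case: with the default strict=True, cast() raises an error, and they are converted to null only if you pass strict=False.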
import polars as pl

# Sample DataFrame with strings and None values
data = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': ["20000", "25000", None, "30000"],  # None in 'Fee'
    'Discount': ["1000", "2300", None, "2000"]  # None in 'Discount'
}
df = pl.DataFrame(data)

# Cast 'Fee' and 'Discount' to float; None values stay null
df = df.with_columns([
    pl.col("Fee").cast(pl.Float64),
    pl.col("Discount").cast(pl.Float64)
])
print("DataFrame After Casting:\n", df)
Here,
- When casting a column with None values to float, Polars keeps each None as null in the resulting DataFrame. No error is raised during this process.
- After casting, the column's type changes from str to f64.
- Valid numeric strings are accurately converted to floats, while missing values (None) are retained as null.
Conclusion
In conclusion, the cast() function in Polars provides a simple yet powerful way to convert string columns to float, enabling numerical operations and analysis. As the examples above show, it handles single-column and multiple-column conversions, lets you keep or rename the converted columns, and preserves missing values as null.
Happy Learning!!