In Polars, the DataFrame.cast() method changes the data type of one or more columns in a DataFrame. It is useful when you need a specific type for analysis, or when a smaller type can reduce memory usage. You can apply it to a single column, to several columns at once, or to every column, and it supports a wide range of data types, including Int64, Float32, Utf8 (string), and more.
In this article, I will explain the cast() method, its syntax, parameters, and usage, and show how it returns a new DataFrame with the specified data types applied to the columns.
Key Points –
- The cast() method in Polars converts the data type of one or more columns in a DataFrame.
- You can cast several columns at once by passing a dictionary that maps each column to its target data type.
- Dictionary keys can be column names, data types (for example, pl.Date to target every Date column), or selectors from polars.selectors. For expression-based casting, use pl.col(...).cast(...) inside with_columns() or select().
- You can cast columns to many data types, including Int64, Float32, Utf8 (string), Boolean, Date, and Datetime.
- Casting can lose data, for example by truncating the fractional part when converting a floating-point type to an integer type (e.g., Float64 to Int32).
- Casting columns to smaller or more suitable types can reduce memory usage and speed up computation.
- cast() can convert string columns holding dates in a standard format such as 2023-01-02 into Date or Datetime types for easier date manipulation. For other formats, str.to_date() or str.strptime() lets you state the format explicitly.
- Passing a single data type instead of a dictionary casts every column of the DataFrame to that type.
Polars DataFrame.cast() Syntax
Following is the syntax of DataFrame.cast(). It takes the dtypes parameter and the keyword-only strict parameter.
# Syntax of cast()
DataFrame.cast(
dtypes: Mapping[ColumnNameOrSelector | PolarsDataType, PolarsDataType | PythonDataType] | PolarsDataType,
*,
strict: bool = True,
) → DataFrame
Parameters of the Polars DataFrame.cast()
Following are the parameters of the cast() method.
dtypes – Specifies the target data types for the columns. It accepts either:
- A single data type, either a Polars type (e.g., pl.Int64) or a Python type (e.g., int), which is applied to all columns in the DataFrame.
- A dictionary whose keys are column names, column selectors, or data types, and whose values are the target data types (e.g., pl.Int32, pl.Float64, or a Python type). When a key is a data type, every column of that type is cast.
strict – (optional, default True) Controls how values that cannot be converted are handled.
- When strict=True, the method raises an error if any value cannot be converted (e.g., casting a string with non-numeric characters to a numeric type).
- When strict=False, values that cannot be converted become null instead of raising an error.
Return Value
This function returns a new DataFrame with the specified data types applied to the columns.
Usage of Polars DataFrame.cast() Method
The cast() method in Polars converts the data type of one or more columns in a DataFrame to a specified data type. It helps ensure that the data types are appropriate for the analysis or operations you want to perform.
To run some examples of the Polars DataFrame.cast() method, let’s create a Polars DataFrame.
import polars as pl
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000, 30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
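This creates a DataFrame with four columns: Courses and Duration hold strings (str), while Fee and Discount hold 64-bit integers (i64).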
To cast a single column to a different data type in Polars, pass a dictionary containing the column name and the target data type to the cast() method.
# Casting the 'Fee' column to Float32
df2 = df.cast({"Fee": pl.Float32})
print(df2)
Here,
- The Fee column, originally of type Int64, is cast to Float32.
- The result is a new DataFrame in which the Fee column contains 32-bit floating-point numbers.
Convert Multiple Columns to Specific Types
To convert multiple columns to specific types in Polars, use the cast() method with a mapping that specifies the target data type for each column.
# Casting 'Fee' to Float32 and 'Duration' to Utf8 (string)
df2 = df.cast({"Fee": pl.Float32, "Duration": pl.Utf8})
print(df2)
# Output:
# shape: (4, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fee     ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f32     ┆ str      ┆ i64      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000.0 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000.0 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000.0 ┆ 50days   ┆ 2000     │
└─────────┴─────────┴──────────┴──────────┘
Here,
- The "Fee" column is converted to Float32 (32-bit floating point), while the "Duration" column is cast to Utf8 (string). Since Duration was already a string column, this cast leaves it unchanged and simply makes the intended type explicit. In recent Polars versions, pl.Utf8 is an alias for pl.String.
- Columns not specified in the mapping (e.g., "Courses" and "Discount") retain their original data types.
- You can cast as many columns as needed by adding more key-value pairs to the mapping.
The same mapping approach works for any combination of columns. For example, the following casts Fee to Float32 and Discount to Int32.
# Casting 'Fee' to Float32 and 'Discount' to Int32
df2 = df.cast({"Fee": pl.Float32, "Discount": pl.Int32})
print(df2)
# Output:
# shape: (4, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fee     ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f32     ┆ str      ┆ i32      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000.0 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000.0 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000.0 ┆ 50days   ┆ 2000     │
└─────────┴─────────┴──────────┴──────────┘
Here,
- A dictionary is passed to the cast() method, where the keys are column names and the values are the target data types. For instance, "Fee": pl.Float32 converts the Fee column to Float32.
- Fee becomes Float32, so it holds floating-point values.
- Discount becomes Int32, which uses 4 bytes per value instead of the 8 bytes used by Int64.
- Columns not mentioned in the mapping retain their original data types.
Convert All Columns to Strings
To convert all columns in a Polars DataFrame to strings, you can apply cast() to every column inside select(). The following example builds one pl.col(col).cast(pl.Utf8) expression for each name in df.columns.
# Casting all columns to strings (Utf8)
df2 = df.select([pl.col(col).cast(pl.Utf8) for col in df.columns])
print(df2)
# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ str      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 20000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000 ┆ 50days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘
Here,
- The select() method applies one expression per column, and pl.col(col).cast(pl.Utf8) converts each column to Utf8 (string).
- The result is a DataFrame where all columns are of type str.
Convert a Numeric Column to Boolean
To convert a numeric column to Boolean in Polars, use the cast() method. Casting to pl.Boolean converts non-zero values to True and zero values to False.
import polars as pl
# Sample DataFrame with a numeric column
df = pl.DataFrame({
"Courses": ["Spark", "PySpark", "Python", "pandas"],
"Fee": [20000, 0, 22000, 0], # Numeric column
"Discount": [1000, 2300, 1200, 2000]
})
# Convert 'Fee' column to Boolean
df2 = df.with_columns(pl.col("Fee").cast(pl.Boolean))
print(df2)
# Output:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee   ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ bool  ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ true  ┆ 1000     │
│ PySpark ┆ false ┆ 2300     │
│ Python  ┆ true  ┆ 1200     │
│ pandas  ┆ false ┆ 2000     │
└─────────┴───────┴──────────┘
Here,
- cast(pl.Boolean) converts the Fee column to a Boolean type.
- Non-zero values (e.g., 20000, 22000) are converted to True, while zero values are converted to False.
Casting Date Column
When you cast a Date column to Datetime, cast() extends each value with a time component, producing a type that holds both the date and the time.
import polars as pl
from datetime import date
# Sample DataFrame with Date column
df = pl.DataFrame({
"Courses": ["Spark", "PySpark", "pandas"],
"Fee": [20000, 25000, 30000],
"Date": [date(2023, 1, 2), date(2024, 3, 4), date(2025, 5, 6)], # Date column
})
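# Option 1: cast every column of type Date to Datetime (data type used as the mapping key)
df2 = df.cast({pl.Date: pl.Datetime})
print(df2)
# Option 2: cast the 'Date' column by name using an expression
df2 = df.with_columns(pl.col("Date").cast(pl.Datetime))
print(df2)
# Both print() calls show the same DataFrame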
# Output:
# shape: (3, 3)
┌─────────┬───────┬─────────────────────┐
│ Courses ┆ Fee   ┆ Date                │
│ ---     ┆ ---   ┆ ---                 │
│ str     ┆ i64   ┆ datetime[μs]        │
╞═════════╪═══════╪═════════════════════╡
│ Spark   ┆ 20000 ┆ 2023-01-02 00:00:00 │
│ PySpark ┆ 25000 ┆ 2024-03-04 00:00:00 │
│ pandas  ┆ 30000 ┆ 2025-05-06 00:00:00 │
└─────────┴───────┴─────────────────────┘
Here,
- The Date column is cast to Datetime, which holds both date and time information. Because the original Date values had no time component, the time part is set to 00:00:00.
- Using pl.Datetime as the target type converts the Date column to Datetime with microsecond precision by default, which is why the column type appears as datetime[μs].
Similarly, you can cast several columns in a single with_columns() call by building one pl.col() expression per column. Each pl.col() expression targets a specific column, and .cast() changes its data type.
# Cast 'Fee' column to Float32 and 'Date' column to Datetime
# Using pl.col() expressions
df2 = df.with_columns([
pl.col("Fee").cast(pl.Float32),
pl.col("Date").cast(pl.Datetime)
])
print(df2)
# Output:
# shape: (3, 3)
┌─────────┬─────────┬─────────────────────┐
│ Courses ┆ Fee     ┆ Date                │
│ ---     ┆ ---     ┆ ---                 │
│ str     ┆ f32     ┆ datetime[μs]        │
╞═════════╪═════════╪═════════════════════╡
│ Spark   ┆ 20000.0 ┆ 2023-01-02 00:00:00 │
│ PySpark ┆ 25000.0 ┆ 2024-03-04 00:00:00 │
│ pandas  ┆ 30000.0 ┆ 2025-05-06 00:00:00 │
└─────────┴─────────┴─────────────────────┘
Conclusion
In this article, I have explained the Polars DataFrame cast() method, including its syntax, parameters, and usage. cast() returns a new DataFrame with the specified columns converted to the target data types; it does not modify the original DataFrame.
Happy Learning!!