  • Post category:Polars
  • Post last modified:January 8, 2025

In Polars, the cast() method is used to change the data type of one or more columns in a DataFrame. It is useful when you need to convert columns to a specific type for analysis or to improve memory efficiency. This method can be applied to individual columns or multiple columns at once, supporting a range of data types, including Int64, Float32, Utf8, and more.

In this article, I will explain the Polars DataFrame cast() method, covering its syntax, parameters, and usage, and show how it returns a new DataFrame with the specified data types applied to the columns.

Key Points –

  • The cast() method in Polars is used to convert the data type of one or more columns in a DataFrame.
  • You can cast multiple columns at once by passing a dictionary that maps column names to target data types.
  • The dictionary keys can also be column selectors or data types, and inside expressions you can target columns with pl.col().
  • You can cast columns to various data types, including Int64, Float32, Utf8 (string), Boolean, Date, and Datetime.
  • Casting can result in data loss or truncation, especially when converting from a higher precision type (e.g., Float64 to Int32).
  • Casting columns to more suitable types can optimize memory usage and computational efficiency during analysis.
  • The cast() method can be used to convert string columns representing dates into Datetime or Date types for easier date manipulation.
  • The cast() method is versatile and supports casting a single column, multiple columns, or even all columns of a DataFrame at once.

Polars DataFrame.cast() Syntax

The syntax of DataFrame.cast() is shown below. The method takes two parameters, dtypes and strict.


# Syntax of cast()
DataFrame.cast(
    dtypes: Mapping[ColumnNameOrSelector | PolarsDataType, PolarsDataType | PythonDataType] | PolarsDataType,
    *,
    strict: bool = True,
) → DataFrame

Parameters of the Polars DataFrame.cast()

The cast() method accepts the following parameters.

  • dtypes – Specifies the desired data types for the columns.
    • A single data type (PolarsDataType or PythonDataType), which will be applied to all columns in the DataFrame (e.g., pl.Int64).
    • A dictionary where the keys are column names (or column selectors) and the values are the target data types for the respective columns. The values can be either PolarsDataType or PythonDataType (e.g., pl.Int32, pl.Float64).
  • strict – (optional, default True).
    • When strict=True, the method raises an error if the type conversion is not possible (e.g., trying to convert a string with non-numeric characters to a numeric type).
    • When strict=False, the method will attempt to convert columns even if some values cannot be converted, and non-convertible values will be turned into null.

Return Value

This function returns a new DataFrame with the specified data types applied to the columns.

Usage of Polars DataFrame.cast() Method

The cast() method in Polars is used to convert the data type of one or more columns in a DataFrame to a specified data type. This method is helpful for ensuring that the data types are appropriate for the analysis or operations you want to perform.

To run some examples of the Polars DataFrame.cast() method, let’s create a Polars DataFrame.


import polars as pl

technologies = {
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [20000, 25000, 22000, 30000],
    "Duration": ["30days", "40days", "35days", "50days"],
    "Discount": [1000, 2300, 1200, 2000],
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

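This creates a DataFrame with four rows and four columns. Polars infers Courses and Duration as strings (str) and Fee and Discount as 64-bit integers (i64).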

To cast a single column to a different data type in Polars, you can pass a dictionary with the column name and the target data type to the cast() method.


# Casting the 'Fee' column to Float32
df2 = df.cast({"Fee": pl.Float32})
print(df2)

Here,

  • The Fee column, originally of type Int64, is cast to Float32.
  • The result is a DataFrame where the Fee column contains floating-point numbers.

Convert Multiple Columns to Specific Types

To convert multiple columns to specific types in Polars, you can use the cast() method with a mapping that specifies the target data type for each column.


# Casting 'Fee' to Float32 and 'Duration' to Utf8 (string)
df2 = df.cast({"Fee": pl.Float32, "Duration": pl.Utf8})
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fee     ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f32     ┆ str      ┆ i64      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000.0 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000.0 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000.0 ┆ 50days   ┆ 2000     │
└─────────┴─────────┴──────────┴──────────┘

Here,

  • The "Fee" column is converted to Float32, a 32-bit floating-point type. The "Duration" column is cast to Utf8 (string), which leaves it unchanged because it was already a string.
  • Columns not specified in the mapping (e.g., “Courses” and “Discount”) retain their original data types.
  • You can cast as many columns as needed by adding more key-value pairs to the mapping.

The target types in the mapping do not have to be related. The next example converts one column to a floating-point type and narrows another to a smaller integer type in the same call.


# Casting 'Fee' to Float32 and 'Discount' to Int32
df2 = df.cast({"Fee": pl.Float32, "Discount": pl.Int32})
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fee     ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f32     ┆ str      ┆ i32      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 20000.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000.0 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000.0 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000.0 ┆ 50days   ┆ 2000     │
└─────────┴─────────┴──────────┴──────────┘

Here,

  • A dictionary is provided to the cast() method, with the keys representing column names and the values indicating the target data types. For instance, "Fee": pl.Float32 converts the Fee column to Float32.
  • Columns not mentioned in the mapping retain their original data types.
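  • Fee is converted to Float32, so its values are stored as 32-bit floating-point numbers. Discount is converted to Int32, which uses half the bytes per value of Int64; this is safe here because every value fits in the 32-bit range.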

Convert All Columns to Strings

To convert all columns in a Polars DataFrame to strings, you can build one cast expression per column and pass the list to select().


# Casting all columns to strings (Utf8)
df2 = df.select([pl.col(col).cast(pl.Utf8) for col in df.columns])
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ str      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 20000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 40days   ┆ 2300     │
│ Python  ┆ 22000 ┆ 35days   ┆ 1200     │
│ pandas  ┆ 30000 ┆ 50days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • The select() method is used to apply a transformation to all columns.
  • pl.col(col).cast(pl.Utf8) converts each column to Utf8 (string).
  • The result is a DataFrame where all columns are of type str.

Convert a Numeric Column to Boolean

To convert a numeric column to a Boolean in Polars, you can use the cast() method. When casting, non-zero values are converted to True and zero values are converted to False.


import polars as pl

# Sample DataFrame with a numeric column
df = pl.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [20000, 0, 22000, 0],  # Numeric column
    "Discount": [1000, 2300, 1200, 2000]
})

# Convert 'Fee' column to Boolean
df2 = df.with_columns(pl.col("Fee").cast(pl.Boolean))
print(df2)

# Output:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fee   ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ bool  ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ true  ┆ 1000     │
│ PySpark ┆ false ┆ 2300     │
│ Python  ┆ true  ┆ 1200     │
│ pandas  ┆ false ┆ 2000     │
└─────────┴───────┴──────────┘

Here,

  • The cast(pl.Boolean) converts the Fee column to a Boolean type.
  • Non-zero values (e.g., 20000, 22000) are converted to True, while zero values are converted to False.

Casting Date Column

A Date column can be cast to Datetime, a type that stores both the date and the time of day. You can do this either with DataFrame.cast() or with an expression-level cast(), as shown below.


import polars as pl
from datetime import date

# Sample DataFrame with a Date column
df = pl.DataFrame({
    "Courses": ["Spark", "PySpark", "pandas"],
    "Fee": [20000, 25000, 30000],
    "Date": [date(2023, 1, 2), date(2024, 3, 4), date(2025, 5, 6)],  # Date column
})

# Option 1: cast every column of type Date to Datetime (a data type as the mapping key)
df2 = df.cast({pl.Date: pl.Datetime})
print(df2)

# Option 2: cast only the 'Date' column by name, using an expression
df2 = df.with_columns(pl.col("Date").cast(pl.Datetime))
print(df2)

# Output (the same for both options):
# shape: (3, 3)
┌─────────┬───────┬─────────────────────┐
│ Courses ┆ Fee   ┆ Date                │
│ ---     ┆ ---   ┆ ---                 │
│ str     ┆ i64   ┆ datetime[μs]        │
╞═════════╪═══════╪═════════════════════╡
│ Spark   ┆ 20000 ┆ 2023-01-02 00:00:00 │
│ PySpark ┆ 25000 ┆ 2024-03-04 00:00:00 │
│ pandas  ┆ 30000 ┆ 2025-05-06 00:00:00 │
└─────────┴───────┴─────────────────────┘

Here,

  • The Date column is cast to Datetime, which includes both date and time information. The time part is set to 00:00:00 by default since the original Date column had no time information.
  • The cast() method allows you to change the type of a column. In this case, pl.Datetime is used to convert the Date column to a Datetime type.

Similarly, you can cast several columns in one with_columns() call by passing one expression per column. pl.col() builds an expression that targets a column by name; Polars also offers dedicated selectors in the polars.selectors module for choosing columns by pattern or data type.


# Cast 'Fee' column to Float32 and 'Date' column to Datetime
# Using one pl.col() expression per column
df2 = df.with_columns([
    pl.col("Fee").cast(pl.Float32),
    pl.col("Date").cast(pl.Datetime),
])
print(df2)

# Output:
# shape: (3, 3)
┌─────────┬─────────┬─────────────────────┐
│ Courses ┆ Fee     ┆ Date                │
│ ---     ┆ ---     ┆ ---                 │
│ str     ┆ f32     ┆ datetime[μs]        │
╞═════════╪═════════╪═════════════════════╡
│ Spark   ┆ 20000.0 ┆ 2023-01-02 00:00:00 │
│ PySpark ┆ 25000.0 ┆ 2024-03-04 00:00:00 │
│ pandas  ┆ 30000.0 ┆ 2025-05-06 00:00:00 │
└─────────┴─────────┴─────────────────────┘

Conclusion

In this article, I have explained the Polars DataFrame cast() method, including its syntax, parameters, and usage, and shown how it returns a new DataFrame with the specified columns converted to the target data types. It does not modify the original DataFrame; the result is a new DataFrame with the conversions applied.

Happy Learning!!
