In Polars, you can use the cast() function to convert an integer column to a string (Utf8). This is helpful when you need numeric data in string form for tasks such as text manipulation, concatenation, or export. To cast an integer column to a string, apply cast() inside either the with_columns() method or the select() method. In this article, I will explain how to convert an integer to a string (Utf8) using the cast() function in Polars.
Key Points –
- The cast(pl.Utf8) method converts an integer column to a string (Utf8) type in Polars.
- You can cast a single column or multiple columns simultaneously using with_columns() or select().
- The alias("New_Column_Name") method renames the column after casting.
- Using select([pl.col("column_name").cast(pl.Utf8)]) creates a new DataFrame containing only the selected columns.
- The with_columns() method adds the transformed columns to the DataFrame, replacing any existing column that has the same name.
- Negative integer values are converted to their string representations without issues (e.g., -1000 becomes "-1000").
- Casting can be used inside expressions for operations like concatenation, filtering, or formatting.
- Polars follows an immutable data paradigm, so casting returns a new DataFrame rather than modifying the existing one.
Usage of Polars Cast Int to String
The cast() function in Polars converts a column from one data type to another. When converting an integer column to a string, you pass Utf8 as the target type, which is Polars' string data type. Recent Polars releases also expose this type as pl.String and keep pl.Utf8 as an alias, which is why the outputs below show the column type as str.
First, let’s create a Polars DataFrame.
import polars as pl
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fees': [22000, 25000, 24000, 26000],
    'Discount': [1000, 2300, 2500, 1400]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
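This yields the output below.
# Output:
# Original DataFrame:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘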
You can use the with_columns() method along with the pl.col().cast() expression to convert a single column from an integer to a string in Polars.
# Casting the 'Fees' column to a string
df_casted = df.with_columns(
    pl.col("Fees").cast(pl.Utf8).alias("Fees"))
print("\nDataFrame after casting 'Fees' to string:\n", df_casted)
# Output:
# DataFrame after casting 'Fees' to string:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ str   ┆ i64      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘
Here,
- pl.col("Fees") – Selects the column "Fees".
- .cast(pl.Utf8) – Casts the column to a UTF-8 string data type.
- .alias("Fees") – Keeps the column name as "Fees". This call is optional here, because cast() already preserves the column name.
- with_columns() – Replaces the original column with the newly cast column, since both share the same name.
Cast Multiple Columns from Int to String
To cast multiple columns from integers to strings in Polars, you can use the with_columns() method along with pl.col().cast(). The process involves selecting multiple columns and applying the casting operation to all of them.
# Casting 'Fees' and 'Discount' columns to strings
df_casted = df.with_columns(
    [pl.col(col).cast(pl.Utf8).alias(col) for col in ["Fees", "Discount"]])
print("\nDataFrame after casting 'Fees' and 'Discount' to strings:\n", df_casted)
# Output:
# DataFrame after casting 'Fees' and 'Discount' to strings:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ str   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘
Here,
- Selecting Multiple Columns – The list comprehension [pl.col(col).cast(pl.Utf8).alias(col) for col in ["Fees", "Discount"]] generates one cast expression for each specified column.
- pl.col(col).cast(pl.Utf8) – Casts each column to a UTF-8 string data type.
- .alias(col) – Ensures that the column name remains the same after casting.
- with_columns() – Applies the transformations to the DataFrame.
Cast int64 to String
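To cast a column of type int64 to a string in Polars, use the cast() function and specify Utf8 as the target data type. Polars infers Int64 for Python integers by default, so the Fees and Discount columns of the sample DataFrame are already int64.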
# Cast 'Fees' and 'Discount' columns from int64 to string
df_casted = df.with_columns([
    pl.col("Fees").cast(pl.Utf8).alias("Fees"),
    pl.col("Discount").cast(pl.Utf8).alias("Discount")])
print("DataFrame after casting int64 to string:\n", df_casted)
# Output:
# DataFrame after casting int64 to string:
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Discount │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ str   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     │
│ PySpark ┆ 25000 ┆ 2300     │
│ Hadoop  ┆ 24000 ┆ 2500     │
│ Pandas  ┆ 26000 ┆ 1400     │
└─────────┴───────┴──────────┘
Here,
- pl.col("Fees") – Selects the Fees column.
- .cast(pl.Utf8) – Casts the Fees column from int64 to string (Utf8).
- .alias("Fees") – Retains the name of the column as Fees after casting.
- with_columns() – Applies the cast transformation to both Fees and Discount columns at the same time.
Cast Int Column and Rename
To cast an integer column to a string and rename it in Polars, you can use the cast() method and the alias() method together inside the with_columns() function. Here's how you can cast an integer column (like Fees) to a string and rename it (for example, to Fees_Str).
# Cast 'Fees' column from int to string and rename it to 'Fees_Str'
df_casted = df.with_columns(
    pl.col("Fees").cast(pl.Utf8).alias("Fees_Str"))
print("DataFrame after casting 'Fees' to string and renaming it to 'Fees_Str':\n", df_casted)
# Output:
# DataFrame after casting 'Fees' to string and renaming it to 'Fees_Str':
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Discount ┆ Fees_Str │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 22000    │
│ PySpark ┆ 25000 ┆ 2300     ┆ 25000    │
│ Hadoop  ┆ 24000 ┆ 2500     ┆ 24000    │
│ Pandas  ┆ 26000 ┆ 1400     ┆ 26000    │
└─────────┴───────┴──────────┴──────────┘
Here,
- pl.col("Fees") – Selects the Fees column.
- .cast(pl.Utf8) – Casts the Fees column from integer (int64) to string (Utf8).
- .alias("Fees_Str") – Names the cast column Fees_Str.
- with_columns() – Adds Fees_Str as a new column. Because the name differs, the original Fees column is kept as int64.
Cast Int to String Using select() and alias()
The select() method, combined with alias(), also lets you cast an integer column to a string in Polars. Use this approach when you want the result to contain only the columns you list. Like with_columns(), it returns a new DataFrame and leaves the original unchanged.
# Cast 'Fees' column from int to string
# Using select() and alias()
df_casted = df.select([
    pl.col("Courses"),
    pl.col("Fees").cast(pl.Utf8).alias("Fees_Str"),
    pl.col("Discount")])
print("DataFrame after casting 'Fees' to string using select() and alias():\n", df_casted)
# Output:
# DataFrame after casting 'Fees' to string using select() and alias():
# shape: (4, 3)
┌─────────┬──────────┬──────────┐
│ Courses ┆ Fees_Str ┆ Discount │
│ ---     ┆ ---      ┆ ---      │
│ str     ┆ str      ┆ i64      │
╞═════════╪══════════╪══════════╡
│ Spark   ┆ 22000    ┆ 1000     │
│ PySpark ┆ 25000    ┆ 2300     │
│ Hadoop  ┆ 24000    ┆ 2500     │
│ Pandas  ┆ 26000    ┆ 1400     │
└─────────┴──────────┴──────────┘
Here,
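- select() – Returns only the specified columns.
- pl.col("Fees").cast(pl.Utf8).alias("Fees_Str") – Converts Fees from int64 to Utf8 (string) and names the result "Fees_Str".
- Preserving Other Columns – The "Courses" and "Discount" columns remain unaffected because they are listed in select() without any transformation. A column that is not listed is dropped from the result.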
Cast Negative Int Values to String
To cast negative integer values to strings in Polars, you can use the cast(pl.Utf8) function. Below is an example where we modify the dataset to include negative values and then cast them to strings.
import polars as pl
# Sample data with negative integer values
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fees': [-22000, -25000, -24000, -26000],  # Negative values
    'Discount': [-1000, -2300, -2500, -1400]   # Negative values
}
df = pl.DataFrame(technologies)
# Cast 'Fees' and 'Discount' columns from int to string
df_casted = df.with_columns([
    pl.col("Fees").cast(pl.Utf8).alias("Fees"),
    pl.col("Discount").cast(pl.Utf8).alias("Discount")])
print("DataFrame after casting negative int64 values to string:\n", df_casted)
# Output:
# DataFrame after casting negative int64 values to string:
# shape: (4, 3)
┌─────────┬────────┬──────────┐
│ Courses ┆ Fees   ┆ Discount │
│ ---     ┆ ---    ┆ ---      │
│ str     ┆ str    ┆ str      │
╞═════════╪════════╪══════════╡
│ Spark   ┆ -22000 ┆ -1000    │
│ PySpark ┆ -25000 ┆ -2300    │
│ Hadoop  ┆ -24000 ┆ -2500    │
│ Pandas  ┆ -26000 ┆ -1400    │
└─────────┴────────┴──────────┘
Here,
- Negative Values – The Fees and Discount columns contain negative integer values.
- pl.col("Fees").cast(pl.Utf8).alias("Fees") – Converts the Fees column from int64 to string (Utf8).
- pl.col("Discount").cast(pl.Utf8).alias("Discount") – Converts the Discount column from int64 to string (Utf8).
- with_columns() – Applies these transformations and returns the modified DataFrame.
Conclusion
In conclusion, casting an integer column to a string with the cast() function in Polars is a simple and efficient way to prepare your data for operations like text manipulation, concatenation, or export. You can perform this conversion using the with_columns() or select() methods, depending on whether you want to keep all columns of the DataFrame or return only specific ones.
Happy Learning!!