In Polars, the fill_nan() function replaces NaN (Not a Number) values in a DataFrame with a specified value. It applies only to floating-point columns (f32 or f64), because NaN is a floating-point value. Note that Polars treats NaN and null differently: NaN is a valid float that typically results from an undefined computation such as 0.0 / 0.0, while null marks missing data and is handled by fill_null().
In this article, I will explain the Polars DataFrame fill_nan() function, including its syntax, parameters, and usage, and show how to create a new DataFrame with NaN values replaced by a specified value.
Key Points –
- The fill_nan() method replaces NaN (Not a Number) values in a DataFrame with a specified value.
- It works only on columns with floating-point data types (f32 or f64), as NaN is specific to floating-point numbers.
- You can replace NaN with any fixed value (e.g., 0, -1, 99.99).
- When called on a DataFrame, fill_nan() replaces NaN values in all floating-point columns.
- Polars operations, including fill_nan(), return a new DataFrame and do not modify the original DataFrame in place.
- It can replace NaN values with a fixed value across the entire DataFrame or in specific columns.
- Different replacement values can be assigned to different columns using DataFrame.with_columns().
- fill_nan() has no strategy parameter. For a forward fill, convert NaN to null with fill_nan(None) and then call fill_null(strategy="forward"), which fills each gap with the last non-null value in the column. This is useful for time series data.
- For a backward fill, follow the same conversion with fill_null(strategy="backward"), which fills each gap with the next non-null value in the column.
Polars DataFrame fill_nan() Introduction
The syntax of the fill_nan() function is as follows.
# Syntax of fill_nan()
DataFrame.fill_nan(value: Expr | int | float | None) → DataFrame
Parameters of the Polars DataFrame fill_nan()
Following are the parameters of the fill_nan() method.
- value – The value used to replace NaN values. It can be:
  - A constant (int or float)
  - A Polars expression (Expr)
  - None, which converts NaN values to null
Return Value
This function returns a new DataFrame with NaN values replaced by the specified value.
Usage of Polars DataFrame fill_nan() Function
The fill_nan() function in Polars replaces all NaN (Not a Number) values in a DataFrame with a specified value. It is useful when floating-point columns contain undefined results, because NaN propagates through arithmetic and through aggregations such as sum() and mean().
To run some examples of the Polars DataFrame fill_nan() function, let’s create a Polars DataFrame.
import polars as pl
df = pl.DataFrame({
    "A": [1.5, 2, float("nan"), 4],
    "B": [0.6, float("nan"), 5, 12]
})
print("Original DataFrame:\n", df)
Yields below output.
If you want to replace NaN (Not a Number) values with a fixed value, you can use the fill_nan() method, which handles NaN values in floating-point columns.
# Replace NaN values with 0
fixed_value = 0
df2 = df.fill_nan(fixed_value)
print("DataFrame after replacing NaN:\n", df2)
# Equivalent: pass the literal directly
df2 = df.fill_nan(0)
print("DataFrame after replacing NaN:\n", df2)
Here,
- The fill_nan(fixed_value) method replaces all NaN values in the DataFrame with the specified fixed_value (in this case, 0).
- The resulting DataFrame df2 has no NaN values; they have been replaced by 0.
Replace NaN with Forward Fill (Previous Value)
You can replace NaN values using forward fill, which propagates the last valid observation forward. This is particularly useful for time series or sequential data where you want to carry the previous value forward.
To achieve this, use the fill_null() method with the forward fill strategy (strategy="forward"). Forward fill works on null values, not on NaN, so first convert the NaN values to null with fill_nan(None).
# Replace NaN with null (kept in a new variable so df still holds NaN)
df_null = df.fill_nan(None)
# Apply forward fill
df2 = df_null.fill_null(strategy="forward")
print("DataFrame after Forward Fill:\n", df2)
# Output:
# DataFrame after Forward Fill:
# shape: (4, 2)
┌─────┬──────┐
│ A ┆ B │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪══════╡
│ 1.5 ┆ 0.6 │
│ 2.0 ┆ 0.6 │
│ 2.0 ┆ 5.0 │
│ 4.0 ┆ 12.0 │
└─────┴──────┘
Here,
- The fill_nan(None) method converts all NaN values to null.
- The fill_null(strategy="forward") method propagates the last valid value forward to fill null values.
- For example:
  - In column "A", the NaN value is replaced with the previous valid value 2.0.
  - In column "B", the NaN value is replaced with the previous valid value 0.6.
Replace NaN with Backward Fill (Next Value)
You can replace NaN values using backward fill, which propagates the next valid observation backward. This is useful when you want to handle gaps by carrying the next available value backward.
# Replace NaN with null (kept in a new variable so df still holds NaN)
df_null = df.fill_nan(None)
# Apply backward fill
df2 = df_null.fill_null(strategy="backward")
print("DataFrame after backward fill:\n", df2)
# Output:
# DataFrame after backward fill:
# shape: (4, 2)
┌─────┬──────┐
│ A ┆ B │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪══════╡
│ 1.5 ┆ 0.6 │
│ 2.0 ┆ 5.0 │
│ 4.0 ┆ 5.0 │
│ 4.0 ┆ 12.0 │
└─────┴──────┘
Here,
- The fill_nan(None) method converts all NaN values to null.
- The fill_null(strategy="backward") method propagates the next valid value backward to fill null values.
- For example:
  - In column "A", the NaN value is replaced with the next valid value 4.0.
  - In column "B", the NaN value is replaced with the next valid value 5.0.
Replace NaN with a Custom Value
To replace NaN values with a custom value in a Polars DataFrame, you can use the fill_nan() function. It lets you specify the value that will replace all NaN values in the DataFrame.
# Custom value to replace NaN
custom_value = 99.99
# Replace NaN with the custom value
df2 = df.fill_nan(custom_value)
print("DataFrame after replacing NaN:\n", df2)
# Output:
# DataFrame after replacing NaN:
# shape: (4, 2)
┌───────┬───────┐
│ A ┆ B │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═══════╪═══════╡
│ 1.5 ┆ 0.6 │
│ 2.0 ┆ 99.99 │
│ 99.99 ┆ 5.0 │
│ 4.0 ┆ 12.0 │
└───────┴───────┘
Here,
- The fill_nan(custom_value) method replaces all NaN values in the DataFrame with the specified custom_value (in this case, 99.99).
- The resulting DataFrame df2 has no NaN values; they have been replaced by 99.99.
Replace NaN with Different Values for Each Column
You can replace NaN values with different values for each column by using the with_columns() method along with the fill_nan() function. This allows you to specify a custom replacement value for each column individually.
# Replace NaN with different values per column
df2 = df.with_columns([
    pl.col("A").fill_nan(10),
    pl.col("B").fill_nan(20)
])
print("DataFrame after replacing NaN with different values:\n", df2)
# Output:
# DataFrame after replacing NaN with different values:
# shape: (4, 2)
┌──────┬──────┐
│ A ┆ B │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════╪══════╡
│ 1.5 ┆ 0.6 │
│ 2.0 ┆ 20.0 │
│ 10.0 ┆ 5.0 │
│ 4.0 ┆ 12.0 │
└──────┴──────┘
Here,
- fill_nan() allows column-wise replacement of NaN values.
- You can set different values for each column using df.with_columns().
- Column "A": NaN values are replaced with 10.
- Column "B": NaN values are replaced with 20.
Conclusion
In summary, the fill_nan() function in Polars is an essential tool for handling NaN values efficiently. Used on its own for fixed replacements, or together with fill_null() for forward and backward fills, it simplifies data cleaning when working with floating-point data.
Happy Learning!!