  • Post category:Polars
  • Post last modified:March 19, 2025
Polars DataFrame fill_nan() Usage & Examples

In Polars, the fill_nan() function replaces NaN (Not a Number) values in a DataFrame with a specified value. It applies to floating-point columns (f32 or f64), since NaN is a floating-point value. Note that Polars treats NaN and null differently: NaN is a valid float that typically results from an undefined computation such as 0.0 / 0.0, while null marks missing data and is handled by fill_null().


In this article, I will explain the Polars DataFrame fill_nan() function, including its syntax, parameters, and usage, to demonstrate how we can create a new DataFrame with NaN values replaced by a specified value.

Key Points –

  • The fill_nan() method is used to replace NaN (Not a Number) values in a DataFrame with a specified value.
  • It works only on columns with floating-point data types (f32 or f64), as NaN is specific to floating-point numbers.
  • You can replace NaN with any fixed value (e.g., 0, -1, 99.99).
  • DataFrame.fill_nan() replaces NaN values in every floating-point column; columns of other types are left unchanged.
  • Polars operations, including fill_nan(), return a new DataFrame and do not modify the original DataFrame in place.
  • It can replace all NaN values with a fixed value across the entire DataFrame or specific columns.
  • Different replacement values can be assigned to different columns by combining df.with_columns() with pl.col(...).fill_nan().
  • fill_nan() has no strategy parameter. To forward fill, first convert NaN to null with fill_nan(None), then call fill_null(strategy="forward"), which carries the last non-null value forward; this is useful for time series data.
  • Backward fill works the same way with fill_null(strategy="backward"), which fills each null with the next non-null value in the column.

Polars DataFrame fill_nan() Introduction

Let’s look at the syntax of the fill_nan() function.


# Syntax of fill_nan()
DataFrame.fill_nan(value: Expr | int | float | None) → DataFrame

Parameters of the Polars DataFrame fill_nan()

Following are the parameters of the fill_nan() method.

  • value – The value to replace NaN values with. It can be:
    • A constant (int or float)
    • A Polars expression (Expr)
    • None, which converts NaN values to null so they can be handled with fill_null()

Return Value

This function returns a new DataFrame with NaN values replaced by the specified value.

Usage of Polars DataFrame fill_nan() Function

The fill_nan() function in Polars replaces all NaN (Not a Number) values in a DataFrame with a specified value. This keeps NaN from propagating through later computations, since most arithmetic and aggregations involving NaN return NaN.

To run some examples of the Polars DataFrame fill_nan() function, let’s create a Polars DataFrame.


import polars as pl

# Use float literals throughout so both columns are inferred as f64
df = pl.DataFrame({
    "A": [1.5, 2.0, float("nan"), 4.0],
    "B": [0.6, float("nan"), 5.0, 12.0]
})

print("Original DataFrame:\n", df)

This prints a DataFrame with two f64 columns, A and B, each containing one NaN: in the third row of A and in the second row of B.

If you want to replace NaN (Not a Number) values with a fixed value, you can use the fill_nan() method. This method is specifically designed to handle NaN values in floating-point columns.


# Replace NaN values with 0
fixed_value = 0
df2 = df.fill_nan(fixed_value)
print("DataFrame after replacing NaN:\n", df2)

# Equivalent: pass the value directly
df2 = df.fill_nan(0)
print("DataFrame after replacing NaN:\n", df2)

Here,

  • The fill_nan(fixed_value) method replaces all NaN values in the DataFrame with the specified fixed_value (in this case, 0).
  • The resulting DataFrame df2 has no NaN values; they have been replaced by 0.

Replace NaN with Forward Fill (Previous Value)

You can replace NaN values with the forward fill method, which propagates the last valid observation forward to fill NaN values. This is particularly useful for time series or sequential data where you want to carry the previous value forward to handle missing data.

To achieve this, convert NaN values to null with fill_nan(None), then call fill_null() with strategy="forward". The conversion is needed because the fill strategies operate on null values, not NaN.


# Replace NaN with null (df itself stays unchanged for the later examples)
df_null = df.fill_nan(None)

# Apply forward fill
df2 = df_null.fill_null(strategy="forward")
print("DataFrame after Forward Fill:\n", df2)

# Output:
# DataFrame after Forward Fill:
# shape: (4, 2)
┌─────┬──────┐
│ A   ┆ B    │
│ --- ┆ ---  │
│ f64 ┆ f64  │
╞═════╪══════╡
│ 1.5 ┆ 0.6  │
│ 2.0 ┆ 0.6  │
│ 2.0 ┆ 5.0  │
│ 4.0 ┆ 12.0 │
└─────┴──────┘

Here,

  • The fill_nan(None) method converts all NaN values to null.
  • The fill_null(strategy="forward") method propagates the last valid value forward to fill null values.
  • For example:
    • In column "A", the NaN value is replaced with the previous valid value 2.0.
    • In column "B", the NaN value is replaced with the previous valid value 0.6.

Replace NaN with Backward Fill (Next Value)

You can replace NaN values with the backward fill method, which propagates the next valid observation backward to fill NaN values. This is useful when you want to handle missing data by carrying the next available value backward.


# Replace NaN with null (df itself stays unchanged for the later examples)
df_null = df.fill_nan(None)

# Apply backward fill
df2 = df_null.fill_null(strategy="backward")
print("DataFrame after backward fill:\n", df2)

# Output:
# DataFrame after backward fill:
# shape: (4, 2)
┌─────┬──────┐
│ A   ┆ B    │
│ --- ┆ ---  │
│ f64 ┆ f64  │
╞═════╪══════╡
│ 1.5 ┆ 0.6  │
│ 2.0 ┆ 5.0  │
│ 4.0 ┆ 5.0  │
│ 4.0 ┆ 12.0 │
└─────┴──────┘

Here,

  • The fill_nan(None) method converts all NaN values to null.
  • The fill_null(strategy="backward") method propagates the next valid value backward to fill null values.
  • For example:
    • In column "A", the NaN value is replaced with the next valid value 4.0.
    • In column "B", the NaN value is replaced with the next valid value 5.0.

Replace NaN with a Custom Value

To replace NaN values with a custom value in a Polars DataFrame, you can use the fill_nan() function. This function allows you to specify the value that will replace all NaN values in the DataFrame.


# Custom value to replace NaN
custom_value = 99.99

# Replace NaN with the custom value
df2 = df.fill_nan(custom_value)
print("DataFrame after replacing NaN:\n", df2)

# Output:
# DataFrame after replacing NaN:
# shape: (4, 2)
┌───────┬───────┐
│ A     ┆ B     │
│ ---   ┆ ---   │
│ f64   ┆ f64   │
╞═══════╪═══════╡
│ 1.5   ┆ 0.6   │
│ 2.0   ┆ 99.99 │
│ 99.99 ┆ 5.0   │
│ 4.0   ┆ 12.0  │
└───────┴───────┘

Here,

  • The fill_nan(custom_value) method replaces all NaN values in the DataFrame with the specified custom_value (in this case, 99.99).
  • The resulting DataFrame df2 has no NaN values; they have been replaced by 99.99.

Replace NaN with Different Values for Each Column

You can replace NaN values with different values for each column by using the with_columns() method along with the fill_nan() function. This allows you to specify a custom replacement value for each column individually.


# Replace NaN with different values per column
df2 = df.with_columns([
    pl.col("A").fill_nan(10),  
    pl.col("B").fill_nan(20)   
])
print("DataFrame after replacing NaN with different values:\n", df2)

# Output:
# DataFrame after replacing NaN with different values:
# shape: (4, 2)
┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.5  ┆ 0.6  │
│ 2.0  ┆ 20.0 │
│ 10.0 ┆ 5.0  │
│ 4.0  ┆ 12.0 │
└──────┴──────┘

Here,

  • Calling fill_nan() on pl.col(...) replaces NaN values in a single column.
  • You can set different values for each column by passing several such expressions to df.with_columns().
  • Column "A": NaN values are replaced with 10.
  • Column "B": NaN values are replaced with 20.

Conclusion

In summary, the fill_nan() function in Polars is an essential tool for handling NaN values efficiently. Whether you replace NaN with a fixed value, use a different value per column, or convert NaN to null and apply a fill strategy with fill_null(), it simplifies data cleaning when working with floating-point data.

Happy Learning!!
