Polars DataFrame var() – Explained by Examples

  • Post category: Polars
  • Post last modified: April 14, 2025

In Polars, the var() function on a DataFrame is used to calculate the variance of numerical columns. Variance is a statistical measure of how spread out the values in a column are from the mean. A higher variance means more spread; a lower variance means the values are closer to the mean.


In this article, I will explain the Polars DataFrame var() method, covering its syntax, parameters, and practical usage. The method returns a new single-row Polars DataFrame containing the variance of each numeric column; non-numeric columns are not dropped but show null in the result.

Key Points –

  • The var() method computes the sample variance of each numeric column in a DataFrame.
  • The var() function produces variance values only for numeric columns (integers and floats) in the DataFrame.
  • By default, it calculates the sample variance using ddof=1 (degrees of freedom).
  • To compute population variance, set ddof=0.
  • Non-numeric columns (e.g., strings) are not used in the calculation; they stay in the result with a null value.
  • Null values are ignored in the calculation. Only non-null values are considered when calculating the variance.
  • Polars is optimized for speed, so var() can handle large datasets efficiently by leveraging parallel processing.
  • The result of var() is a DataFrame with a single row, where each column holds the variance of the corresponding column in the original DataFrame (null for non-numeric columns).

Polars DataFrame var() Introduction

Let’s look at the syntax of the DataFrame var() method.


# Syntax of var() method
DataFrame.var(ddof: int = 1) -> DataFrame

Parameters of the Polars DataFrame var()

It takes a single parameter, ddof.

  • ddof stands for Delta Degrees of Freedom.
  • It adjusts the divisor used in the variance calculation, which is n - ddof, where n is the number of non-null values:
    • ddof=1 (default): calculates the sample variance (divides by n - 1).
    • ddof=0: calculates the population variance (divides by n).
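In formula form, the standard definition behind both settings is shown below, with n the number of non-null values and \bar{x} their mean; ddof only changes the divisor. The worked numbers use column A of the sample DataFrame created later in this article.

```latex
\operatorname{Var}(x) = \frac{1}{n - \mathrm{ddof}} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Worked example: x = (15, 38, 13, 24), so n = 4 and \bar{x} = 90 / 4 = 22.5
\sum_{i=1}^{4} \left(x_i - \bar{x}\right)^2 = 56.25 + 240.25 + 90.25 + 2.25 = 389
\mathrm{ddof} = 1:\ \frac{389}{3} \approx 129.67 \qquad \mathrm{ddof} = 0:\ \frac{389}{4} = 97.25
```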

Return Value

This method returns a new Polars DataFrame with a single row containing the variance of each numeric column. Non-numeric columns keep their place in the result but hold null.

Usage of Polars DataFrame var() Method

The var() function in Polars calculates the variance of a column or expression. Variance indicates how much the values in a dataset deviate from the average (mean).

First, let’s create a Polars DataFrame.


import polars as pl

# Creating a sample DataFrame
data = {
    'A': [15, 38, 13, 24],
    'B': [32, 21, 49, 11],
    'C': [12, 22, 36, 18]
}

df = pl.DataFrame(data)
print("Original DataFrame:\n", df)

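This yields the output below.

# Output:
# Original DataFrame:
# shape: (4, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 15  ┆ 32  ┆ 12  │
│ 38  ┆ 21  ┆ 22  │
│ 13  ┆ 49  ┆ 36  │
│ 24  ┆ 11  ┆ 18  │
└─────┴─────┴─────┘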

Call var() on the DataFrame to compute the sample variance (the default, ddof=1) of each numeric column.


# Calculating the variance of numeric columns
result = df.var()
print("Variance of Numeric Columns:\n", result)

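# Output:
# Variance of Numeric Columns:
# shape: (1, 3)
┌────────────┬────────────┬───────┐
│ A          ┆ B          ┆ C     │
│ ---        ┆ ---        ┆ ---   │
│ f64        ┆ f64        ┆ f64   │
╞════════════╪════════════╪═══════╡
│ 129.666667 ┆ 264.916667 ┆ 104.0 │
└────────────┴────────────┴───────┘

Here,

  • The sample variance (ddof=1) is calculated for each of the numeric columns (A, B, and C).
  • The result is a new DataFrame with a single row holding the variance of each column.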

Using ddof=0 (Population Variance)

To calculate the population variance, set the ddof parameter of var() to 0. This treats the data as the entire population rather than a sample, whereas the default ddof=1 computes the sample variance.


# Calculating the population variance (ddof=0)
result = df.var(ddof=0)
print("Population Variance of Numeric Columns:\n", result)

# Output:
# Population Variance of Numeric Columns:
# shape: (1, 3)
┌───────┬──────────┬──────┐
│ A     ┆ B        ┆ C    │
│ ---   ┆ ---      ┆ ---  │
│ f64   ┆ f64      ┆ f64  │
╞═══════╪══════════╪══════╡
│ 97.25 ┆ 198.6875 ┆ 78.0 │
└───────┴──────────┴──────┘

Here,

  • The population variance is calculated by setting ddof=0. In this case, the formula divides by n (the total number of elements) instead of n-1 (used in sample variance).
  • The result gives you the variance of each numeric column (A, B, C) treating the data as the entire population.

Variance on Floating-Point Numbers

Polars handles floating-point columns with var() in the same way as integer columns: the variance is computed from the float values as-is, which is useful for real-world data such as measurements, scores, and prices.


import polars as pl

# DataFrame with floating-point numbers
data = {
    'X': [2.5, 3.7, 1.8, 4.1],
    'Y': [7.2, 6.8, 8.9, 5.3],
}

df = pl.DataFrame(data)

# Compute sample variance (default ddof=1)
result = df.var()
print("Sample Variance (Floating-Point Data):\n", result)

# Output:
# Sample Variance (Floating-Point Data):
# shape: (1, 2)
┌──────────┬──────┐
│ X        ┆ Y    │
│ ---      ┆ ---  │
│ f64      ┆ f64  │
╞══════════╪══════╡
│ 1.129167 ┆ 2.19 │
└──────────┴──────┘

Here,

  • The result is returned as f64 (64-bit float).
  • X and Y are both float columns, and the variance is computed with the sample formula (dividing by n - 1).

Including a Non-Numeric Column

When you apply var() to a DataFrame that includes non-numeric columns, Polars computes variance only for the numeric columns. The non-numeric columns are not dropped; they appear in the result with a null value, as the output below shows.


import polars as pl

# Creating a sample DataFrame with a non-numeric column
data = {
    'A': [15, 38, 13, 24],
    'B': [32, 21, 49, 11],
    'C': [12, 22, 36, 18],
    'Category': ['X', 'Y', 'Z', 'W']  # Non-numeric column
}

df = pl.DataFrame(data)

# Calculating the variance of numeric columns
result = df.var()
print("Variance of Numeric Columns:\n", result)

# Output:
# Variance of Numeric Columns:
# shape: (1, 4)
┌────────────┬────────────┬───────┬──────────┐
│ A          ┆ B          ┆ C     ┆ Category │
│ ---        ┆ ---        ┆ ---   ┆ ---      │
│ f64        ┆ f64        ┆ f64   ┆ str      │
╞════════════╪════════════╪═══════╪══════════╡
│ 129.666667 ┆ 264.916667 ┆ 104.0 ┆ null     │
└────────────┴────────────┴───────┴──────────┘

Here,

  • The non-numeric column Category (with values like 'X', 'Y', etc.) is not used in the variance calculation; it still appears in the result with its str dtype and a null value.
  • Only the numeric columns A, B, and C receive actual variance values.

Variance After Filtering Rows

You can filter rows based on conditions and then compute the variance on the filtered subset of the DataFrame using filter() followed by var().


import polars as pl

# Sample DataFrame
df = pl.DataFrame({
    'City': ["Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"],
    'Temperature': [28.5, 30.2, 33.1, 29.8, 31.0],
    'Humidity': [65.0, 70.2, 75.1, 69.8, 68.0]
})

# Filter for only rows where City is "Delhi"
filtered_df = df.filter(pl.col("City") == "Delhi")

# Compute variance on filtered rows
result = filtered_df.select(pl.exclude("City").var())
print("Variance for 'Delhi':\n", result)

# Output:
# Variance for 'Delhi':
# shape: (1, 2)
┌─────────────┬──────────┐
│ Temperature ┆ Humidity │
│ ---         ┆ ---      │
│ f64         ┆ f64      │
╞═════════════╪══════════╡
│ 1.63        ┆ 6.813333 │
└─────────────┴──────────┘

Here,

  • We filter the DataFrame to only include rows where City == "Delhi".
  • We exclude the non-numeric column City with pl.exclude("City"), so City does not appear in the result.
  • var() is then applied only to the remaining numeric columns, Temperature and Humidity.

Variance with Null Values

When a column contains null (missing) values, Polars automatically skips them while computing the variance; only the valid numeric values are used.


import polars as pl

# DataFrame with some null (None) values
data = {
    'A': [10, 20, None, 30],
    'B': [5, None, 15, 25],
    'C': [None, None, None, None]  # All nulls
}

df = pl.DataFrame(data)

# Compute variance
result = df.var()
print("Variance with Null Values:\n", result)

# Output:
# Variance with Null Values:
# shape: (1, 3)
┌───────┬───────┬──────┐
│ A     ┆ B     ┆ C    │
│ ---   ┆ ---   ┆ ---  │
│ f64   ┆ f64   ┆ null │
╞═══════╪═══════╪══════╡
│ 100.0 ┆ 100.0 ┆ null │
└───────┴───────┴──────┘

Here,

  • Columns A and B each contain one null, so Polars computes their variance from the three non-null values only (for example, 10, 20, and 30 for A).
  • Column C is entirely null, so Polars returns null for its variance because there is no valid data to compute from.

Conclusion

In conclusion, the var() method in Polars is a powerful tool for computing the variance of numeric columns in a DataFrame. By default, it calculates the sample variance, but you can set the delta degrees of freedom (ddof) to 0 to calculate the population variance instead.

Happy Learning!!
