• Post author:
  • Post category:Polars
  • Post last modified:May 9, 2025
  • Reading time:13 mins read
You are currently viewing How to use isin in Polars DataFrame?

In Polars, the isin() function is used to filter rows by checking if the values in a column are present in a specified list, Series, or another column, similar to the SQL IN clause or Pandas’ isin(). It returns a Boolean mask that indicates whether each value in the column matches any of the values in the given list or Series. In this article, I will explain how to use isin in Polars DataFrame.

Advertisements

Key Points –

  • isin() is used to check if values in a column exist in a provided list, Series, or another column.
  • It returns a Boolean mask indicating whether each element in the column matches any value from the list or Series.
  • isin() works with various data types, including strings, numbers, and Booleans.
  • You can combine isin() with the filter() function to select rows based on matching values.
  • The isin() method can be used with numeric columns, just like with string columns, to filter rows by numbers.
  • Logical operators like &, |, and ~ can be used to combine isin() with other conditions for more complex filtering.
  • isin() can be used inside with_columns() to create new columns based on membership conditions.
  • To exclude values, you can negate isin() using the ~ (tilde) operator.
  • isin() can be used on Boolean columns to filter or mark rows based on True/False values.

Usage of isin in Polars DataFrame

The isin() method in Polars is used to check whether the values in a column exist in a specified list or Series. It returns a Boolean mask (True/False) that you can use to filter or select rows where the values in the specified column are in the given set.

To run some examples of how to use isin in Polars DataFrame, let’s create a Polars DataFrame.


import polars as pl

# Creating a new Polars DataFrame
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fees' :[22000,25000,20000,24000,26000],
    'Duration':['30days','50days','40days','50days','40days'],
    'Discount':[1000,2300,1500,1200,2500]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

Yields below output.

To filter rows with matching values in a Polars DataFrame, use the filter() function along with the isin() method. The isin() method checks whether values in a column are present in a given list or Series and returns a Boolean mask, which can then be used to select the matching rows.


# Filter: only keep rows where 'Courses' is 'Spark' or 'Python'
df2 = df.filter(pl.col("Courses").is_in(["Spark", "Python"]))
print("Filtered DataFrame:\n", df2)

Here,

  • pl.col("Courses"): selects the "Courses" column.
  • is_in(["Spark", "Python"]): checks if each value in that column is in the list ["Spark", "Python"].
  • df.filter(...): keeps only the rows where the condition is True.

Use isin() with Numeric Column

You can also apply the isin() method to numeric columns in Polars to filter rows based on whether the values in the column match any value in a given list of numbers. The process is the same as with string columns; simply pass a list of values and filter for rows where the numeric values are present in the list.


# Filter rows where 'Fees' is either 25000 or 24000
df2 = df.filter(pl.col("Fees").is_in([25000, 24000]))
print("Filtered DataFrame:\n", df2)

# Output:
# Filtered DataFrame:
# shape: (2, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Python  ┆ 24000 ┆ 50days   ┆ 1200     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Fees").is_in([25000, 24000]): This checks if each value in the Fees column is either 25000 or 24000.
  • The filter() method keeps only the rows where the condition is True.

Combine with Multiple Conditions

You can combine multiple conditions in Polars using logical operators: & for AND, | for OR, and ~ for NOT. Each condition is written as an expression, such as pl.col("column") == value, pl.col("column").is_in([...]), or using comparison operators like >, <, and so on.


# Combine conditions with AND (&)
df2 = df.filter(
    (pl.col("Fees").is_in([25000, 24000])) & (pl.col("Discount") > 2000)
)
print("Filtered DataFrame:\n", df2)

# Output:
# Filtered DataFrame:
# shape: (1, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • (pl.col("Fees").is_in([25000, 24000])): filters rows where Fees match 25000 or 24000.
  • (pl.col("Discount") > 2000): filters rows where Discount is above 2000.
  • &: ensures both conditions are True.

Using isin() inside a with_columns() Expression

You can use isin() within a with_columns() expression to create a new column that indicates whether each value in a specific column exists in a given list. This allows you to add a column based on membership conditions.


# Add a new column using isin() inside with_columns()
df2 = df.with_columns(
    (pl.col("Courses").is_in(["Python", "Pandas"])).alias("Is_Popular")
)
print("DataFrame with 'Is_Popular':\n", df2)

# Output:
# DataFrame with 'Is_Popular':
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Is_Popular │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---        │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ bool       │
╞═════════╪═══════╪══════════╪══════════╪════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ false      │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ false      │
│ Hadoop  ┆ 20000 ┆ 40days   ┆ 1500     ┆ false      │
│ Python  ┆ 24000 ┆ 50days   ┆ 1200     ┆ true       │
│ Pandas  ┆ 26000 ┆ 40days   ┆ 2500     ┆ true       │
└─────────┴───────┴──────────┴──────────┴────────────┘

Here,

  • pl.col("Courses").is_in(["Python", "Pandas"]): Returns a boolean Series indicating which rows match.
  • alias("Is_Popular"): Names the new column.
  • with_columns(...): Adds it to the DataFrame.

Negate isin() to Exclude Values

To negate isin() in Polars and exclude specific values, you can use the ~ (tilde) operator. This will filter out the rows where the condition is not true.


# Exclude specific courses
excluded_courses = ["Python", "Pandas"]

df2 = df.filter(
    ~pl.col("Courses").is_in(excluded_courses)
)
print("Courses excluding Python and Pandas:\n", df2)

# Output:
# Courses excluding Python and Pandas:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 20000 ┆ 40days   ┆ 1500     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Courses").is_in([...]): Returns a boolean mask for matching rows.
  • ~: Inverts the mask, returning rows NOT in the list.

Using isin() on Boolean Columns

The isin() method also works with Boolean columns in Polars. It’s useful for filtering, marking, or manipulating rows based on whether a True or False value matches any value in a given list. This is particularly helpful when working with columns that contain Boolean values and you want to filter or tag specific conditions.


import polars as pl

# Sample DataFrame with a boolean column
data = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fees' :[22000,25000,20000,26000],
    "Is_Active": [True, False, True, False]
}

df = pl.DataFrame(data)

# Filter where Is_Active is in [True]
df2 = df.filter(pl.col("Is_Active").is_in([True]))
print("Filtered where Is_Active is True:\n", df2)

# Output:
# Filtered where Is_Active is True:
# shape: (2, 3)
┌─────────┬───────┬───────────┐
│ Courses ┆ Fees  ┆ Is_Active │
│ ---     ┆ ---   ┆ ---       │
│ str     ┆ i64   ┆ bool      │
╞═════════╪═══════╪═══════════╡
│ Spark   ┆ 22000 ┆ true      │
│ Hadoop  ┆ 20000 ┆ true      │
└─────────┴───────┴───────────┘

Here,

  • pl.col("Is_Active").is_in([True]): Filters only rows where "Is_Active" is True.
  • You can also do is_in([False]) or is_in([True, False]) (which returns all rows).

Conclusion

In conclusion, the isin() method in Polars is a powerful tool for filtering, marking, and manipulating rows based on whether column values match any value in a specified list, Series, or another column. Whether you’re working with strings, numbers, or Boolean values, isin() enables efficient membership checks, which can be combined with logical operators or used within expressions like with_columns() for enhanced data manipulation.

Happy Learning!!

Reference