  • Post category:Polars
  • Post last modified:February 17, 2025
How to Drop Rows in Polars

In Polars, you drop rows mainly with the filter() method, which keeps only the rows that satisfy a condition. To remove specific rows, you write a condition that excludes them. Polars has no row-wise drop() like pandas (its drop() method removes columns), so rows are removed by filtering instead. Alternatively, methods like drop_nulls() and unique() remove rows that contain missing values or duplicates. In this article, I will explain the different methods to drop rows in a Polars DataFrame.

Key Points –

  • Use filter() to remove rows based on specific conditions or criteria.
  • Use conditions like !=, >, <, etc., to specify which rows to drop.
  • Use drop_nulls() to eliminate rows containing missing (null) values in one or more columns.
  • Combine conditions with logical operators (&, |) to drop rows that meet multiple criteria.
  • Use the .unique() method to drop rows that are duplicates across all columns.
  • Use .is_in() to filter out rows based on a list of values.
  • You can drop rows if a column value meets a specific condition (e.g., greater than a threshold).
  • Polars operations like filter() do not modify the original DataFrame, so assign the result to a new variable or overwrite the existing one.

Create a Polars DataFrame

Creating a Polars DataFrame is simple and resembles the process of creating a DataFrame in Pandas. Polars offers a versatile API that allows you to build DataFrames from various data structures, including dictionaries, lists of lists, and NumPy arrays.

Let’s start by creating a basic DataFrame using Polars.


import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Pandas", "Hadoop", "Python", "Pandas", "Spark"],
    'Fees': [22000, 26000, 25000, 20000, 26000, 22000],
    'Duration': ['30days', '60days', '50days', '40days', '60days', '30days'],
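    'Discount': [1000, 2000, 1500, 1200, 2000, 1000]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This yields the output below.

# Output:
# shape: (6, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘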

Dropping Rows Based on Condition

In Polars, you can drop rows based on a condition using the filter() method. This method enables you to exclude rows that don’t satisfy the specified condition, effectively removing them from the DataFrame.

Drop Rows Where a Column Value is Equal to a Given Value

To drop rows where a column value is equal to a given value in Polars, you can use the filter() function and exclude the rows that match the specified condition.


# Drop rows where 'Courses' is "Pandas"
df2 = df.filter(pl.col("Courses") != "Pandas")
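print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Courses") != "Pandas" is true for every row whose Courses value is not “Pandas”.
  • The filter() method keeps only the rows where the condition is true, so the “Pandas” rows are dropped.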

Drop Rows Where a Numeric Column is Below a Threshold

To drop rows where a numeric column is below a threshold in Polars, you can use the filter() function and specify a condition that retains only the rows where the column value is greater than or equal to the threshold.


# Drop rows where 'Fees' is below 25000
df2 = df.filter(pl.col("Fees") >= 25000)
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Fees") >= 25000: This condition keeps rows where the value in the Fees column is greater than or equal to 25000.
  • The filter() function excludes rows that don’t meet this condition.
  • In this example, the rows where Fees was less than 25000 are dropped, and only the rows where the Fees are greater than or equal to 25000 are retained.

Drop Rows Based on Multiple Column Conditions

Dropping rows based on multiple column conditions in Polars involves using the filter() method along with logical operators such as & (AND), | (OR), and ~ (NOT).


# Drop rows where 'Courses' is "Pandas" or 'Fees' is greater than 22000
df2 = df.filter(~((pl.col("Courses") == "Pandas") | (pl.col("Fees") > 22000)))
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • pl.col("Courses") == "Pandas" checks if the Courses column has the value “Pandas”.
  • pl.col("Fees") > 22000 checks if the Fees column is greater than 22000.
  • | combines these conditions with a logical OR, meaning either condition being true is enough to exclude the row.
  • ~ negates the condition to drop the rows that meet it.

Drop Rows Where a Column Value Exists in a Specific List

To drop rows where a column value exists in a specific list in Polars, you can use the .is_in() method along with a boolean negation (~).


# Define the list of values to exclude
values_to_exclude = ["Pandas", "Python", "Hadoop"]

# Drop rows where 'Courses' column value exists in the list
df2 = df.filter(~df['Courses'].is_in(values_to_exclude))
print(df2)

# Output:
# shape: (2, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

If you need to filter rows based on multiple columns being in specific lists, you can combine conditions.


# Define lists for multiple columns
courses_to_exclude = ["Pandas", "Python"]
fees_to_exclude = [26000, 30000]

# Drop rows where 'Courses' or 'Fees' match the respective lists
df2 = df.filter(~df['Courses'].is_in(courses_to_exclude) & ~df['Fees'].is_in(fees_to_exclude))
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

Dropping Duplicate Rows

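To drop duplicate rows in Polars, you can use the unique() method. This method removes duplicate rows from the DataFrame based on all columns or a subset of columns, keeping one copy of each. By default the order of the returned rows is not guaranteed, so your output may be ordered differently from the one shown here; pass maintain_order=True to preserve the original order.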


# Drop duplicate rows, keeping one copy of each
df2 = df.unique()
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

If you want to consider only specific columns when identifying duplicates, you can pass the column names to the subset parameter.


# Drop duplicates based on the 'Courses' column
df2 = df.unique(subset=["Courses"])
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Python  ┆ 20000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘

Dropping Rows with Missing Values

To drop rows with missing values in Polars, you can use the drop_nulls() method. This method removes rows that contain null values in any or specific columns.


import polars as pl

# Create a Polars DataFrame with missing values
df = pl.DataFrame({
    'Courses': ["Spark", "Pandas", None, "Python", "Pandas", "Spark"],
    'Fees': [22000, 26000, 25000, None, 26000, 22000],
    'Duration': ['30days', None, '50days', '40days', '60days', '30days'],
    'Discount': [1000, 2000, None, 1200, 2000, 1000]
})

# Drop rows with missing values in any column
df2 = df.drop_nulls()
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

If you want to drop rows where specific columns have null values, pass the column names to the subset parameter.


# Drop rows with missing values in 'Fees' and 'Duration' columns
df2 = df.drop_nulls(subset=['Fees', 'Duration'])
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ null    ┆ 25000 ┆ 50days   ┆ null     │
│ Pandas  ┆ 26000 ┆ 60days   ┆ 2000     │
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
└─────────┴───────┴──────────┴──────────┘

Summary of Methods

Scenario | Method | Example
Drop rows by condition | .filter() | df.filter(df['Fees'] > 25000)
Drop rows by index | .slice() or .with_row_index() | df.slice(2, len(df) - 2)
Drop duplicate rows | .unique() | df.unique()
Drop rows with missing values | .drop_nulls() | df.drop_nulls()
Drop rows where value is in a list | .is_in() | df.filter(~df['Courses'].is_in(lst))
Drop rows based on string length | .str.len_chars() | df.filter(df['Courses'].str.len_chars() > 5)
Drop rows by compound conditions | Logical operators with .filter() | df.filter((cond1) & (cond2))

Conclusion

In summary, dropping rows in Polars is a simple and flexible process that facilitates various data-cleaning tasks. With methods like drop_nulls(), filter(), and unique(), you can efficiently eliminate rows based on criteria such as null values, duplicates, or specific column conditions.

Happy Learning!!
