  • Post category:Polars
  • Post last modified:February 17, 2025
Polars DataFrame drop() Method

In Polars, the drop() method removes one or more columns from a DataFrame. Columns can be specified by name or with selectors from polars.selectors, which can match columns by data type or by name pattern. Unlike the drop() method in Pandas, which can remove either rows or columns, the Polars method removes only columns.


In this article, I will cover the Polars DataFrame drop() method, including its syntax, parameters, and usage, as well as how it returns a new DataFrame with the specified columns removed.

Key Points –

  • The drop() method is used to remove one or more columns from a Polars DataFrame.
  • drop() returns a new DataFrame with the specified columns removed. It does not modify the original DataFrame in place.
  • The strict parameter controls what happens when a named column is not found. With strict=True (the default), drop() raises a ColumnNotFoundError; with strict=False, missing columns are silently ignored.
  • You can drop all columns by passing df.columns, which leaves an empty DataFrame with no columns. In the output shown later in this article, its shape is (0, 0).
  • The drop() method is suitable for large DataFrames, and it can be combined with column selectors to remove many columns at once.

Syntax of Polars DataFrame drop()

The syntax of the Polars DataFrame drop() method is shown below.


# Syntax of drop()
DataFrame.drop(
    *columns: ColumnNameOrSelector | Iterable[ColumnNameOrSelector],
    strict: bool = True,
) -> DataFrame

Parameters of the Polars DataFrame.drop()

Following are the parameters of the drop() method.

  • *columns
    • One or more column names or selectors, passed as separate positional arguments or as a single iterable. This specifies the column(s) to be dropped from the DataFrame.
    • You can pass a single column name (as a string), a list of column names, or a selector from polars.selectors (for example, one that matches column names against a regular expression).
  • strict (bool, default True) –
    • If True, Polars raises a ColumnNotFoundError if any of the specified columns do not exist in the DataFrame.
    • If False, the method will silently ignore any non-existent columns without raising an error.

Return Value

This function returns a new DataFrame with the specified columns removed. The original DataFrame is not modified.

Usage of Polars DataFrame drop() Method

The examples below use a small sample DataFrame to show how to drop a single column, several columns, and columns that may not exist.

First, let’s create a Polars DataFrame.


import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2000]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This yields a DataFrame with four rows and four columns: Courses, Fees, Duration, and Discount.

To drop a single column from a Polars DataFrame, pass its name to the drop() method.


# Dropping the 'Discount' column
df2 = df.drop("Discount")
print("DataFrame after dropping 'Discount':\n", df2)

Here,

  • The column "Discount" is removed from the DataFrame.
  • The original DataFrame (df) remains unchanged, as drop() returns a new DataFrame. You must assign the result to a variable to keep the change.

The resulting DataFrame (df2) has four rows and three columns: Courses, Fees, and Duration.

Dropping Multiple Columns

To drop multiple columns from a Polars DataFrame, use the drop() method with a list of column names.


# Dropping the 'Fees' and 'Duration' columns
df2 = df.drop(["Fees", "Duration"])
print("DataFrame after dropping 'Fees' and 'Duration':\n", df2)

# Output:
# DataFrame after dropping 'Fees' and 'Duration':
# shape: (4, 2)
┌─────────┬──────────┐
│ Courses ┆ Discount │
│ ---     ┆ ---      │
│ str     ┆ i64      │
╞═════════╪══════════╡
│ Spark   ┆ 1000     │
│ Hadoop  ┆ 1500     │
│ Python  ┆ 1200     │
│ Pandas  ┆ 2000     │
└─────────┴──────────┘

Here,

  • You can specify multiple column names as a list in the drop() method.
  • The method returns a new DataFrame with the specified columns removed.
  • The original DataFrame remains unchanged unless you explicitly overwrite it.

Ignoring Non-Existing Columns (with strict=False)

By default (strict=True), drop() raises a ColumnNotFoundError if any of the specified columns is not found. Setting strict=False makes the method ignore names that don't exist and drop only the columns that are present.


# Attempting to drop 'Fees' and a non-existing column 'NonExistent' with strict=False
df2 = df.drop(["Fees", "NonExistent"], strict=False)
print("DataFrame after dropping 'Fees' and ignoring non-existing columns:\n", df2)

# Output:
# DataFrame after dropping 'Fees' and ignoring non-existing columns:
# shape: (4, 3)
┌─────────┬──────────┬──────────┐
│ Courses ┆ Duration ┆ Discount │
│ ---     ┆ ---      ┆ ---      │
│ str     ┆ str      ┆ i64      │
╞═════════╪══════════╪══════════╡
│ Spark   ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 50days   ┆ 1500     │
│ Python  ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 60days   ┆ 2000     │
└─────────┴──────────┴──────────┘

Here,

  • Default Behavior (strict=True): The method raises an error if you attempt to drop a column that doesn’t exist in the DataFrame.
  • Using strict=False: Allows the method to skip over any non-existing columns without throwing an error.
  • The method still removes any columns from the list that are present in the DataFrame. In this example, the “Fees” column is dropped while the “NonExistent” column is ignored.

Dropping All Columns (Leaving Empty DataFrame)

To drop all columns in a Polars DataFrame, effectively leaving an empty DataFrame with no columns, you can pass all column names to the drop() method.


# Dropping all columns from the DataFrame
df2 = df.drop(df.columns)
print("DataFrame after dropping all columns:\n", df2)

# Output:
# DataFrame after dropping all columns:
# shape: (0, 0)
┌┐
╞╡
└┘

Here,

  • df.columns: This returns a list of all column names in the DataFrame.
  • drop() with df.columns: When passing all column names, it removes all columns from the DataFrame.

Dropping Columns Using a List of Column Names

You can also keep the column names in a list variable and pass that variable to the drop() method. This works the same way as passing the list directly, and it is convenient when the list is built elsewhere in your code.


# List of columns to drop
columns_to_drop = ["Fees", "Discount"]

# Dropping the specified columns
df2 = df.drop(columns_to_drop)
print("DataFrame after dropping columns using a list:\n", df2)

# Output:
# DataFrame after dropping columns using a list:
# shape: (4, 2)
┌─────────┬──────────┐
│ Courses ┆ Duration │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ Spark   ┆ 30days   │
│ Hadoop  ┆ 50days   │
│ Python  ┆ 40days   │
│ Pandas  ┆ 60days   │
└─────────┴──────────┘

Here,

  • The drop() method accepts a list of column names to remove multiple columns at once.
  • In this example, the "Fees" and "Discount" columns are dropped from the DataFrame.
  • The method returns a new DataFrame with the specified columns removed, while the original DataFrame (df) remains unchanged unless explicitly reassigned.

Conclusion

In this article, I have explained the Polars DataFrame drop() method, covering its syntax, parameters, and usage, and how it returns a new DataFrame with the specified columns removed while leaving the original DataFrame unchanged.

Happy Learning!!
