In Polars, the drop() method is used to remove one or more columns from a DataFrame. It offers a flexible and efficient way to change the structure of a DataFrame, allowing columns to be removed by name or by selector (including pattern-based selectors). While it is similar to the drop() function in other libraries such as Pandas, it is designed specifically for Polars DataFrames.
In this article, I will cover the Polars DataFrame drop() method, including its syntax, parameters, and usage, as well as how it returns a new DataFrame with the specified columns removed.
Key Points –
- The drop() method removes one or more columns from a Polars DataFrame.
- drop() returns a new DataFrame with the specified columns removed. It does not modify the original DataFrame in place.
- The strict parameter determines whether an error is raised when a column is not found. With strict=True (the default), dropping a non-existing column raises a ColumnNotFoundError; setting strict=False silently ignores missing columns.
- You can drop all columns by passing df.columns, which results in an empty DataFrame with no columns.
- The drop() method is optimized for performance, making it suitable for large DataFrames, especially when combined with column selectors or regex-based selectors.
Syntax of Polars DataFrame drop()
The syntax of the Polars DataFrame drop() method is as follows.
# Syntax of drop()
DataFrame.drop(
    *columns: ColumnNameOrSelector | Iterable[ColumnNameOrSelector],
    strict: bool = True,
) -> DataFrame
Parameters of the Polars DataFrame.drop()
Following are the parameters of the drop() method.
- *columns – One or more column names or selectors. This specifies the column(s) to be dropped from the DataFrame.
  - You can pass a single column name (as a string), several names as separate arguments, a list of column names, or a selector from polars.selectors (for example, one that matches column names against a regular expression).
- strict (bool, default True) – Controls how missing columns are handled.
  - If True, Polars raises a ColumnNotFoundError if any of the specified columns do not exist in the DataFrame.
  - If False, the method silently ignores any non-existent columns without raising an error.
Return Value
This function returns a new DataFrame with the specified columns removed. The original DataFrame is not modified.
Usage of Polars DataFrame drop() Method
The drop() method in Polars is used to remove one or more columns from a DataFrame. This operation returns a new DataFrame with the specified columns removed.
First, let’s create a Polars DataFrame.
import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2000]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This creates a DataFrame with four rows and four columns: Courses, Fees, Duration, and Discount.
To drop a single column from a Polars DataFrame, pass its name to the drop() method.
# Dropping the 'Discount' column
df2 = df.drop("Discount")
print("DataFrame after dropping 'Discount':\n", df2)
Here,
- The column "Discount" is removed from the DataFrame.
- The original DataFrame (df) remains unchanged, as drop() returns a new DataFrame. You must assign the result to a variable to keep the changes.
Dropping Multiple Columns
To drop multiple columns from a Polars DataFrame, use the drop() method with a list of column names.
# Dropping the 'Fees' and 'Duration' columns
df2 = df.drop(["Fees", "Duration"])
print("DataFrame after dropping 'Fees' and 'Duration':\n", df2)
# Output:
# DataFrame after dropping 'Fees' and 'Duration':
# shape: (4, 2)
┌─────────┬──────────┐
│ Courses ┆ Discount │
│ ---     ┆ ---      │
│ str     ┆ i64      │
╞═════════╪══════════╡
│ Spark   ┆ 1000     │
│ Hadoop  ┆ 1500     │
│ Python  ┆ 1200     │
│ Pandas  ┆ 2000     │
└─────────┴──────────┘
Here,
- You can specify multiple column names as a list in the drop() method.
- The method returns a new DataFrame with the specified columns removed.
- The original DataFrame remains unchanged unless you explicitly overwrite it.
Ignoring Non-Existing Columns (with strict=False)
When using the drop() method to remove columns, you can handle non-existing columns more gracefully by setting the strict parameter to False. By default, strict=True, which raises an error if any of the specified columns are not found. Setting strict=False allows the method to ignore columns that don’t exist and proceed without raising an error.
# Attempting to drop 'Fees' and a non-existing column 'NonExistent' with strict=False
df2 = df.drop(["Fees", "NonExistent"], strict=False)
print("DataFrame after dropping 'Fees' and ignoring non-existing columns:\n", df2)
# Output:
# DataFrame after dropping 'Fees' and ignoring non-existing columns:
# shape: (4, 3)
┌─────────┬──────────┬──────────┐
│ Courses ┆ Duration ┆ Discount │
│ ---     ┆ ---      ┆ ---      │
│ str     ┆ str      ┆ i64      │
╞═════════╪══════════╪══════════╡
│ Spark   ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 50days   ┆ 1500     │
│ Python  ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 60days   ┆ 2000     │
└─────────┴──────────┴──────────┘
Here,
- Default Behavior (strict=True): The method raises an error if you attempt to drop a column that doesn’t exist in the DataFrame.
- Using strict=False: Allows the method to skip over any non-existing columns without throwing an error.
- The method still removes any columns from the list that are present in the DataFrame. In this example, the “Fees” column is dropped while the “NonExistent” column is ignored.
Dropping All Columns (Leaving Empty DataFrame)
To drop all columns in a Polars DataFrame, effectively leaving an empty DataFrame with no columns, you can pass all column names to the drop() method.
# Dropping all columns from the DataFrame
df2 = df.drop(df.columns)
print("DataFrame after dropping all columns:\n", df2)
# Output:
# DataFrame after dropping all columns:
# shape: (0, 0)
┌┐
╞╡
└┘
Here,
- df.columns: This returns a list of all column names in the DataFrame.
- drop() with df.columns: When passing all column names, it removes all columns from the DataFrame.
Dropping Columns Using a List of Column Names
You can drop specific columns from a Polars DataFrame by passing a list of column names to the drop() method. The method removes the columns whose names appear in the list.
# List of columns to drop
columns_to_drop = ["Fees", "Discount"]
# Dropping the specified columns
df2 = df.drop(columns_to_drop)
print("DataFrame after dropping columns using a list:\n", df2)
# Output:
# DataFrame after dropping columns using a list:
# shape: (4, 2)
┌─────────┬──────────┐
│ Courses ┆ Duration │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ Spark   ┆ 30days   │
│ Hadoop  ┆ 50days   │
│ Python  ┆ 40days   │
│ Pandas  ┆ 60days   │
└─────────┴──────────┘
Here,
- The drop() method accepts a list of column names to remove multiple columns at once.
- In this example, the "Fees" and "Discount" columns are dropped from the DataFrame.
- The method returns a new DataFrame with the specified columns removed, while the original DataFrame (df) remains unchanged unless explicitly reassigned.
Conclusion
In this article, I have explained the Polars DataFrame drop() method, covering its syntax, parameters, and usage, and how it returns a new DataFrame with the specified columns removed while leaving the original DataFrame unchanged.
Happy Learning!!