To delete or remove a column from a Polars DataFrame, you can use the drop() method, or the select() method combined with exclude(), and specify the column name. Both approaches create a new DataFrame that excludes the specified column(s) and leave the original DataFrame unchanged.
Deleting a column from a Polars DataFrame means removing one or more columns from the table, resulting in a DataFrame that no longer includes them. Because drop() and select() return a new DataFrame instead of modifying the one they are called on, you need to assign the result to a variable to keep the change. In this article, I will explain the different methods to delete or drop columns from a Polars DataFrame.
Key Points –
- drop() and select() return a new DataFrame; the original DataFrame is not modified.
- The primary method to delete columns is drop(), which accepts a single column name or multiple names, passed as a list.
- The select() method combined with exclude() drops columns by selecting every column except the ones you name. pl.exclude() inside select() can take one or more column names to exclude from the selection.
- Polars selectors such as pl.selectors.numeric() and pl.selectors.string() let you drop columns based on their data types.
- Because dropping columns is not done in place, reassign the result (for example, df = df.drop("Duration")) to keep the change.
Usage of Deleting a Column from a Polars DataFrame
Deleting a column from a Polars DataFrame means removing that column entirely so it no longer appears in the DataFrame. Methods such as drop() do not change the DataFrame they are called on; they return a new DataFrame without the specified column(s).
Now, let’s create a Polars DataFrame.
import polars as pl
# Creating a dictionary with course information
technologies = {
'Courses': ["Spark", "Pandas", "Hadoop", "Python"],
'Fees': [30000, 40000, 35000, 60000],
'Duration': ['30days', '60days', '50days', '40days'],
'Discount': [1000, 2000, 1500, 3000]
}
# Creating a Polars DataFrame from the dictionary
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
Drop a Single Column Using drop()
To remove a single column by name from a Polars DataFrame, you can use the drop() method and pass the column name as a string.
# Drop a single column by name
df2 = df.drop("Duration")
print("DataFrame after dropping 'Duration' column:\n", df2)
Drop All Numeric Columns Using a Selector
To drop all numeric columns in a Polars DataFrame, you can pass pl.col() with a group of numeric data types, or a data type selector, to the drop() method (or exclude those columns inside select()).
# Drop all numeric columns
df2 = df.drop(pl.col(pl.NUMERIC_DTYPES))
print("DataFrame after dropping all numeric columns:\n", df2)
# Output:
# DataFrame after dropping all numeric columns:
# shape: (4, 2)
# ┌─────────┬──────────┐
# │ Courses ┆ Duration │
# │ ---     ┆ ---      │
# │ str     ┆ str      │
# ╞═════════╪══════════╡
# │ Spark   ┆ 30days   │
# │ Pandas  ┆ 60days   │
# │ Hadoop  ┆ 50days   │
# │ Python  ┆ 40days   │
# └─────────┴──────────┘
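Here,
- pl.NUMERIC_DTYPES is a Polars constant that contains all numeric data types.
- pl.col(pl.NUMERIC_DTYPES) selects all columns whose data types are numeric.
- df.drop() drops those selected columns.
Note that Polars 1.0 deprecated top-level data type groups such as pl.NUMERIC_DTYPES in favor of the polars.selectors module, so on recent versions this example may emit a deprecation warning.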
Similarly, you can drop all numeric columns by passing the pl.selectors.numeric() selector to drop().
# Drop all numeric columns (Fees and Discount)
df2 = df.drop(pl.selectors.numeric())
print("DataFrame after dropping all numeric columns:\n", df2)
# Output:
# DataFrame after dropping all numeric columns:
# shape: (4, 2)
# ┌─────────┬──────────┐
# │ Courses ┆ Duration │
# │ ---     ┆ ---      │
# │ str     ┆ str      │
# ╞═════════╪══════════╡
# │ Spark   ┆ 30days   │
# │ Pandas  ┆ 60days   │
# │ Hadoop  ┆ 50days   │
# │ Python  ┆ 40days   │
# └─────────┴──────────┘
Here,
- pl.selectors.numeric() selects all columns with numeric types (such as integers or floats).
- df.drop(...) removes them and returns a new DataFrame without those columns.
Drop Columns Using select().exclude() (Single Column)
To drop a single column in Polars, you can use select().exclude() to select all columns except the one you want to remove. This approach is useful when you want to keep most columns but exclude specific ones.
# Drop the 'Duration' column using pl.all().exclude()
df2 = df.select(pl.all().exclude("Duration"))
print("DataFrame after dropping 'Duration' using select().exclude():\n", df2)
# Equivalent, shorter form using pl.exclude()
df2 = df.select(pl.exclude("Duration"))
print("DataFrame after dropping 'Duration' using select().exclude():\n", df2)
# Output (both calls print the same DataFrame):
# DataFrame after dropping 'Duration' using select().exclude():
# shape: (4, 3)
# ┌─────────┬───────┬──────────┐
# │ Courses ┆ Fees  ┆ Discount │
# │ ---     ┆ ---   ┆ ---      │
# │ str     ┆ i64   ┆ i64      │
# ╞═════════╪═══════╪══════════╡
# │ Spark   ┆ 30000 ┆ 1000     │
# │ Pandas  ┆ 40000 ┆ 2000     │
# │ Hadoop  ┆ 35000 ┆ 1500     │
# │ Python  ┆ 60000 ┆ 3000     │
# └─────────┴───────┴──────────┘
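Here,
- pl.all() selects all columns.
- exclude("Duration") excludes the column named "Duration" from the selection.
- select(...) creates a new DataFrame with the selected columns (i.e., all except "Duration").
- pl.exclude("Duration") is shorthand for pl.all().exclude("Duration"), so both calls produce the same result.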
Drop Columns Using select().exclude() (Multiple Columns)
To drop multiple columns in Polars using select().exclude(), pass a list of column names to exclude(). This works with both pl.all().exclude() and pl.exclude().
# Drop multiple columns using select().exclude()
df2 = df.select(pl.all().exclude(["Duration", "Discount"]))
print("DataFrame after dropping 'Discount' and 'Duration' columns:\n", df2)
# Output:
# DataFrame after dropping 'Discount' and 'Duration' columns:
# shape: (4, 2)
# ┌─────────┬───────┐
# │ Courses ┆ Fees  │
# │ ---     ┆ ---   │
# │ str     ┆ i64   │
# ╞═════════╪═══════╡
# │ Spark   ┆ 30000 │
# │ Pandas  ┆ 40000 │
# │ Hadoop  ┆ 35000 │
# │ Python  ┆ 60000 │
# └─────────┴───────┘
Here,
- pl.all() selects all columns.
- exclude(["Duration", "Discount"]) excludes both the "Duration" and "Discount" columns.
- select() returns a new DataFrame with only the remaining columns.
Conclusion
In summary, Polars provides flexible and efficient ways to drop one or more columns, using drop(), select().exclude(), or data type selectors. Because these methods return a new DataFrame, reassign the result when you want to keep the change. By leveraging these techniques, you can easily manipulate your DataFrame to keep only the data you need.
Happy Learning!!