To remove a column from a Polars DataFrame, use the drop() method, which deletes one or more columns and returns a new DataFrame without them. Alternatively, you can use the select() method in combination with pl.exclude() to achieve the same result. In this article, I will explain the different methods to drop or remove columns in a Polars DataFrame.
Key Points –
- Use DataFrame.drop(column_name) to directly drop a single column by its name.
- Pass a list of column names to DataFrame.drop([column1, column2]) to drop multiple columns at once.
- Dropping a column in Polars does not modify the original DataFrame; it creates a new DataFrame.
- Use DataFrame.select([columns_to_keep]) to explicitly retain specific columns and drop others.
- Dropping columns can be combined with other Polars operations, such as filtering or aggregations.
- Use pl.exclude(column_name) within a select() to exclude specific columns dynamically.
- Use loops or conditional logic with select() or drop() to dynamically determine which columns to remove.
- drop() takes column names, not positions. To remove a column by index, look up its name in df.columns first.
Create a Polars DataFrame
Creating a Polars DataFrame is easy and versatile. You can initialize a DataFrame in various ways, including using dictionaries, lists of tuples, NumPy arrays, or Pandas DataFrames.
To run some examples of how to drop a column using Polars, let’s create a Polars DataFrame using data from a dictionary.
import polars as pl
# Creating a new Polars DataFrame
technologies = {
'Courses': ["Spark", "Pandas", "Hadoop", "Python", "Pandas", "Spark"],
'Fees': [22000, 26000, 25000, 20000, 26000, 22000],
'Duration': ['30days', '60days', '50days', '40days', '60days', '30days'],
'Discount': [1000, 2000, 1500, 1200, 2000, 1000]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This creates a DataFrame with six rows and four columns: "Courses", "Fees", "Duration", and "Discount".
To remove a single column from a Polars DataFrame, you can use the drop() method by specifying the column name as an argument.
# Drop the "Discount" column
result = df.drop("Discount")
print(result)
In the above example, the "Discount" column is removed from the DataFrame, and the remaining columns are "Courses", "Fees", and "Duration".
Using drop() Method for Multiple Columns
To remove multiple columns from a Polars DataFrame, you can use the drop() method by passing a list of column names as an argument.
# Drop multiple columns "Discount" and "Duration"
result = df.drop(["Discount", "Duration"])
print(result)
# Output:
# shape: (6, 2)
┌─────────┬───────┐
│ Courses ┆ Fees  │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 22000 │
│ Pandas  ┆ 26000 │
│ Hadoop  ┆ 25000 │
│ Python  ┆ 20000 │
│ Pandas  ┆ 26000 │
│ Spark   ┆ 22000 │
└─────────┴───────┘
In the above example, the "Discount" and "Duration" columns are removed from the DataFrame, and the remaining columns are "Courses" and "Fees".
Using select() with Explicit Columns to Keep
You can remove columns from a Polars DataFrame by explicitly specifying the columns you want to keep using the select() method. This approach allows you to retain only the desired columns and automatically drop the others.
# Using select()
# To explicitly keep the "Courses" and "Fees" columns
result = df.select(["Courses", "Fees"])
print(result)
# Output:
# shape: (6, 2)
┌─────────┬───────┐
│ Courses ┆ Fees  │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 22000 │
│ Pandas  ┆ 26000 │
│ Hadoop  ┆ 25000 │
│ Python  ┆ 20000 │
│ Pandas  ┆ 26000 │
│ Spark   ┆ 22000 │
└─────────┴───────┘
In the above example, by using select(), only the "Courses" and "Fees" columns are kept in the resulting DataFrame. The "Duration" and "Discount" columns are excluded. This approach is especially useful when you want to explicitly choose which columns to retain.
Using select() with Exclusion via pl.exclude()
You can exclude specific columns from a Polars DataFrame by using pl.exclude() within the select() method. This allows you to dynamically exclude one or more columns while keeping all others.
# Using select() with pl.exclude
# To exclude the "Discount" column
result = df.select(pl.exclude("Discount"))
print(result)
# Output:
# shape: (6, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   │
│ Pandas  ┆ 26000 ┆ 60days   │
│ Hadoop  ┆ 25000 ┆ 50days   │
│ Python  ┆ 20000 ┆ 40days   │
│ Pandas  ┆ 26000 ┆ 60days   │
│ Spark   ┆ 22000 ┆ 30days   │
└─────────┴───────┴──────────┘
In the above example, pl.exclude("Discount") instructs Polars to exclude the "Discount" column from the selection. The resulting DataFrame includes all columns except "Discount", keeping the "Courses", "Fees", and "Duration" columns.
Drop Columns by Index
To drop columns by index in Polars, you can use a combination of select() and enumerate() to filter out columns based on their index.
# Drop the column at index 1 (i.e., "Fees" column)
result = df.select([col for idx, col in enumerate(df.columns) if idx != 1])
print(result)
# Output:
# shape: (6, 3)
┌─────────┬──────────┬──────────┐
│ Courses ┆ Duration ┆ Discount │
│ ---     ┆ ---      ┆ ---      │
│ str     ┆ str      ┆ i64      │
╞═════════╪══════════╪══════════╡
│ Spark   ┆ 30days   ┆ 1000     │
│ Pandas  ┆ 60days   ┆ 2000     │
│ Hadoop  ┆ 50days   ┆ 1500     │
│ Python  ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 60days   ┆ 2000     │
│ Spark   ┆ 30days   ┆ 1000     │
└─────────┴──────────┴──────────┘
In the above example, enumerate(df.columns) generates pairs of index and column names. The list comprehension then filters out the column at index 1 (which corresponds to "Fees"), retaining all other columns. This produces a new DataFrame with the "Fees" column removed.
Drop Columns Based on a Condition
To drop columns based on a condition in Polars, you can use the select() method in combination with a list comprehension to filter out columns that meet a certain condition. For instance, you might want to drop columns that contain numerical values, or columns with a specific data type.
# Drop columns whose data type is numeric (here Int64 or Float64)
result = df.select([col for col in df.columns if df[col].dtype not in (pl.Int64, pl.Float64)])
print(result)
# Output:
# shape: (6, 2)
┌─────────┬──────────┐
│ Courses ┆ Duration │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ Spark   ┆ 30days   │
│ Pandas  ┆ 60days   │
│ Hadoop  ┆ 50days   │
│ Python  ┆ 40days   │
│ Pandas  ┆ 60days   │
│ Spark   ┆ 30days   │
└─────────┴──────────┘
In the above example, the condition df[col].dtype not in (pl.Int64, pl.Float64) keeps only the columns whose data type is neither Int64 nor Float64. The resulting DataFrame contains only the "Courses" and "Duration" columns, while the "Fees" and "Discount" columns are excluded because they are numeric.
You can also apply a different condition, such as excluding string columns.
# Drop columns whose data type is string (str)
result = df.select([col for col in df.columns if df[col].dtype != pl.Utf8])
print(result)
# Output:
# shape: (6, 2)
┌───────┬──────────┐
│ Fees  ┆ Discount │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 22000 ┆ 1000     │
│ 26000 ┆ 2000     │
│ 25000 ┆ 1500     │
│ 20000 ┆ 1200     │
│ 26000 ┆ 2000     │
│ 22000 ┆ 1000     │
└───────┴──────────┘
In the above example, the condition df[col].dtype != pl.Utf8 filters out columns of type Utf8, the Polars string type (recent releases also expose it as pl.String). The resulting DataFrame retains the numeric columns ("Fees", "Discount") while excluding the string columns ("Courses", "Duration").
Conclusion
In conclusion, dropping a column in Polars is a simple task with several flexible approaches. You can use the drop() method, use select() for explicit inclusion or exclusion with pl.exclude(), or filter columns by index or by a condition such as data type.
Happy Learning!!