Polars DataFrame Columns Selection

  • Post category: Polars
  • Post last modified: May 12, 2025

Column selection in Polars means extracting or working with specific columns of a DataFrame. A Polars DataFrame is a table-like structure of rows and columns, and selecting columns lets you focus on the subset you need for analysis or transformation. In this article, I will explain how to select columns from a Polars DataFrame by name, data type, position, and expression.

Key Points –

  • You can select columns by name using the select() method, passing the column names as a list.
  • You can select columns by position by slicing df.columns and passing the result to the select() method.
  • Columns can be selected based on their data type using the pl.col() function combined with a data type (e.g., pl.Int64, pl.Utf8).
  • Conditions can be applied within the select() method to filter a column's values during selection.
  • You can perform transformations or computations on columns during selection using expressions like pl.col("col") + 1.

Using the select() Method

The select() method is one of the most flexible ways to select columns from a Polars DataFrame. It allows you to specify one or more columns, and it returns a new DataFrame containing just those columns.

Usage of Polars DataFrame Columns Selection

Polars is a fast, memory-efficient DataFrame library that allows for flexible and powerful column selection. Selecting columns in a DataFrame is a crucial operation in data manipulation as it allows you to focus on the data you need, apply transformations, and optimize performance.

First, let’s create a Polars DataFrame.


import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [20000, 25000, 30000, 40000],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': [1000, 1500, 1200, 2000]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

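Yields below output.

# Output:
# Original DataFrame:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 20000 ┆ 30days   ┆ 1000     │
│ Hadoop  ┆ 25000 ┆ 50days   ┆ 1500     │
│ Python  ┆ 30000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 40000 ┆ 60days   ┆ 2000     │
└─────────┴───────┴──────────┴──────────┘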

To select specific columns by name in a Polars DataFrame, you can pass a list of column names to the select() method. For instance, the following retrieves only the "Courses" and "Fees" columns.


# Selecting specific columns by name
df2 = df.select(['Courses', 'Fees'])
print("Selected Columns DataFrame:\n", df2)

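Yields below output.

# Output:
# Selected Columns DataFrame:
# shape: (4, 2)
┌─────────┬───────┐
│ Courses ┆ Fees  │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 20000 │
│ Hadoop  ┆ 25000 │
│ Python  ┆ 30000 │
│ Pandas  ┆ 40000 │
└─────────┴───────┘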

Select a Single Column

To select a single column from a Polars DataFrame, you can use the select() method with the column name provided as a string inside a list, which returns a one-column DataFrame. Alternatively, you can access the column directly using the DataFrame’s bracket notation, which returns a Series instead.


# Selecting a single column by name
df2 = df.select(['Courses'])
print("Single Column DataFrame:\n", df2)

# Selecting a single column directly
df2 = df['Courses']
print("Single Column:\n", df2)

# Output:
# Single Column DataFrame:
# shape: (4, 1)
┌─────────┐
│ Courses │
│ ---     │
│ str     │
╞═════════╡
│ Spark   │
│ Hadoop  │
│ Python  │
│ Pandas  │
└─────────┘

Select Columns by Data Type

You can select columns based on their data type by passing a data type to the pl.col() function inside the select() method. For example, passing pl.Int64, pl.Float64, or pl.Utf8 (an alias of pl.String in recent Polars versions) selects every column of that type.


# Selecting columns of type String (pl.Utf8)
df2 = df.select([pl.col(pl.Utf8)])
print("String Columns DataFrame:\n", df2)

# Output:
# String Columns DataFrame:
# shape: (4, 2)
┌─────────┬──────────┐
│ Courses ┆ Duration │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ Spark   ┆ 30days   │
│ Hadoop  ┆ 50days   │
│ Python  ┆ 40days   │
│ Pandas  ┆ 60days   │
└─────────┴──────────┘

To select columns of type Integer (Int64) in Polars, you can use the pl.col() function combined with the data type pl.Int64 inside the select() method.


# Select columns of type Int64 (i.e., integers)
df2 = df.select(pl.col(pl.Int64))
print("Integer Columns DataFrame:\n", df2)

# Output:
# Integer Columns DataFrame:
# shape: (4, 2)
┌───────┬──────────┐
│ Fees  ┆ Discount │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 20000 ┆ 1000     │
│ 25000 ┆ 1500     │
│ 30000 ┆ 1200     │
│ 40000 ┆ 2000     │
└───────┴──────────┘

Select Column Range by Position

You can select a range of columns by position by slicing the df.columns list and passing the resulting names to the select() method.


# Select a range of columns by position (e.g., first three columns)
df2 = df.select(df.columns[0:3])
print("Selected Column Range (0 to 2):\n", df2)

# Output:
# Selected Column Range (0 to 2):
# shape: (4, 3)
┌─────────┬───────┬──────────┐
│ Courses ┆ Fees  ┆ Duration │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Spark   ┆ 20000 ┆ 30days   │
│ Hadoop  ┆ 25000 ┆ 50days   │
│ Python  ┆ 30000 ┆ 40days   │
│ Pandas  ┆ 40000 ┆ 60days   │
└─────────┴───────┴──────────┘

Select Columns using Expressions

You can select columns using expressions, which allows you to apply transformations, calculations, or conditions directly within the select() method. Expressions provide a powerful way to create new columns or filter existing columns based on complex operations.

Select Columns Based on Expressions

Let’s say you want to select only the values of the Fees column that are greater than 25000.


# Selecting the Fees values that are greater than 25000
df2 = df.select([pl.col('Fees').filter(pl.col('Fees') > 25000)])
print("Filtered DataFrame (Fees > 25000):\n", df2)

# Output:
# Filtered DataFrame (Fees > 25000):
# shape: (2, 1)
┌───────┐
│ Fees  │
│ ---   │
│ i64   │
╞═══════╡
│ 30000 │
│ 40000 │
└───────┘

You can combine multiple columns and expressions. For example, let’s select Courses and apply an expression that converts Duration to a numeric value (e.g., extracting the number of days from the string):


# Extracting the number of days from the Duration column and selecting Courses
df2 = df.select([
    pl.col('Courses'),
    pl.col('Duration').str.extract(r'(\d+)').cast(pl.Int64).alias('Duration_Days')
])
print("DataFrame with Duration in Days:\n", df2)

# Output:
# DataFrame with Duration in Days:
# shape: (4, 2)
┌─────────┬───────────────┐
│ Courses ┆ Duration_Days │
│ ---     ┆ ---           │
│ str     ┆ i64           │
╞═════════╪═══════════════╡
│ Spark   ┆ 30            │
│ Hadoop  ┆ 50            │
│ Python  ┆ 40            │
│ Pandas  ┆ 60            │
└─────────┴───────────────┘

Conclusion

In conclusion, selecting columns in Polars is both intuitive and powerful. Whether you’re accessing them by name, position, data type, or using expressions for transformation, Polars offers a concise and efficient syntax to handle all your column selection needs. This flexibility makes it an excellent choice for fast and expressive data manipulation.

Happy Learning!!
