• Post author:
  • Post category:Pandas
  • Post last modified:May 27, 2024
  • Reading time:13 mins read
You are currently viewing Select Pandas Columns Based on Condition

We can select columns based on single/multiple conditions using the pandas loc[] attribute. The DataFrame.loc[] attribute property is used to select rows and columns based on index/index labels from DataFrame. A Pandas DataFrame is a two-dimensional table with labeled axes, including columns and rows. Selecting columns from a DataFrame creates a new DataFrame that contains only the specified columns from the original one.

Advertisements

In this article, I will explain how to select columns based on single/multiple conditions using the pandas loc[] attribute and other options.

Key Points –

  • Utilize boolean indexing to filter rows based on conditions.
  • Use logical operators (e.g., &, |) to combine multiple conditions.
  • Apply .loc[] or .iloc[] to select specific columns after filtering rows.
  • Leverage functions like .loc[] with conditions to select columns conditionally.
  • Consider using .query() for more complex conditions.

Quick Examples of Select Columns by Condition

Following are quick examples of select columns by condition.


# Quick examples of select columns by condition

# Example 1 : Pass boolean value into loc[] &
# Get specified column
df1 = df.loc[: , [True, False, True, False]]

# Example 2 : Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]

# Example 3: Select columns based on multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]

# Example 4: Get specified column 
# Using df[] notation 
# Along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])

Create Pandas DataFrame

To run some examples of select pandas columns based on condition, let’s create a Pandas DataFrame using data from a dictionary.


# Create Pandas DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark", "Pandas", "Python"],
    'Fee' :[20000, 25000, 22000, 24000],
    'Duration':['30days','40days', '35days', '45days'],
    'Discount':[1000, 2300, 1200, 1500]
              }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)

Yields below output.

select pandas columns condition

Select Pandas Columns Based on Boolean Value

Using df.loc[] we can get the rows or columns of DataFrame based on the index. Here, I will pass boolean values in the column section of loc[] attribute where boolean values should be taken the same size of columns of given DataFrame. This syntax will return corresponding columns of DataFrame for every boolean value of True.


# Pass boolean value into loc[] & get specified column
df1 = df.loc[: , [True, False, True, False]]
print(df1)

# Output:
#    Courses Duration
# 0    Spark   30days
# 1  PySpark   40days
# 2   Pandas   35days
# 3   Python   45days

Select Pandas Columns Based on Single Conditions

We can get specified column/columns of a given Pandas DataFrame based on condition along with any() function and loc[] attribute. First, select a column using df==1200 condition, it will return the same sized DataFrame where elements are boolean values. If the value is True for the corresponding value of 1200 of the original DataFrame, otherwise it False.

Then, call any() function with Boolean DataFrame, and it will return the boolean Series where the values are all columns of DataFrame.

Finally, Pass the above boolean Series into the column section of the df.loc[] attribute, it will return the specified column of the given DataFrame.


# Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]
print(df)

# Output:
#    Discount
# 0      1000
# 1      2300
# 2      1200
# 3      1500

Select Pandas Columns Based on Multiple Conditions

Alternatively, Using the above syntax we can get specified columns based on multiple conditions. Let’s use and select the columns of the given DataFrame. For example,


# Select columns based on  multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]
print(df1)

# Output:
#    Courses Duration
# 0    Spark   30days
# 1  PySpark   40days
# 2   Pandas   35days
# 3   Python   45days

Using df[] & Get Specified Column Based on Condition

When we want to select only columns based on some condition of a DataFrame, we can go with df[] notation, the best way to select the specified columns of DataFrame. df[df.columns[df.iloc[0]=='30days']] using this syntax we can select the specified columns based on condition.


# Get specified column using df[] notation 
# Along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])

# Output: 
#   Duration
# 0   30days
# 1   40days
# 2   35days
# 3   45days

Frequently Asked Questions on Select Pandas Columns Based on Condition

How do I select columns in a Pandas DataFrame based on a specific condition?

To select columns in a Pandas DataFrame based on a specific condition, you can use boolean indexing. For example, the condition is to select columns where the sum of values in each column is greater than 10. The df.sum() > 10 creates a boolean mask, and df.columns[...] selects the column names that satisfy the condition. Finally, these selected columns are used to create a new DataFrame (result_df).

Can I select columns based on a condition involving multiple columns?

You can select columns based on a condition involving multiple columns using boolean indexing with logical operators.

How can I select columns based on a condition for the entire column (not just specific rows)?

If you want to select columns based on a condition for the entire column (not just specific rows), you can use boolean indexing directly on the DataFrame.

Can I select columns based on a condition involving string values?

You can select columns based on a condition involving string values in a Pandas DataFrame. You can use string methods or conditions to filter columns based on string content

Is it possible to select columns based on a condition and rename them?

It is possible to select columns based on a condition and rename them in a Pandas DataFrame. After selecting the columns, you can use the rename method to rename the columns as needed.

Conclusion

In this article, you have learned to select columns based on single/multiple conditions using pandas loc[], iloc[] attributes, and df[] notation, with multiple examples

Related Articles

References