Select Pandas Columns Based on Condition

We can select columns based on single/multiple conditions using the pandas loc[] attribute. The DataFrame.loc[] attribute property is used to select rows and columns based on index/index labels from DataFrame. Pandas DataFrame is a two-dimensional tabular data structure with labeled axes. i.e. columns and rows. Selecting columns from DataFrame results in a new DataFrame containing only specified selected columns from the original DataFrame.

In this article, I will explain how to select columns based on single/multiple conditions using the pandas loc[] attribute and other options.

1. Quick Examples of Pandas Select Columns by Condition

Below are some quick examples of select columns by condition


# Below are the quick examples.
# Example 1 : Pass boolean value into loc[] &
# get specified column
df1 = df.loc[: , [True, False, True, False]]

# Example 2 : Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]

# Example 3: Select columns based on  multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]

# Example 4: # Get specified column using df[] notation 
# along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])

pandas DataFrame loc key Points

  • loc is used to select/filter rows and columns by labels.
  • When using to select rows, you need to provide row indices label.
  • It also provides a way to select rows and columns between ranges, every alternate e.t.c

pandas iloc[] is another property of DataFrame that is used to operate on column position and row indices. For a better understanding of these two learn the differences and similarities between pandas loc[] vs iloc[]

Now, let’s create a DataFrame with a few rows and columns and execute some examples of how to select columns based on condition in pandas. Our DataFrame contains column names CoursesFeeDuration, and Discount.


# Create Pandas DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark", "Pandas", "Python"],
    'Fee' :[20000, 25000, 22000, 24000],
    'Duration':['30days','40days', '35days', '45days'],
    'Discount':[1000, 2300, 1200, 1500]
              }
df = pd.DataFrame(technologies)
print(df)

# Output:
#   Courses    Fee Duration  Discount
# 0    Spark  20000   30days      1000
# 1  PySpark  25000   40days      2300
# 2   Pandas  22000   35days      1200
# 3   Python  24000   45days      1500

2. Select Pandas Columns based on Boolean Value

Using df.loc[] we can get the rows or columns of DataFrame based on the index. Here, I will pass boolean values in the column section of loc[] attribute where boolean values should be taken the same size of columns of given DataFrame. This syntax will return corresponding columns of DataFrame for every boolean value of True.


# Pass boolean value into loc[] & get specified column
df1 = df.loc[: , [True, False, True, False]]
print(df1)

# Output:
#    Courses Duration
# 0    Spark   30days
# 1  PySpark   40days
# 2   Pandas   35days
# 3   Python   45days

3. Select Pandas Columns based on Single Conditions

We can get specified column/columns of a given Pandas DataFrame based on condition along with any() function and loc[] attribute. First, select a column using df == 1200 condition, it will return the same sized DataFrame where elements are boolean values. If the value is True for the corresponding value of 1200 of the original DataFrame, otherwise it False.

Then, call any() function with Boolean dataframe, and it will return the boolean Series where the values are all columns of DataFrame.

Finally, Pass the above boolean Series into the column section of the df.loc[] attribute, it will return the specified column of the given DataFrame.


# Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]
print(df)

# Output:
#    Discount
# 0      1000
# 1      2300
# 2      1200
# 3      1500

4. Select Pandas Columns based on Multiple Conditions

Alternatively, Using the above syntax we can get specified columns based on multiple conditions. Let’s use and select the columns of the given DataFrame. For example,


# Select columns based on  multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]
print(df1)

# Output:
#    Courses Duration
# 0    Spark   30days
# 1  PySpark   40days
# 2   Pandas   35days
# 3   Python   45days

5. Using df[] & Get Specified Column Based on Condition

When we want to select only columns based on some condition of a DataFrame, we can go with df[] notation, the best way to select the specified columns of DataFrame. df[df.columns[df.iloc[0] == '30days']] using this syntax we can select the specified columns based on condition.


# Get specified column using df[] notation 
# along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])

# Output: 
#   Duration
# 0   30days
# 1   40days
# 2   35days
# 3   45days

6. Conclusion

In this article, I have explained how to select columns based on single/multiple conditions using pandas loc[], iloc[] attributes, and df[] notation, with multiple examples.

Select pandas columns based on condition

Related Articles

References

Leave a Reply