Select Pandas Columns Based on Condition

  • Post category: Pandas
  • Post last modified: March 27, 2024

We can select columns based on single or multiple conditions using the pandas loc[] attribute. DataFrame.loc[] selects rows and columns of a DataFrame by their index labels, and it also accepts boolean arrays. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e. rows and columns. Selecting columns from a DataFrame results in a new DataFrame containing only the selected columns of the original DataFrame.

In this article, I will explain how to select columns based on single/multiple conditions using the pandas loc[] attribute and other options.

1. Quick Examples of Select Columns by Condition

If you are in a hurry, below are some quick examples of selecting columns by condition.


# Quick examples of selecting columns by condition

# Example 1: Pass a boolean list into loc[]
# to get the specified columns
df1 = df.loc[:, [True, False, True, False]]

# Example 2: Select columns based on a single condition
# (assign to a new name so df stays intact)
col = (df == 1200).any()
df2 = df.loc[:, col]

# Example 3: Select columns based on multiple conditions
# (| keeps columns that match either condition)
col = ((df == 25000) | (df == 'Pandas')).any()
df3 = df.loc[:, col]

# Example 4: Get the specified column
# using df[] notation
# along with the specified condition
print(df[df.columns[df.iloc[0] == '30days']])

Key Points of pandas DataFrame loc[]

  • loc is used to select/filter rows and columns by labels.
  • When using it to select rows, you need to provide the row index labels.
  • It also provides a way to select rows and columns between ranges, every alternate row or column, etc.

pandas iloc[] is another property of DataFrame that selects rows and columns by integer position rather than by label. For a better understanding of these two, learn the differences and similarities between pandas loc[] vs iloc[].
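The key points above can be illustrated with a minimal sketch. It assumes the same sample data that this article builds in the next step, and the variable names (by_label, alternate_rows, by_position) are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# loc is label-based: a label slice includes both endpoints
by_label = df.loc[:, 'Fee':'Discount']   # Fee, Duration, Discount

# loc with a step: every alternate row (index labels 0 and 2)
alternate_rows = df.loc[::2]

# iloc is position-based: the stop position is excluded
by_position = df.iloc[:, 1:3]            # Fee, Duration

print(by_label.columns.tolist())
print(alternate_rows.index.tolist())
print(by_position.columns.tolist())
```

Note that the label slice 'Fee':'Discount' keeps the Discount column, while the positional slice 1:3 stops before position 3.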

Now, let’s create a DataFrame with a few rows and columns and execute some examples of how to select columns based on conditions in pandas. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.


# Create Pandas DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark", "Pandas", "Python"],
    'Fee' :[20000, 25000, 22000, 24000],
    'Duration':['30days','40days', '35days', '45days'],
    'Discount':[1000, 2300, 1200, 1500]
              }
df = pd.DataFrame(technologies)
print(df)

# Output:
#   Courses    Fee Duration  Discount
# 0    Spark  20000   30days      1000
# 1  PySpark  25000   40days      2300
# 2   Pandas  22000   35days      1200
# 3   Python  24000   45days      1500

2. Select Pandas Columns Based on Boolean Value

Using df.loc[] we can get the rows or columns of a DataFrame based on their labels. Here, I will pass a list of boolean values in the column section of loc[]. The list must have the same length as the number of columns in the DataFrame. loc[] then returns the columns whose corresponding boolean value is True.


# Pass boolean value into loc[] & get specified column
df1 = df.loc[: , [True, False, True, False]]
print(df1)

# Output:
#    Courses Duration
# 0    Spark   30days
# 1  PySpark   40days
# 2   Pandas   35days
# 3   Python   45days
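Writing the boolean list by hand does not scale well, so the list is often computed instead. The following is a minimal sketch, assuming we want the numeric columns. It uses pandas.api.types.is_numeric_dtype to build the list, and the names mask and numeric_cols are only illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# One boolean per column, in column order
mask = [is_numeric_dtype(dtype) for dtype in df.dtypes]

# Keep only the columns whose flag is True
numeric_cols = df.loc[:, mask]
print(numeric_cols.columns.tolist())
```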

3. Select Pandas Columns Based on Single Conditions

We can select one or more columns of a pandas DataFrame based on a condition by combining the any() function with the loc[] attribute. First, apply the condition df == 1200 to the whole DataFrame. It returns a DataFrame of the same shape whose elements are boolean values: True where the corresponding element of the original DataFrame equals 1200, and False otherwise.

Then, call any() on this boolean DataFrame. It returns a boolean Series indexed by the column names, where a value is True if that column contains at least one True.

Finally, pass this boolean Series into the column section of df.loc[]. It returns the columns for which the Series is True.


# Select column based on condition
# (assign to a new name so the original df stays intact)
col = (df == 1200).any()
df2 = df.loc[:, col]
print(df2)

# Output:
#    Discount
# 0      1000
# 1      2300
# 2      1200
# 3      1500
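To make the three steps easier to follow, here is a small sketch that keeps each intermediate result in its own variable. The names bool_df and selected are only illustrative, and the sample data is the article's DataFrame.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Step 1: element-wise comparison, same shape as df
bool_df = (df == 1200)

# Step 2: one boolean per column, True if any cell matched
col = bool_df.any()

# Step 3: keep the columns flagged True
selected = df.loc[:, col]

print(bool_df.shape)
print(col)
print(selected.columns.tolist())
```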

4. Select Pandas Columns Based on Multiple Conditions

Using the same syntax, we can select columns based on multiple conditions. Each comparison is evaluated cell by cell, so combining two different values with & can never be True (no single cell equals both 25000 and 'Pandas'). Use the | (or) operator instead to keep every column that matches at least one of the conditions. For example,


# Select columns based on multiple conditions
col = ((df == 25000) | (df == 'Pandas')).any()
df3 = df.loc[:, col]
print(df3)

# Output:
#    Courses    Fee
# 0    Spark  20000
# 1  PySpark  25000
# 2   Pandas  22000
# 3   Python  24000
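The choice between & and | matters here, so the sketch below contrasts the two on the article's sample data. The variable names both and either are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# &: a single cell would have to equal 25000 AND 'Pandas' at once
both = ((df == 25000) & (df == 'Pandas')).any()

# |: a column qualifies if any cell equals 25000 OR 'Pandas'
either = ((df == 25000) | (df == 'Pandas')).any()

print(df.loc[:, both].columns.tolist())    # no column qualifies
print(df.loc[:, either].columns.tolist())  # Courses and Fee
```

The same result as either can be obtained by combining the per-column masks, (df == 25000).any() | (df == 'Pandas').any().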

5. Using df[] & Get Specified Column Based on Condition

When we want to select columns based on a condition, we can also use df[] notation. In df[df.columns[df.iloc[0]=='30days']], the expression df.iloc[0]=='30days' compares each value in the first row with '30days', df.columns[...] keeps the names of the matching columns, and df[...] selects those columns.


# Get specified column using df[] notation 
# Along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])

# Output: 
#   Duration
# 0   30days
# 1   40days
# 2   35days
# 3   45days
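As a cross-check, the same first-row condition can also be passed straight to loc[]. This is a minimal sketch on the article's sample data. The names first_row_match, via_brackets, and via_loc are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Boolean Series indexed by column name, built from the first row
first_row_match = df.iloc[0] == '30days'

# df[] notation: turn the mask into column names first
via_brackets = df[df.columns[first_row_match]]

# loc[]: pass the mask directly in the column section
via_loc = df.loc[:, first_row_match]

print(via_brackets.equals(via_loc))
print(via_loc.columns.tolist())
```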

Frequently Asked Questions on Select Pandas Columns Based on Condition

How do I select columns in a Pandas DataFrame based on a specific condition?

To select columns in a Pandas DataFrame based on a specific condition, you can use boolean indexing. For example, to select the columns whose sum is greater than 10 in an all-numeric DataFrame, df.sum() > 10 creates a boolean mask with one value per column, and df.columns[...] selects the column names that satisfy the condition. Finally, these selected columns are used to create a new DataFrame (result_df).
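A minimal sketch of this answer follows. The small all-numeric DataFrame and its column names A, B, and C are hypothetical and chosen only so that the threshold of 10 separates the columns.

```python
import pandas as pd

# Hypothetical all-numeric data: column sums are 6, 15, and 24
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
})

# Boolean mask with one value per column
mask = df.sum() > 10

# Column names that satisfy the condition, then the new DataFrame
result_df = df[df.columns[mask]]
print(result_df.columns.tolist())
```

For a DataFrame that also holds text columns, df.sum(numeric_only=True) restricts the sum to the numeric columns.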

Can I select columns based on a condition involving multiple columns?

You can select columns based on a condition involving multiple columns using boolean indexing with logical operators.
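One possible reading of this answer is to build several per-column masks and combine them with & or |. The sketch below assumes that reading and uses the article's sample data. The thresholds and the names num and mask are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Work on the numeric columns only so that max() and min() are well defined
num = df.select_dtypes(include='number')

# Keep columns whose maximum exceeds 2000 AND whose minimum exceeds 1000
mask = (num.max() > 2000) & (num.min() > 1000)

result = num.loc[:, mask]
print(result.columns.tolist())
```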

How can I select columns based on a condition for the entire column (not just specific rows)?

If you want to select columns based on a condition for the entire column (not just specific rows), you can use boolean indexing directly on the DataFrame.
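A condition that must hold for the entire column can be expressed with all() instead of any(). This is a minimal sketch on the article's sample data. The threshold 2000 and the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Compare only the numeric columns against the threshold
num = df.select_dtypes(include='number')

# any(): at least one value in the column is above 2000
result_any = num.loc[:, (num > 2000).any()]

# all(): every value in the column is above 2000
result_all = num.loc[:, (num > 2000).all()]

print(result_any.columns.tolist())
print(result_all.columns.tolist())
```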

Can I select columns based on a condition involving string values?

You can select columns based on a condition involving string values in a Pandas DataFrame. You can use string methods or conditions to filter columns based on string content.
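Two string-based conditions are sketched below on the article's sample data: one tests the column names and the other tests the column values. The prefix 'D', the suffix 'days', and the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Condition on column names: names that start with 'D'
by_name = df.loc[:, df.columns.str.startswith('D')]

# Condition on column values: every value ends with 'days'
# (values are cast to str so numeric columns can be tested too)
ends_with_days = df.apply(lambda s: s.astype(str).str.endswith('days').all())
by_value = df.loc[:, ends_with_days]

print(by_name.columns.tolist())
print(by_value.columns.tolist())
```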

Is it possible to select columns based on a condition and rename them?

It is possible to select columns based on a condition and rename them in a Pandas DataFrame. After selecting the columns, you can use the rename method to rename the columns as needed.
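A short sketch of selecting and then renaming follows, using the article's sample data. The new column name Discount_Amount is hypothetical.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Pandas", "Python"],
    'Fee': [20000, 25000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '45days'],
    'Discount': [1000, 2300, 1200, 1500],
})

# Select the columns that contain the value 1200
selected = df.loc[:, (df == 1200).any()]

# rename() returns a new DataFrame, so selected keeps its original name
renamed = selected.rename(columns={'Discount': 'Discount_Amount'})

print(selected.columns.tolist())
print(renamed.columns.tolist())
```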

Conclusion

In this article, I have explained how to select columns based on single or multiple conditions using the pandas loc[] and iloc[] attributes and df[] notation, with multiple examples.

Vijetha

Vijetha is an experienced technical writer with a strong command of various programming languages. She has had the opportunity to work extensively with a diverse range of technologies, including Python, Pandas, NumPy, and R. Throughout her career, Vijetha has consistently exhibited a remarkable ability to comprehend intricate technical details and adeptly translate them into accessible and understandable materials.