We can select columns based on single/multiple conditions using the pandas loc[]
attribute. The DataFrame.loc[] attribute property is used to select rows and columns based on index/index labels from DataFrame. A Pandas DataFrame is a two-dimensional table with labeled axes, including columns and rows. Selecting columns from a DataFrame creates a new DataFrame that contains only the specified columns from the original one.
In this article, I will explain how to select columns based on single/multiple conditions using the pandas loc[]
attribute and other options.
Key Points –
- Utilize boolean indexing to filter rows based on conditions.
- Use logical operators (e.g., &, |) to combine multiple conditions.
- Apply
.loc[]
or.iloc[]
to select specific columns after filtering rows. - Leverage functions like
.loc[]
with conditions to select columns conditionally. - Consider using
.query()
for more complex conditions.
Quick Examples of Select Columns by Condition
Following are quick examples of select columns by condition.
# Quick examples of select columns by condition
# Example 1 : Pass boolean value into loc[] &
# Get specified column
df1 = df.loc[: , [True, False, True, False]]
# Example 2 : Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]
# Example 3: Select columns based on multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]
# Example 4: Get specified column
# Using df[] notation
# Along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])
Create Pandas DataFrame
To run some examples of select pandas columns based on condition, let’s create a Pandas DataFrame using data from a dictionary.
# Create Pandas DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark", "Pandas", "Python"],
'Fee' :[20000, 25000, 22000, 24000],
'Duration':['30days','40days', '35days', '45days'],
'Discount':[1000, 2300, 1200, 1500]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)
Yields below output.
Select Pandas Columns Based on Boolean Value
Using df.loc[]
we can get the rows or columns of DataFrame based on the index. Here, I will pass boolean values in the column section of loc[]
attribute where boolean values should be taken the same size of columns of given DataFrame. This syntax will return corresponding columns of DataFrame for every boolean value of True
.
# Pass boolean value into loc[] & get specified column
df1 = df.loc[: , [True, False, True, False]]
print(df1)
# Output:
# Courses Duration
# 0 Spark 30days
# 1 PySpark 40days
# 2 Pandas 35days
# 3 Python 45days
Select Pandas Columns Based on Single Conditions
We can get specified column/columns of a given Pandas DataFrame based on condition along with any()
function and loc[]
attribute. First, select a column using df==1200
condition, it will return the same sized DataFrame where elements are boolean values. If the value is True
for the corresponding value of 1200 of the original DataFrame, otherwise it False
.
Then, call any()
function with Boolean DataFrame, and it will return the boolean Series where the values are all columns of DataFrame.
Finally, Pass the above boolean Series into the column section of the df.loc[]
attribute, it will return the specified column of the given DataFrame.
# Select column based on condition
col = (df == 1200).any()
df = df.loc[: , col]
print(df)
# Output:
# Discount
# 0 1000
# 1 2300
# 2 1200
# 3 1500
Select Pandas Columns Based on Multiple Conditions
Alternatively, Using the above syntax we can get specified columns based on multiple conditions. Let’s use and select the columns of the given DataFrame. For example,
# Select columns based on multiple condition
col = ((df == 25000 ) & (df =='Pandas' )).any()
d1f = df.loc[: , col]
print(df1)
# Output:
# Courses Duration
# 0 Spark 30days
# 1 PySpark 40days
# 2 Pandas 35days
# 3 Python 45days
Using df[] & Get Specified Column Based on Condition
When we want to select only columns based on some condition of a DataFrame, we can go with df[]
notation, the best way to select the specified columns of DataFrame. df[df.columns[df.iloc[0]=='30days']]
using this syntax we can select the specified columns based on condition.
# Get specified column using df[] notation
# Along with specified condition
print(df[df.columns[df.iloc[0] == '30days']])
# Output:
# Duration
# 0 30days
# 1 40days
# 2 35days
# 3 45days
Frequently Asked Questions on Select Pandas Columns Based on Condition
To select columns in a Pandas DataFrame based on a specific condition, you can use boolean indexing. For example, the condition is to select columns where the sum of values in each column is greater than 10. The df.sum() > 10
creates a boolean mask, and df.columns[...]
selects the column names that satisfy the condition. Finally, these selected columns are used to create a new DataFrame (result_df
).
You can select columns based on a condition involving multiple columns using boolean indexing with logical operators.
If you want to select columns based on a condition for the entire column (not just specific rows), you can use boolean indexing directly on the DataFrame.
You can select columns based on a condition involving string values in a Pandas DataFrame. You can use string methods or conditions to filter columns based on string content
It is possible to select columns based on a condition and rename them in a Pandas DataFrame. After selecting the columns, you can use the rename
method to rename the columns as needed.
Conclusion
In this article, you have learned to select columns based on single/multiple conditions using pandas loc[]
, iloc[]
attributes, and df[]
notation, with multiple examples
Related Articles
- Pandas apply() Return Multiple Columns
- Pandas Concatenate Two Columns
- How to Get Pandas Columns Count
- Pandas Split Column into Two Columns
- Pandas convert column to string-type
- Sort Multiple Columns in Pandas DataFrame
- Split Pandas DataFrame by column value
- Pandas Count Unique Values in Column
- Pandas GroupBy Multiple Columns Explained
- Pandas Drop Multiple Columns From DataFrame
- How to Add Multiple Columns to Pandas DataFrame
- Pandas Convert Row to Column Header in DataFrame