By using df[]
, loc[]
, query()
and isin()
we can apply multiple filters for retrieving data efficiently from the pandas DataFrame or Series. Applying multiple filters in a Pandas DataFrame is among the most commonly executed tasks in data manipulation. Pandas offers several techniques for efficiently retrieving subsets of data from a DataFrame.
In this article, I will explain how to apply multiple filters to filter the rows from DataFrame or Series using df[]
, loc[]
, query()
, and isin()
functions.
Key Points –
- Use the
loc
oriloc
accessor to filter rows based on specific conditions in a Pandas DataFrame. - Apply conditional filtering to a DataFrame or Series using the
query
method for concise and expressive syntax. - Utilize the
isin
method to filter rows based on whether values are present in a specified list or array. - Apply filtering based on the existence of null or non-null values using the
isnull
andnotnull
methods. - Combine multiple filtering conditions using logical operators such as
&
(and),|
(or), and~
(not).
Quick Examples of Apply Multiple Filters to DataFrame or Series
If you are in a hurry, below are some quick examples of how to apply multiple filters to DataFrame or Series.
# Quick examples of apply multiple filters
# Example 1: Apply multiple filters using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] = 25000, 'Fee']
# Example 2: Use df[] function to get Fee values
# greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]
# Example 3: Apply multiple filters
# using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')
# Example 4: Apply multiple filters
# using DataFrame.loc[] function
df2 = df.loc[df['Discount'].ge(2300),'Discount']
# Example 5: Get Discount values between 2000 and 2400
df2 = df.loc[df['Discount'].between(2000,2400)]
# Example 6: Get values from Discount
# greater than or equal to 2300
df2 = df.loc[df['Discount'].gt(2300)]
# Example 7: Apply multiple filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]
# Example 8: Apply multiple filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]
To run some examples of applying multiple filters to pandas DataFrame or Series, let’s create Pandas DataFrame from the Python dictionary.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","Hyperion"],
'Fee' :[20000,25000,26000,24000,30000],
'Duration':['30days','40days','35days','60days','40days'],
'Discount':[1000,2300,2500,2000,3000]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 20000 30days 1000
1 PySpark 25000 40days 2300
2 Hadoop 26000 35days 2500
3 Python 24000 60days 2000
4 Hyperion 30000 40days 3000
Apply Multiple Filters to Pandas DataFrame
Most of the time we would need to filter the rows based on multiple conditions applying on multiple columns in pandas DataFrame. When applying multiple conditions you need aware of a few things. For instance, parentheses are needed for each condition expression due to Python’s operator precedence rules. &
operator binds more tightly than <=
and >=
. not using parenthesis will have unexpected results.
# Apply multiple filters
# Using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] <=26000)]
print(df2)
# Output:
Courses Fee Duration Discount
1 PySpark 25000 40days 2300
2 Hadoop 26000 35days 2500
3 Python 24000 60days 2000
Let’s apply filters using DataFrame.loc[]
function.
# Use df.loc[] function
# To apply multiple filters
df2 = df.loc[df['Fee'] >= 25000, 'Fee']
print(df2)
# Output:
# 1 25000
# 2 26000
# 4 30000
# Name: Fee, dtype: int64
Using this syntax we can get a new DataFrame with select rows.
# Use df[] function to get Fee values
# Greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]
print(df2)
# Output:
Courses Fee Duration Discount
1 PySpark 25000 40days 2300
2 Hadoop 26000 35days 2500
4 Hyperion 30000 40days 3000
Apply Multiple Filters Using DataFrame.query() Function
DataFrame.query() function is recommended way to filter rows and you can chain these operators to apply multiple conditions, For example, df2=df.query('Fee<= 24000 & 24000 <= Fee')
.
# Apply multiple filters
# Using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')
print(df2)
# Output:
# Courses Fee Duration Discount
# 3 Python 24000 60days 2000
Filters by List of Multiple Values
If you have values in a list and wanted to filter the DataFrame with these values, use isin()
function.
# Apply filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]
print(df2)
# Apply filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
1 PySpark 25000 40days 2300
2 Hadoop 26000 35days 2500
4 Hyperion 30000 40days 3000
Complete Example For
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","Hyperion"],
'Fee' :[20000,25000,26000,24000,30000],
'Duration':['30days','40days','35days','60days','40days'],
'Discount':[1000,2300,2500,2000,3000]
}
df = pd.DataFrame(technologies)
print(df)
# Apply multiple filters using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] = 25000, 'Fee']
print(df2)
# Use df[] function to get Fee values
# greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]
print(df2)
# Apply multiple filters
# Using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')
print(df2)
# Apply filters using DataFrame.loc[] function
df2 = df.loc[df['Discount'].ge(2300),'Discount']
print(df2)
# Get Discount values between 2000 and 2400
df2 = df.loc[df['Discount'].between(2000,2400)]
print(df2)
# Get values from Discount greater than or equal to 2300
df2 = df.loc[df['Discount'].gt(2300)]
print(df2)
# Apply filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]
print(df2)
# Apply filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]
print(df2)
FAQ on Apply Multiple Filters to DataFrame or Series
The recommended ways include using the df.loc[]
function with logical operators or the DataFrame.query()
method. Both methods provide flexibility in expressing complex filtering conditions.
To filter rows based on multiple conditions using df.loc[]
, you can use logical operators such as &
(AND) and |
(OR) to combine conditions.
To filter rows based on multiple values in different columns, you can use logical operators (&
for AND, |
for OR) to combine conditions involving those columns.
You can filter based on ranges of values using both df.loc[]
and DataFrame.query()
methods in Pandas.
You can combine multiple filter conditions with different logical operators using parentheses for grouping and logical operators such as &
(AND) and |
(OR).
Conclusion
In this article, you have learned the efficient way to apply multiple filters to pandas DataFrame or Series by using df[]
, DataFrame.loc[]
, DataFrame.query()
, and isin()
function with examples.
Happy Learning !!
Related Articles
- Pandas Series filter() Function
- Pandas filter by column value
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter Rows Using IN Like SQL
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Filter Rows with NAN Value from DataFrame Column