Apply Multiple Filters to Pandas DataFrame or Series

  • Post author:
  • Post category:Pandas
  • Post last modified:January 23, 2024
  • Reading time:15 mins read

By using df[], loc[], query() and isin() we can apply multiple filters for retrieving data efficiently from the pandas DataFrame or Series. The process of applying multiple filters in pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to retrieve subsets of data from your DataFrame efficiently.

In this article, I will explain how to apply multiple filters to filter the rows from DataFrame or Series using df[], loc[], query(), and isin() functions.

Key Points –

  • Use the loc or iloc accessor to filter rows based on specific conditions in a Pandas DataFrame.
  • Apply conditional filtering to a DataFrame or Series using the query method for concise and expressive syntax.
  • Utilize the isin method to filter rows based on whether values are present in a specified list or array.
  • Apply filtering based on the existence of null or non-null values using the isnull and notnull methods.
  • Combine multiple filter conditions using parentheses for complex filtering logic.
  • Combine multiple filtering conditions using logical operators such as & (and), | (or), and ~ (not).

Quick Examples of Apply Multiple Filters to DataFrame or Series

If you are in a hurry, below are some quick examples of how to apply multiple filters to DataFrame or Series.


# Quick examples of apply multiple filters 

# Example 1: Apply multiple filters using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] = 25000, 'Fee']

# Example 2: Use df[] function to get Fee values 
# greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]

# Example 3: Apply multiple filters 
# using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')

# Example 4: Apply multiple filters 
# using DataFrame.loc[] function
df2 = df.loc[df['Discount'].ge(2300),'Discount']

# Example 5: Get Discount values between 2000 and 2400
df2 = df.loc[df['Discount'].between(2000,2400)]

# Example 6: Get values from Discount 
# greater than or equal to 2300
df2 = df.loc[df['Discount'].gt(2300)]

# Example 7: Apply multiple filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]

# Example 8: Apply multiple filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]

Create Pandas DataFrame from Python dictionary in which keys are 'Courses', 'Fee', 'Duration' and 'Discount‘, and values are taken as a list of corresponding key values.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Hyperion"],
    'Fee' :[20000,25000,26000,24000,30000],
    'Duration':['30days','40days','35days','60days','40days'],
    'Discount':[1000,2300,2500,2000,3000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
0     Spark  20000   30days      1000
1   PySpark  25000   40days      2300
2    Hadoop  26000   35days      2500
3    Python  24000   60days      2000
4  Hyperion  30000   40days      3000

Apply Multiple Filters to Pandas DataFrame

Most of the time we would need to filter the rows based on multiple conditions applying on multiple columns in pandas DataFrame. When applying multiple conditions you need aware of a few things. For example, parentheses are needed for each condition expression due to Python’s operator precedence rules& operator binds more tightly than <= and >=. not using parenthesis will have unexpected results.


# Apply multiple filters 
# Using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] <=26000)]
print(df2)

# Output:
   Courses    Fee Duration  Discount
1  PySpark  25000   40days      2300
2   Hadoop  26000   35days      2500
3   Python  24000   60days      2000

Let’s apply filters using DataFrame.loc[] function.


# Use df.loc[] function 
# To apply multiple filters
df2 = df.loc[df['Fee'] >= 25000, 'Fee']
print(df2)

# Output:
# 1    25000
# 2    26000
# 4    30000
# Name: Fee, dtype: int64

Using this syntax we can get a new DataFrame with select rows.


# Use df[] function to get Fee values 
# Greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]
print(df2)

# Output:
    Courses    Fee Duration  Discount
1   PySpark  25000   40days      2300
2    Hadoop  26000   35days      2500
4  Hyperion  30000   40days      3000

Apply Multiple Filters Using DataFrame.query() Function

DataFrame.query() function is recommended way to filter rows and you can chain these operators to apply multiple conditions, For example, df2=df.query('Fee<= 24000 & 24000 <= Fee').


# Apply multiple filters 
# Using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')
print(df2)

# Output:
#  Courses    Fee Duration  Discount
# 3  Python  24000   60days      2000

Filters by List of Multiple Values

If you have values in a list and wanted to filter the DataFrame with these values, use isin() function.


# Apply filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]
print(df2)

# Apply filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
1   PySpark  25000   40days      2300
2    Hadoop  26000   35days      2500
4  Hyperion  30000   40days      3000

Complete Example Apply Multiple Filters to DataFrame or Series


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Hyperion"],
    'Fee' :[20000,25000,26000,24000,30000],
    'Duration':['30days','40days','35days','60days','40days'],
    'Discount':[1000,2300,2500,2000,3000]
              }
df = pd.DataFrame(technologies)
print(df)

# Apply multiple filters using df[] function
df2 = df[(df['Fee'] >= 24000) & (df['Fee'] = 25000, 'Fee']
print(df2)

# Use df[] function to get Fee values 
# greater than or equal to 25000
df2 = df[df['Fee'] >= 25000]
print(df2)

# Apply multiple filters 
# Using DataFrame.query() function
df2 = df.query('Fee <= 24000 & 24000 <= Fee')
print(df2)

# Apply filters using DataFrame.loc[] function
df2 = df.loc[df['Discount'].ge(2300),'Discount']
print(df2)

# Get Discount values between 2000 and 2400
df2 = df.loc[df['Discount'].between(2000,2400)]
print(df2)

# Get values from Discount greater than or equal to 2300
df2 = df.loc[df['Discount'].gt(2300)]
print(df2)

# Apply filters by list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df.Courses.isin(list)]
print(df2)

# Apply filter rows based on list of values
list =["PySpark","Hadoop","Hyperion"]
df2 = df[df["Courses"].isin(list)]
print(df2)

Frequently Asked Questions

What is the recommended way to apply multiple filters to a Pandas DataFrame?

The recommended ways include using the df.loc[] function with logical operators or the DataFrame.query() method. Both methods provide flexibility in expressing complex filtering conditions.

How do I filter rows based on multiple conditions using df.loc[]?

To filter rows based on multiple conditions using df.loc[], you can use logical operators such as & (AND) and | (OR) to combine conditions.

Can I use the DataFrame.query() function to apply multiple filters?

You can use the DataFrame.query() function to apply multiple filters to a Pandas DataFrame. The query() method allows you to write filtering conditions in a more concise and expressive manner using a query string.

How do I filter rows based on multiple values in different columns?

To filter rows based on multiple values in different columns, you can use logical operators (& for AND, | for OR) to combine conditions involving those columns.

Can I filter based on ranges of values using df.loc[] or DataFrame.query()?

You can filter based on ranges of values using both df.loc[] and DataFrame.query() methods in Pandas.

How do I combine multiple filter conditions with different logical operators?

You can combine multiple filter conditions with different logical operators using parentheses for grouping and logical operators such as & (AND) and | (OR).

Conclusion

In this article, I have explained the efficient way to apply multiple filters to pandas DataFrame or Series by using df[], DataFrame.loc[], DataFrame.query(), and isin() function with several examples.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ @ LinkedIn

Leave a Reply