• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:17 mins read
You are currently viewing Pandas Filter DataFrame by Multiple Conditions

How to Filter Pandas DataFrame by multiple conditions? By using df[], loc[], query(), eval() and numpy.where() we can filter Pandas DataFrame by multiple conditions. The process of applying multiple filter conditions in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to retrieve subsets of data from the DataFrame efficiently.

In this article, I will explain how to filter pandas DataFrame multiple conditions using DataFrame.loc[], DataFrame.query(), df[], DataFrame.eval(), and numpy.where() functions.

1. Quick Examples of Filter DataFrame Multiple Conditions

If you are in a hurry, below are some quick examples of how to filter DataFrame by multiple conditions.


# Quick examples of filter dataframe multiple conditions

# Example 1: Use DataFrame.loc[] 
# To filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & 
             (df['Discount']< 2000) & 
             (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]

# Example 2: Use DataFrame.query() 
# To filter by multiple conditions
df2 = df.query('Fee  >= 22000 & 
                Discount < 3000 & 
                Courses.str.startswith("H").values')

# Example 3: Use pandas boolean indexing df[] 
# To filter by multiple conditions
df2 = df[(df['Fee']>=22000) & 
         (df['Discount']< 3000) & 
         df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]

# Example 4: Use DataFrame.eval() 
# To filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & 
                (Discount < 2300) & 
                Courses.str.startswith('S').values")]

# Example 5: Use numpy.where() 
# To filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & 
               (df['Discount']< 3000) & 
               (df['Courses'].str.startswith('P')))
df3 = df.loc[df2])

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are CoursesFeeDuration and Discount.


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","Pyspark","Hadoop","Pandas"],
    'Fee' :[22000,25000,24000,26000],
    'Duration':['30days','50days','40days','60days'],
    'Discount':[1000,2300,2500,1400]
              })
df = pd.DataFrame(technologies)
print(df)

Yields below output.

pandas filter multiple conditions

2. Pandas Filter by Multiple Conditions

To filter Pandas DataFrame by multiple conditions use DataFrame.loc[] property along with the conditions. Make sure you surround each condition with a bracket. Here, we will get all rows having Fee greater or equal to 24000 and Discount is less than 2000 and their Courses start with ‘P’ from the DataFrame. Print the details with Courses and their Duration.


# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]
print(df2)

Yields below output.


# Output:
  Courses Duration
3  Pandas   60days

3. Use DataFrame.query() to Filter by Multiple Conditions

DataFrame.query() function can also be used to filter by multiple conditions in pandas. In the below example, we have specified some conditions, such as the Fee>=22000 and Discount<3000, and the first letter in the column Courses must start with H. After applying the expression, it returns a new DataFrame.


# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee  >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
2  Hadoop  24000   40days      2500

4. Filter by Multiple Conditions using df[] Notation

Similarly, we get all rows having Fee>=22000 and Discount <3000, and their Courses start with ‘P’ from the DataFrame. In order to select the subset of data using the values in the DataFrame and applying Boolean conditions.


# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)

Yields below output.


# Output:
   Courses    Fee  Discount
1  Pyspark  25000      2300
3   Pandas  26000      1400

5. Use DataFrame.eval() to Filter by Multiple Conditions

The eval() function is used to evaluate a string describing operations on DataFrame columns which can be used to filter Pandas DataFrame by multiple conditions. It operates on columns only, not specific rows or elements. Inside the parentheses of the eval() function, we have specified two conditions with AND operators between them.

This function has extracted one row that satisfies the condition. Here, we get all rows having Fee<=25000 and Discount<2300, and their Courses start with ‘S’ from the DataFrame. Similar to the query() function.


# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000

6. Use NumPy.where() to Filter by Multiple Conditions

Alternatively, we can also use numpy.where() function to filter pandas DataFrame by specified multiple conditions. we will get all rows having Fee greater or equal to 22000 and Discount is less than 3000, and the first character of the column Courses must start with the letter P.

The np.where() function has returned an array with two index values, 1 and 3. We can use the loc[] function to retrieve the rows.


# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)

Yields below output.


# Output:
(array([1, 3], dtype=int64),)
   Courses    Fee Duration  Discount
1  Pyspark  25000   50days      2300
3   Pandas  26000   60days      1400

7. Complete Example Filter DataFrame by Multiple Conditions


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","Pyspark","Hadoop","Pandas"],
    'Fee' :[22000,25000,24000,26000],
    'Duration':['30days','50days','40days','60days'],
    'Discount':[1000,2300,2500,1400]
              })
df = pd.DataFrame(technologies)
print(df)

# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]
print(df2)

# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee  >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)

# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)

# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)

# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)

Frequently Asked Questions on Filter DataFrame by Multiple Conditions

How can I filter a DataFrame in Python by multiple conditions using pandas?

Filtering a DataFrame in Python by multiple conditions using pandas is a common operation. You can use boolean indexing or the loc[] method for this purpose.

Can I use the query method for filtering by multiple conditions?

You can use the query() method in pandas for filtering by multiple conditions. The query() method allows you to write the conditions in a more SQL-like syntax.

What if I want to filter based on multiple conditions using OR logic?

If you want to filter a DataFrame based on multiple conditions using OR logic, you can use the | (pipe) symbol to combine the conditions.

Can I filter a DataFrame based on conditions involving multiple columns?

You can filter a DataFrame based on conditions involving multiple columns. You can use logical AND (&) or logical OR (|) operators to combine conditions for different columns.

How can I filter based on a list of values for a specific column?

To filter a DataFrame based on a list of values for a specific column, you can use the isin() method in pandas. For example, the isin() method is used to filter rows where the ‘City’ column’s values are present in the cities_to_filter list. The resulting filtered_df_list_of_values DataFrame will contain only the rows where the ‘City’ column matches any of the specified values in the list.

What if I want to filter based on a condition for one column and another condition for a different column?

If you want to filter a DataFrame based on a condition for one column and another condition for a different column, you can combine these conditions using logical AND (&) or logical OR (|) operators.

Conclusion

In this article, I have explained how to filter pandas DataFrame with multiple conditions by using DataFrame.loc[], DataFrame.query(), df[], DataFrame.eval(), and numpy.where() function with several examples.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium