Pandas Filter DataFrame by Multiple Conditions

Spread the love

How to Filter Pandas DataFrame by multiple conditions? By using df[], loc[], query(), eval() and numpy.where() we can filter Pandas DataFrame by multiple conditions. The process of applying multiple filter conditions in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to retrieve subsets of data from the DataFrame efficiently.

In this article, I will explain how to filter pandas DataFrame multiple conditions using DataFrame.loc[], DataFrame.query(), df[], DataFrame.eval(), and numpy.where() functions.

1. Quick Examples of Filter DataFrame Multiple Conditions

If you are in a hurry, below are some quick examples of how to filter DataFrame by multiple conditions.


# Below are the quick examples

# Example 1: Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & 
             (df['Discount']< 2000) & 
             (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]

# Example 2: Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee  >= 22000 & 
                Discount < 3000 & 
                Courses.str.startswith("H").values')

# Example 3: Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & 
         (df['Discount']< 3000) & 
         df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]

# Example 4: Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & 
                (Discount < 2300) & 
                Courses.str.startswith('S').values")]

# Example 5: Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & 
               (df['Discount']< 3000) & 
               (df['Courses'].str.startswith('P')))
df3 = df.loc[df2])

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are CoursesFeeDuration and Discount.


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","Pyspark","Hadoop","Pandas"],
    'Fee' :[22000,25000,24000,26000],
    'Duration':['30days','50days','40days','60days'],
    'Discount':[1000,2300,2500,1400]
              })
df = pd.DataFrame(technologies)
print(df)

Yields below output.

pandas filter multiple conditions

2. Pandas Filter by Multiple Conditions

To filter Pandas DataFrame by multiple conditions use DataFrame.loc[] property along with the conditions. Make sure you surround each condition with a bracket. Here, we will get all rows having Fee greater or equal to 24000 and Discount is less than 2000 and their Courses start with ‘P’ from the DataFrame. Print the details with Courses and their Duration.


# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]
print(df2)

Yields below output.


# Output
  Courses Duration
3  Pandas   60days

3. Use DataFrame.query() to Filter by Multiple Conditions

DataFrame.query() function can also be used to filter by multiple conditions in pandas. In the below example, we have specified some conditions, such as the Fee>=22000 and Discount<3000, and the first letter in the column Courses must start with H. After applying the expression, it returns a new DataFrame.


# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee  >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)

Yields below output.


  Courses    Fee Duration  Discount
2  Hadoop  24000   40days      2500

4. Filter by Multiple Conditions using df[] Notation

Similarly, we get all rows having Fee>=22000 and Discount <3000, and their Courses start with ‘P’ from the DataFrame. In order to select the subset of data using the values in the DataFrame and applying Boolean conditions.


# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)

Yields below output.


# Output
   Courses    Fee  Discount
1  Pyspark  25000      2300
3   Pandas  26000      1400

5. Use DataFrame.eval() to Filter by Multiple Conditions

The eval() function is used to evaluate a string describing operations on DataFrame columns which can be used to filter Pandas DataFrame by multiple conditions. It operates on columns only, not specific rows or elements. Inside the parentheses of the eval() function, we have specified two conditions with AND operators between them.

This function has extracted one row that satisfies the condition. Here, we get all rows having Fee<=25000 and Discount<2300, and their Courses start with ‘S’ from the DataFrame. Similar to the query() function.


# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)

Yields below output.


# Output
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000

6. Use NumPy.where() to Filter by Multiple Conditions

Alternatively, we can also use numpy.where() function to filter pandas DataFrame by specified multiple conditions. we will get all rows having Fee greater or equal to 22000 and Discount is less than 3000, and the first character of the column Courses must start with the letter P.

The np.where() function has returned an array with two index values, 1 and 3. We can use the loc[] function to retrieve the rows.


# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)

Yields below output.


# Output
(array([1, 3], dtype=int64),)
   Courses    Fee Duration  Discount
1  Pyspark  25000   50days      2300
3   Pandas  26000   60days      1400

7. Complete Example Filter DataFrame by Multiple Conditions


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","Pyspark","Hadoop","Pandas"],
    'Fee' :[22000,25000,24000,26000],
    'Duration':['30days','50days','40days','60days'],
    'Discount':[1000,2300,2500,1400]
              })
df = pd.DataFrame(technologies)
print(df)

# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
                    ['Courses','Duration']]
print(df2)

# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee  >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)

# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)

# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)

# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)

8. Conclusion

In this article, I have explained how to filter pandas DataFrame with multiple conditions by using DataFrame.loc[], DataFrame.query(), df[], DataFrame.eval(), and numpy.where() function with several examples.

Happy Learning !!

References

Leave a Reply

You are currently viewing Pandas Filter DataFrame by Multiple Conditions