How to Filter Pandas DataFrame by multiple conditions? By using df[],
loc[]
, query()
, eval()
and numpy.where()
we can filter Pandas DataFrame by multiple conditions. The process of applying multiple filter conditions in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to retrieve subsets of data from the DataFrame efficiently.
In this article, I will explain how to filter pandas DataFrame multiple conditions using DataFrame.loc[]
, DataFrame.query()
, df[]
, DataFrame.eval()
, and numpy.where()
functions.
Key Points –
- Use the logical operators (
&
for “and”,|
for “or”) to combine multiple conditions. - Use the
.query()
method for filtering with string-based conditions. - Use comparison operators (
==
,>
,<
,>=
,<=
,!=
) for numeric-based filtering. - Use DataFrame column names directly or with
df['column']
notation for conditional expressions. - Use string methods like
.str.contains()
,.str.startswith()
, etc., for filtering based on string conditions. - Use NumPy functions like
np.where()
to apply complex filtering with multiple conditions.
Quick Examples of Filter DataFrame Multiple Conditions
If you are in a hurry, below are some quick examples of how to filter DataFrame by multiple conditions.
# Quick examples of filter dataframe multiple conditions
# Example 1: Use DataFrame.loc[]
# To filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) &
(df['Discount']< 2000) &
(df['Courses'].str.startswith('P')),
['Courses','Duration']]
# Example 2: Use DataFrame.query()
# To filter by multiple conditions
df2 = df.query('Fee >= 22000 &
Discount < 3000 &
Courses.str.startswith("H").values')
# Example 3: Use pandas boolean indexing df[]
# To filter by multiple conditions
df2 = df[(df['Fee']>=22000) &
(df['Discount']< 3000) &
df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
# Example 4: Use DataFrame.eval()
# To filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 &
(Discount < 2300) &
Courses.str.startswith('S').values")]
# Example 5: Use numpy.where()
# To filter by multiple conditions
df2 = np.where((df['Fee']>=22000) &
(df['Discount']< 3000) &
(df['Courses'].str.startswith('P')))
df3 = df.loc[df2])
Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses
, Fee
, Duration
and Discount
.
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","Pyspark","Hadoop","Pandas"],
'Fee' :[22000,25000,24000,26000],
'Duration':['30days','50days','40days','60days'],
'Discount':[1000,2300,2500,1400]
})
df = pd.DataFrame(technologies)
print(df)
Yields below output.
Pandas Filter by Multiple Conditions
To filter Pandas DataFrame by multiple conditions use DataFrame.loc[] property along with the conditions. Make sure you surround each condition with a bracket. Here, we will get all rows having Fee
greater or equal to 24000
and Discount
is less than 2000
and their Courses
start with ‘P’
from the DataFrame. Print the details with Courses
and their Duration
.
# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
['Courses','Duration']]
print(df2)
Yields below output.
# Output:
Courses Duration
3 Pandas 60days
Use DataFrame.query() to Filter by Multiple Conditions
DataFrame.query() function can also be used to filter by multiple conditions in pandas. In the below example, we have specified some conditions, such as the Fee>=22000
and Discount<3000
, and the first letter in the column Courses must start with H
. After applying the expression, it returns a new DataFrame.
# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
2 Hadoop 24000 40days 2500
Filter by Multiple Conditions using df[] Notation
Similarly, we get all rows having Fee>=22000
and Discount <3000
, and their Courses start with ‘P’
from the DataFrame. In order to select the subset of data using the values in the DataFrame and applying Boolean conditions.
# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)
Yields below output.
# Output:
Courses Fee Discount
1 Pyspark 25000 2300
3 Pandas 26000 1400
Use DataFrame.eval() to Filter by Multiple Conditions
The eval()
function is used to evaluate a string describing operations on DataFrame columns which can be used to filter Pandas DataFrame by multiple conditions. It operates on columns only, not specific rows or elements. Inside the parentheses of the eval()
function, we have specified two conditions with AND operators between them.
This function has extracted one row that satisfies the condition. Here, we get all rows having Fee<=25000
and Discount<2300
, and their Courses
start with ‘S’
from the DataFrame. Similar to the query()
function.
# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
Use NumPy.where() to Filter by Multiple Conditions
Alternatively, we can also use numpy.where() function to filter pandas DataFrame by specified multiple conditions. we will get all rows having Fee
greater or equal to 22000
and Discount
is less than 3000,
and the first character of the column Courses must start with the letter P
.
The np.where()
function has returned an array with two index values, 1
and 3
. We can use the loc[]
function to retrieve the rows.
# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)
Yields below output.
# Output:
(array([1, 3], dtype=int64),)
Courses Fee Duration Discount
1 Pyspark 25000 50days 2300
3 Pandas 26000 60days 1400
Complete Example Filter DataFrame by Multiple Conditions
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","Pyspark","Hadoop","Pandas"],
'Fee' :[22000,25000,24000,26000],
'Duration':['30days','50days','40days','60days'],
'Discount':[1000,2300,2500,1400]
})
df = pd.DataFrame(technologies)
print(df)
# Use DataFrame.loc[] to filter by multiple conditions
df2 = df.loc[(df['Fee']>=24000) & (df['Discount']< 2000) & (df['Courses'].str.startswith('P')),
['Courses','Duration']]
print(df2)
# Use DataFrame.query() to filter by multiple conditions
df2 = df.query('Fee >= 22000 & Discount < 3000 & Courses.str.startswith("H").values')
print(df2)
# Use Pandas Boolean Indexing df[] to Filter by Multiple Conditions
df2 = df[(df['Fee']>=22000) & (df['Discount']< 3000) & df['Courses'].str.startswith('P')][['Courses','Fee', 'Discount']]
print(df2)
# Use DataFrame.eval() to filter by multiple conditions
df2 =df[df.eval("Fee <= 25000 & (Discount < 2300) & Courses.str.startswith('S').values")]
print(df2)
# Use numpy.where() to filter by multiple conditions
df2 = np.where((df['Fee']>=22000) & (df['Discount']< 3000) & (df['Courses'].str.startswith('P')))
print(df2)
df3 = df.loc[df2])
print(df3)
Frequently Asked Questions on Filter DataFrame by Multiple Conditions
Filtering a DataFrame in Python by multiple conditions using pandas is a common operation. You can use boolean indexing or the loc[]
method for this purpose.
You can use the query()
method in pandas for filtering by multiple conditions. The query()
method allows you to write the conditions in a more SQL-like syntax.
If you want to filter a DataFrame based on multiple conditions using OR logic, you can use the |
(pipe) symbol to combine the conditions.
You can filter a DataFrame based on conditions involving multiple columns. You can use logical AND (&
) or logical OR (|
) operators to combine conditions for different columns.
To filter a DataFrame based on a list of values for a specific column, you can use the isin()
method in pandas. For example, the isin()
method is used to filter rows where the ‘City’ column’s values are present in the cities_to_filter
list. The resulting filtered_df_list_of_values
DataFrame will contain only the rows where the ‘City’ column matches any of the specified values in the list.
If you want to filter a DataFrame based on a condition for one column and another condition for a different column, you can combine these conditions using logical AND (&
) or logical OR (|
) operators.
Conclusion
In this article, I have explained how to filter pandas DataFrame with multiple conditions by using DataFrame.loc[]
, DataFrame.query()
, df[]
, DataFrame.eval()
, and numpy.where()
function with several examples.
Happy Learning !!
Related Articles
- Pandas Filter by Index
- Pandas filter by column value
- Pandas Series filter() Function
- Pandas Filter Rows by Conditions
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter Rows Using IN Like SQL
- Pandas Filter DataFrame by Substring criteria
- pandas DataFrame filter() – Usage & Examples
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Filter Rows with NAN Value from DataFrame Column