Pandas Filter Rows by Conditions

You can filter rows from a Pandas DataFrame based on a single condition or multiple conditions using boolean indexing, loc[], query(), or the apply() function. In this article, I will explain how to filter rows by condition(s) with several examples.


Quick Examples of Filtering Rows by Condition(s)

Following are quick examples of filtering DataFrame rows by single and multiple conditions.


# Quick examples of filtering rows by condition(s)

# Filter rows based on a condition
df[df["Courses"] == 'Spark']
df.loc[df['Courses'] == value]  # where value = "Spark"
df.query("Courses == 'Spark'")
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]  # where values is a list, e.g. ["Spark","PySpark"]
df.loc[~df['Courses'].isin(values)]

# Filter rows by multiple conditions
# using multiple columns
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]

# Using apply() with a lambda (applied column-wise by default)
df.apply(lambda col: col[df['Courses'].isin(['Spark','PySpark'])])

# Drop rows that have None/NaN values
df.dropna()

# Other examples
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]

Let's create sample data, run through these examples, and explore the output to understand them better. First, let's create a pandas DataFrame from a dictionary.


import pandas as pd
import numpy as np

technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee':[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days',None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

# Output:
# Create DataFrame:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500

Note that the above DataFrame contains None and NaN values in the Duration column; I will use them in the examples below to filter rows that have None & NaN values.

Filter Rows by Condition

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame with the selected rows.


# Filter Rows by Condition
df2=df[df["Courses"] == 'Spark'] 
print("After filtering the rows:\n", df2)

Yields below output.

# Output:
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000

You can also write the above statement with a variable.


value="Spark"
df2=df[df["Courses"] == value] 

If you want to filter rows where the value is not equal, use the != operator.


df[df["Courses"] != 'Spark']

Using DataFrame.query()

Using the query() method, you can filter pandas DataFrame rows with an expression; below is a simple example. You can use query() to run pretty much any example explained in this article.


df2=df.query("Courses == 'Spark'")
print(df2)

Yields the same output as above. You can also try other examples explained above with this approach.
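query() also supports referencing Python variables by prefixing them with @, which is useful when the filter value is held in a variable:


# Reference a Python variable inside query() using @
value = "Spark"
df2 = df.query("Courses == @value")
print(df2)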

Using DataFrame.loc[] to Filter by Condition

You can achieve the same result by using DataFrame.loc[] with a boolean mask.


# Using DataFrame.loc to Filter by Condition
df2=df.loc[df['Courses'] == "Spark"]
print(df2)

Yields the same output as above. You can also try other examples explained above with this approach.

Filter Rows by List of Values

If you have values in a list and want to filter rows based on that list of values, use the isin() method.


# Filter Rows Based on List of Values
values=["Spark","PySpark"]
print(df[df["Courses"].isin(values)] )
# Using df.loc
print(df.loc[df['Courses'].isin(values)]) 

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

You can negate a condition with the ~ operator to filter rows that do not match.


df2=df.loc[~df['Courses'].isin(values)]
print(df2)
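
Running this on the sample DataFrame yields the rows whose Courses value is not in the list:


# Output:
  Courses    Fee Duration  Discount
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200
4  Pandas  26000      NaN      2500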

Pandas Filter Rows by Multiple Conditions

Most of the time you need to filter rows based on multiple conditions applied to multiple columns; you can do that in Pandas as shown below. Note that the parentheses are needed around each condition expression due to Python's operator precedence rules: the & operator binds more tightly than <= and >=, so omitting the parentheses leads to an error or unexpected results.


# Filter Rows based on multiple conditions  
print(df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)])
print(df[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)])
print(df.query("Discount >= 1000 & Discount <= 2000"))

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200

You can also use multiple columns to filter Pandas DataFrame rows.


df2=df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]
print(df2)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500

Using DataFrame.apply() & Lambda Function

Note that DataFrame.apply() with the default axis=0 passes each column to the lambda function; the boolean mask produced by isin() then selects the matching rows from every column, so the result contains only the rows that satisfy the condition.


# By using a lambda function (applied column-wise)
print(df.apply(lambda col: col[df['Courses'].isin(['Spark','PySpark'])]))

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

Use None & NaN Values

In case, if you want to drop rows that contain None & NaN values, you can use the DataFrame.dropna() method.


# Drop rows that have None/NaN values in any column
print(df.dropna())

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000

The dropna() method can also be used to drop columns that contain None or NaN values by setting the axis parameter to 'columns'. Refer to how to drop Pandas DataFrame columns.


print(df.dropna(axis='columns'))

Yields below output.


# Output:
   Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500

Filter Using Series String Functions

You can also filter rows using the string functions available through Series.str, such as contains() and startswith().


# Filter based on value contains
print(df[df['Courses'].str.contains("Spark")])

# Filter after converting values
print(df[df['Courses'].str.lower().str.contains("spark")])

# Filter startswith
print(df[df['Courses'].str.startswith("P")])
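
Note that str.contains() is case-sensitive; lowercasing the column first, as in the second example, makes the match case-insensitive. Running the startswith() filter on the sample DataFrame yields:


# Output:
   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500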

FAQ on Pandas Filter Rows by Conditions

How do I filter rows based on a single condition?

You can filter rows based on a single condition using boolean indexing. For example, to filter rows where the 'Age' column is greater than 30:
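
# Boolean indexing on a hypothetical 'Age' column
df[df['Age'] > 30]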

How do I drop rows with None or NaN values?

To drop rows with None or NaN values in a pandas DataFrame, you can use the dropna() method. For example:
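
# Drop rows containing any None/NaN value
df.dropna()

# Drop rows only when specific columns contain None/NaN,
# e.g. the Duration column from the sample DataFrame
df.dropna(subset=['Duration'])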

How can I filter rows using the “OR” operator for multiple conditions?

Use the | operator for “OR” conditions. For example, df[(df['x'] > 10) | (df['y'] < 5)] will filter rows where ‘x’ is greater than 10 or ‘y’ is less than 5.

How do I filter rows based on conditions that involve None or NaN values?

You can use the apply() method with a lambda function to handle None or NaN values. For example, to filter rows where 'Age' is greater than 30 and 'Gender' is 'Male' while safely handling None and NaN values:
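
# A sketch using apply() row-wise; pd.notna() guards against None/NaN.
# 'Age' and 'Gender' are hypothetical columns for illustration.
df[df.apply(lambda row: pd.notna(row['Age']) and row['Age'] > 30
            and row['Gender'] == 'Male', axis=1)]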

How do I drop columns with None or NaN values?

You can drop columns that contain any None or NaN values by setting the axis parameter to 'columns' in the dropna() method. For example:
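
# Drop columns containing any None/NaN value
df.dropna(axis='columns')

# Drop only columns where every value is None/NaN
df.dropna(axis='columns', how='all')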

Conclusion

In this article, you have learned different examples of how to filter Pandas rows based on single and multiple conditions. Remember that filtering DataFrame rows always returns a new DataFrame with the selected rows.

Happy Learning !!
