• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:15 mins read
You are currently viewing How to Use NOT IN Filter in Pandas

Pandas NOT IN (~) operator filter is used to check whether a particular data is available in the DataFrame or not. Pandas don’t have a NOT IN operator, however, you can perform the NOT IN condition by negating DataFrame.isin() result. In this article, I will explain how to filter with a single column or multiple columns, and NumPy using NOT IN (~) operator in pandas DataFrame with examples.

1. Quick Examples of NOT IN Filter

If you are in a hurry, below are some quick examples of how to use NOT IN (~) operator to filter DataFrame.


# Quick examples of nOt in filter in pandas

# Example 1: Filter single column 
# Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]

# Example 2: Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
  
# Example 3: Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]

# Example 4: Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
  
# Example 5: Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]

# Example 6: Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Discount, and Duration.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
          }

df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)

Yields below output.

pandas filter not in

2. Filter Using NOT IN in Pandas

We can use the Pandas unary operator (~) to perform a NOT IN to filter the DataFrame on a single column. We should use isin() operator to get the given values in the DataFrame and use the unary operator ~ to negate the result. In the first example from the following, we are selecting the DataFrame, where Courses not in the list of values.


# Define list of teams we don't want
list_values = ["Hadoop", "Python"]
  
# Filter single column Use NOT IN filter
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
print("------------------------------------")
  
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)

Yields below output.


# Output:
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
4   Pandas  26000      2500   25days
------------------------------------
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days

3. Use NOT IN Filter with Multiple Columns

We can also use the pandas (~) operator to perform a NOT IN filter on multiple columns or more than one column by using .isin() & any() function. This function will check the value that exists in any given column and the columns are given in [[]] separated by a comma.


# Consider a list
list_values = ["Spark", "Pandas", 1000]
   
# Use not in filter with multiple column
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
print("------------------------------------")
  
# Filter in courses and duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)

Yields below output.


# Output:
   Courses    Fee  Discount Duration
1  PySpark  25000      2300   35days
3   Python  24000      1200   30days
------------------------------------
  Courses    Fee  Discount Duration
0   Spark  22000      1000   35days
2  Hadoop  23000      1000   40days
4  Pandas  26000      2500   25days

4. Use NumPy with NOT IN Filter

Use numpy with a NOT IN filter, we can also filter the list of column values based on the specified column. It will return the DataFrame without a specified list of values.


# Use numpy with not in filter
list_values = ["PySpark", "Python"]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)

Yields below output.


# Output:
  Courses    Fee  Discount Duration
0   Spark  22000      1000   35days
2  Hadoop  23000      1000   40days
4  Pandas  26000      2500   25days

5. Complete Example For NOT IN Filter


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
              }
df = pd.DataFrame(technologies)
print(df)

# Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
print(df2)

# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
  
# Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
print(df2)

# Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
  
# Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)

# Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)

Frequently Asked Questions on How to Use NOT IN Filter in Pandas?

How can I filter rows based on a column not being in a specific list in pandas?

To filter rows based on a column not being in a specific list in pandas, you can use the ~ (tilde) operator along with the isin() method.

Can I use NOT IN with multiple conditions?

You can use NOT IN (represented by the ~ operator) with multiple conditions in pandas by combining them with logical operators such as & (AND) and | (OR). For example, the conditions are combined using the & operator, representing the logical AND. The ~ operator then negates the entire condition, resulting in a DataFrame with rows that do not meet the specified criteria.

What if I want to filter based on multiple columns?

If you want to filter rows based on multiple columns using the NOT IN logic (represented by the ~ operator), you can combine multiple conditions with logical operators like & (AND) and | (OR).

How do I negate a single condition without using isin()?

To negate a single condition without using the isin() method, you can directly use the ~ (tilde) operator. For example, the condition df['City'] == 'Chicago' checks whether each value in the ‘City’ column is equal to ‘Chicago’. The ~ operator negates this condition, resulting in a DataFrame that includes only the rows where the ‘City’ is not equal to ‘Chicago’. Adjust the column name and the condition based on your specific use case.

Conclusion

In this article, I have explained how to use NOT IN in pandas DataFrame by using (~) operator along with isin() & any() function to filter the rows. Also, you have learned how to filter with a single column or multiple columns using NOT IN (~) operator in pandas DataFrame with examples.

Happy Learning !!

Related Articles

Reference

Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles in Pandas, NumPy, Python, and takes pride in ability to bridge the gap between technical experts and end-users.