How to Use NOT IN Filter in Pandas

Pandas NOT IN (~) operator filter is used to check whether a particular data is available in the DataFrame or not. Pandas don’t have NOTIN operator, however, you can perform the NOT IN condition by negating DataFrame.isin() result. In this article, I will explain how to filter with a single column or multiple columns, and NumPy using NOT IN (~) operator in pandas DataFrame with examples.

1. Quick Examples of NOT IN Filter in Pandas

If you are in a hurry, below are some quick examples of how to use NOT IN (~) operator to filter DataFrame.


# Below are quick example

# Example 1: Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]

# Example 2: Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
  
# Example 3: Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]

# Example 4: Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
  
# Example 5: Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]

# Example 6: Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Discount, and Duration.


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
          }

df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
2   Hadoop  23000      1000   40days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days

2. Filter Using NOT IN in Pandas

We can use the Pandas unary operator (~) to perform a NOT IN to filter the DataFrame on a single column. We should use isin() operator to get the given values in the DataFrame and use the unary operator ~ to negate the result. In the first example from the following, we are selecting the DataFrame, where Courses not in the list of values.


# define list of teams we don't want
list_values = ["Hadoop", "Python"]
  
# Filter single column Use NOT IN filter
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
print("------------------------------------")
  
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)

Yields below output.


# Output
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
4   Pandas  26000      2500   25days
------------------------------------
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days

3. Use NOT IN Filter with Multiple Columns

We can also use the Pandas (~) operator to perform a NOT IN filter on multiple columns or more than one column by using .isin() & any() function. This function will check the value that exists in any given column and the columns are given in [[]] separated by a comma.


# consider a list
list_values = ["Spark", "Pandas", 1000]
   
# Use not in filter with multiple column
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
print("------------------------------------")
  
# filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)

Yields below output.


   Courses    Fee  Discount Duration
1  PySpark  25000      2300   35days
3   Python  24000      1200   30days
------------------------------------
  Courses    Fee  Discount Duration
0   Spark  22000      1000   35days
2  Hadoop  23000      1000   40days
4  Pandas  26000      2500   25days

4. Use NumPy with NOT IN Filter

Use numpy with a NOT IN filter, we can also filter the list of column values based on the specified column. It will return the DataFrame without a specified list of values.


# Use numpy with not in filter
list_values = ["PySpark", "Python"]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)

Yields below output.


  Courses    Fee  Discount Duration
0   Spark  22000      1000   35days
2  Hadoop  23000      1000   40days
4  Pandas  26000      2500   25days

5. Complete Example For NOT IN Filter in Pandas


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
              }
df = pd.DataFrame(technologies)
print(df)

# Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
print(df2)

# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
  
# Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
print(df2)

# Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
  
# filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)

# Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)

6. Conclusion

In this article, I have explained how to use NOT IN in pandas DataFrame by using (~) operator along with isin() & any() function to filter the rows. Also, you have learned how to filter with a single column or multiple columns using NOT IN (~) operator in pandas DataFrame with examples.

Happy Learning !!

Related Articles

Reference

Leave a Reply