Pandas NOT IN (~)
operator filter is used to check whether a particular data is available in the DataFrame or not. Pandas don’t have a NOT IN operator, however, you can perform the NOT IN condition by negating DataFrame.isin() result. In this article, I will explain how to filter with a single column or multiple columns, and NumPy using NOT IN (~) operator in pandas DataFrame with examples.
Key Points –
- The NOT IN filter is typically implemented using the
isin()
method combined with the negation operator (~
). - Apply the NOT IN filter on a single column to exclude rows based on a list of values.
- Use the NOT IN filter across multiple columns by combining conditions with logical operators.
- Utilize boolean indexing to create a mask that filters out unwanted rows from the DataFrame.
- Use NumPy functions like
np.isin()
for more advanced filtering scenarios or when working with complex conditions. - Combine the NOT IN filter with other Pandas operations (like
groupby
,sort_values
, etc.) for more complex data manipulation tasks.
Quick Examples of NOT IN Filter
If you are in a hurry, below are some quick examples of how to use NOT IN (~) operator to filter DataFrame.
# Quick examples of nOt in filter in pandas
# Example 1: Filter single column
# Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
# Example 2: Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
# Example 3: Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
# Example 4: Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
# Example 5: Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
# Example 6: Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses
, Fee
, Discount
, and Duration
.
# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee' :[22000, 25000, 23000, 24000, 26000],
'Discount':[1000, 2300, 1000, 1200, 2500],
'Duration':['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)
Yields below output.
Filter Using NOT IN in Pandas
We can use the Pandas unary operator (~)
to perform a NOT IN to filter the DataFrame on a single column. We should use isin()
operator to get the given values in the DataFrame and use the unary operator ~
to negate the result. In the first example from the following, we are selecting the DataFrame, where Courses
not in the list of values.
# Define list of teams we don't want
list_values = ["Hadoop", "Python"]
# Filter single column Use NOT IN filter
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
print("------------------------------------")
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
4 Pandas 26000 2500 25days
------------------------------------
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
3 Python 24000 1200 30days
4 Pandas 26000 2500 25days
Use NOT IN Filter with Multiple Columns
We can also use the pandas (~)
operator to perform a NOT IN filter on multiple columns or more than one column by using .isin()
& any()
function. This function will check the value that exists in any given column and the columns are given in [[]]
separated by a comma.
# Consider a list
list_values = ["Spark", "Pandas", 1000]
# Use not in filter with multiple column
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
print("------------------------------------")
# Filter in courses and duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
1 PySpark 25000 2300 35days
3 Python 24000 1200 30days
------------------------------------
Courses Fee Discount Duration
0 Spark 22000 1000 35days
2 Hadoop 23000 1000 40days
4 Pandas 26000 2500 25days
Use NumPy with NOT IN Filter
Use numpy with a NOT IN filter, we can also filter the list of column values based on the specified column. It will return the DataFrame without a specified list of values.
# Use numpy with not in filter
list_values = ["PySpark", "Python"]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
2 Hadoop 23000 1000 40days
4 Pandas 26000 2500 25days
Complete Example For NOT IN Filter
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee' :[22000, 25000, 23000, 24000, 26000],
'Discount':[1000, 2300, 1000, 1200, 2500],
'Duration':['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
print(df)
# Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
# Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
print(df2)
# Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
# Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)
# Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)
Frequently Asked Questions on How to Use NOT IN Filter in Pandas?
To filter rows based on a column not being in a specific list in pandas, you can use the ~
(tilde) operator along with the isin()
method.
You can use NOT IN
(represented by the ~
operator) with multiple conditions in pandas by combining them with logical operators such as &
(AND) and |
(OR). For example, the conditions are combined using the &
operator, representing the logical AND. The ~
operator then negates the entire condition, resulting in a DataFrame with rows that do not meet the specified criteria.
If you want to filter rows based on multiple columns using the NOT IN
logic (represented by the ~
operator), you can combine multiple conditions with logical operators like &
(AND) and |
(OR).
To negate a single condition without using the isin()
method, you can directly use the ~
(tilde) operator. For example, the condition df['City'] == 'Chicago'
checks whether each value in the ‘City’ column is equal to ‘Chicago’. The ~
operator negates this condition, resulting in a DataFrame that includes only the rows where the ‘City’ is not equal to ‘Chicago’. Adjust the column name and the condition based on your specific use case.
Conclusion
In this article, I have explained how to use NOT IN in pandas DataFrame by using (~)
operator along with isin()
& any()
function to filter the rows. Also, you have learned how to filter with a single column or multiple columns using NOT IN (~)
operator in pandas DataFrame with examples.
Happy Learning !!
Related Articles
- Pandas Series filter() Function
- Pandas filter by column value
- Filter DataFrame Rows on Dates
- Pandas Filter Rows Using IN Like SQL
- Filter DataFrame by Substring criteria
- Pandas Drop Last Column From DataFrame
- Pandas – Get Column Index For Column Name
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Operator Chaining to Filter DataFrame Rows
- Apply Multiple Filters to Pandas DataFrame or Series
- Filter Rows with NAN Value from DataFrame Column