Pandas NOT IN (~)
operator filter is used to check whether a particular data is available in the DataFrame or not. Pandas don’t have NOTIN operator, however, you can perform the NOT IN condition by negating DataFrame.isin() result. In this article, I will explain how to filter with a single column or multiple columns, and NumPy using NOT IN (~) operator in pandas DataFrame with examples.
1. Quick Examples of NOT IN Filter in Pandas
If you are in a hurry, below are some quick examples of how to use NOT IN (~) operator to filter DataFrame.
# Below are quick example
# Example 1: Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
# Example 2: Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
# Example 3: Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
# Example 4: Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
# Example 5: Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
# Example 6: Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses
, Fee
, Discount
, and Duration
.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee' :[22000, 25000, 23000, 24000, 26000],
'Discount':[1000, 2300, 1000, 1200, 2500],
'Duration':['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
2 Hadoop 23000 1000 40days
3 Python 24000 1200 30days
4 Pandas 26000 2500 25days
2. Filter Using NOT IN in Pandas
We can use the Pandas unary operator (~)
to perform a NOT IN to filter the DataFrame on a single column. We should use isin()
operator to get the given values in the DataFrame and use the unary operator ~ to negate the result. In the first example from the following, we are selecting the DataFrame, where Courses
not in the list of values.
# Define list of teams we don't want
list_values = ["Hadoop", "Python"]
# Filter single column Use NOT IN filter
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
print("------------------------------------")
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
4 Pandas 26000 2500 25days
------------------------------------
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
3 Python 24000 1200 30days
4 Pandas 26000 2500 25days
3. Use NOT IN Filter with Multiple Columns
We can also use the Pandas (~)
operator to perform a NOT IN filter on multiple columns or more than one column by using .isin()
& any()
function. This function will check the value that exists in any given column and the columns are given in [[]]
separated by a comma.
# Consider a list
list_values = ["Spark", "Pandas", 1000]
# Use not in filter with multiple column
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
print("------------------------------------")
# Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
1 PySpark 25000 2300 35days
3 Python 24000 1200 30days
------------------------------------
Courses Fee Discount Duration
0 Spark 22000 1000 35days
2 Hadoop 23000 1000 40days
4 Pandas 26000 2500 25days
4. Use NumPy with NOT IN Filter
Use numpy with a NOT IN filter, we can also filter the list of column values based on the specified column. It will return the DataFrame without a specified list of values.
# Use numpy with not in filter
list_values = ["PySpark", "Python"]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
2 Hadoop 23000 1000 40days
4 Pandas 26000 2500 25days
5. Complete Example For NOT IN Filter in Pandas
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee' :[22000, 25000, 23000, 24000, 26000],
'Discount':[1000, 2300, 1000, 1200, 2500],
'Duration':['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
print(df)
# Filter single column Use NOT IN filter
list_values = ["Hadoop", "Python"]
df2 = df[~df['Courses'].isin(list_values)]
print(df2)
# Filter for rows where the Fee name is not in list
list_values = [23000]
df2 = df[~df['Fee'].isin(list_values)]
print(df2)
# Filter for rows where the Discount name is not in list
list_values = [1000, 2300]
df2 = df[~df['Discount'].isin(list_values)]
print(df2)
# Use not in filter with multiple column
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
# Filter in Courses and Duration column
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)
# Use numpy with not in filter
list_values = ["PySpark", "Python", 22000]
df2 = df[~np.isin(df["Courses"], list_values)]
print(df2)
6. Conclusion
In this article, I have explained how to use NOT IN in pandas DataFrame by using (~)
operator along with isin()
& any()
function to filter the rows. Also, you have learned how to filter with a single column or multiple columns using NOT IN (~)
operator in pandas DataFrame with examples.
Happy Learning !!
Related Articles
- Get First Row of Pandas DataFrame
- Pandas Handle Missing Data in Dataframe
- How to Plot Columns of Pandas DataFrame
- Convert GroupBy output from Series to DataFrame
- Find Intersection Between Two Series in Pandas
- Change the Order of Pandas DataFrame Columns
- How to Get an Index from Pandas DataFrame
- Pandas Get First Column of DataFrame as Series
- Apply Multiple Filters to Pandas DataFrame or Series
- How to Combine Two Series into pandas DataFrame