You can filter/select rows from Pandas DataFrame using IN
(ISIN) operator like SQL by using pandas.Series.isin(), DataFrame.query() methods. In this article, I will explain how to filter a single column, how to filter multiple columns, how to filter based on conditions, and lambda functions using IN operator with examples in Pandas DataFrame.
1. Quick Examples of Using IN Like SQL
If you are in a hurry, below are some quick examples of how to use IN
operator in pandas DataFrame.
# Below are some Quick examples.
# Filtering a single column with pandas isin.
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
# To return a Boolean array.
df2=df.Courses.isin(Courses_to_keep)
print(df2)
# Filtering Multiple Columns With Pandas Isin.
df2 = df[df[['Courses', 'Fee']].isin(['Spark', 20000]).any(axis=1)]
# Filtering Using Pandas isin Not matching condition.
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
# Filtering Pandas DataFrame by df.query() method.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
# Filtering Pandas DataFrame by query() method perform NOT IN.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
Now, let’s create a pandas DataFrame from Dict object and execute these examples and validate the result. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 25000 40days 2300
r3 Python 22000 35days 1200
r4 pandas 30000 50days 2000
2. Filter Using isin() similar to IN in SQL
Similar to SQL IN operators, you can filter rows from pandas DataFrame by checking column values in a list. pandas.Series.isin()
function is used to check whether the elements in Series contain specified values. It also returns a boolean Series whether each element in the Series matches an element in the passed sequence of values exactly.
For example, let’s say you wanted to select rows that contain a list (Courses_to_keep
) of values in a column Courses
. To accomplish this, you could write:
# Filtering a single column with pandas Isin.
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r3 Python 22000 35days 1200
For a better understanding of what’s actually going on here. Let’s run df.series.isin()
which returns a boolean array.
# To return a Boolean array.
df2=df.Courses.isin(Courses_to_keep)
print(df2)
# Output:
# r1 True
# r2 False
# r3 True
# r4 False
# Name: Courses, dtype: bool
3. Filter Row By Multiple Columns With Pandas isin()
Let’s take a look at filter on multiple columns by .isin()
method. It adds a little bit of complexity to the equation.
# Filtering Multiple Columns With Pandas isin().
df2 = df[df[['Courses', 'Fee']].isin(['Spark',30000]).any(axis=1)]
print(df2)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r4 pandas 30000 50days 2000
4. Filter Using isin() Not Matching Condition
Similar to .isin()
method to SQL
’s IN
statement, we can use the Pandas unary operator (~
) to perform a NOT IN selection.
# Filtering Using pandas isin not matching condition.
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r2 PySpark 25000 40days 2300
r4 pandas 30000 50days 2000
5. Filter Using IN operator and DataFrame.query()
The Pandas DataFrame.query() function is used to query the columns of a DataFrame with a boolean expression. In this method you can use in operator similar to SQL expression.
# Filtering Pandas DataFrame by df.query() method.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
print(df2)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Python 22000 35days 1200
NOTE: The query()
method is similar to SQL’s query statement, we can also use the Pandas unary operator (~
) to perform a NOT IN
selection.
# Filtering Pandas DataFrame by query() method perform NOT IN.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
print(df2)
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r4 pandas 30000 50days 2000
6. Filter Pandas DataFrame by Lambda Function
A lambda function is a small function containing a single expression. It can also act as anonymous functions where they don’t require any name.
# Filtering pandas DataFrame by lambda function.
Courses=["Spark","Python"]
criterion=lambda row:row['Courses']not in Courses
df=df[df.apply(criterion,axis=1)]
print(df)
Yields output same as above.
7. Complete Examples of Filter DataFrame Using IN
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Filtering a single column with pandas Isin.
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
# To return a Boolean array.
df2=df.Courses.isin(Courses_to_keep)
print(df2)
# Filtering Multiple Columns With Pandas Isin.
df2 = df[df[['Courses', 'Fee']].isin(['Spark', '20000']).any(axis=1)]
# Filtering Using Pandas isin Not matching condition.
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
# Filtering Pandas DataFrame by df.query() method.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
# Filtering Pandas DataFrame by query() method perform NOT IN.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
Conclusion
In this article, you have learned about pd.Series.isin()
, DataFrame.query()
methods to filter/select rows using IN operator similar to SQL. Also, you have learned how to filter using a single column, how to filter multiple columns, and how to filter based on conditions not being true and lambda functions.
Related Articles
- Pandas Series filter() Function
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Filter Rows with NAN Value from DataFrame Column
- Pandas filter by column value