You can filter/select rows from Pandas DataFrame using IN
(ISIN) operator like SQL by using pandas.Series.isin(), DataFrame.query() methods. In this article, I will explain how to filter a single column, how to filter multiple columns, how to filter based on conditions, and lambda functions using IN operator with examples in Pandas DataFrame.
Key Points –
- Pandas provides the
isin()
method to filter rows based on whether the values in a column are part of a specified list or array, mimicking the SQLIN
clause. - The
isin()
method can be applied directly to a single column to filter rows where the column’s values match any value in a given list. - You can filter rows based on multiple columns using
isin()
on each column and combining them with logical conditions. - The
query()
method allows you to useIN
-like filtering within string expressions, supporting dynamic and complex conditions. - For filtering rows across multiple columns, you can combine
isin()
with.any(axis=1)
or.all(axis=1)
to check if any or all columns match the condition. - In
query()
, you can reference external Python variables in the query string by prefixing them with the@
symbol, making the query more dynamic and flexible.
Quick Examples of Using IN Like SQL
If you are in a hurry, below are some quick examples of how to use IN
operator in pandas DataFrame.
# Quick examples of using iN like SQL
# Filtering a single column with pandas isin
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
# To return a boolean array
df2=df.Courses.isin(Courses_to_keep)
# Filtering Multiple Columns With Pandas Isin
df2 = df[df[['Courses', 'Fee']].isin(['Spark', 20000]).any(axis=1)]
# Filtering Using Pandas isin Not matching condition
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
# Filtering Pandas DataFrame by df.query() method
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
# Filtering Pandas DataFrame by query() method perform NOT IN
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
Now, let’s create a pandas DataFrame from Dict object and execute these examples and validate the result. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 25000 40days 2300
r3 Python 22000 35days 1200
r4 pandas 30000 50days 2000
Filter Using isin() similar to IN in SQL
Similar to SQL IN operators, you can filter rows from pandas DataFrame by checking column values in a list. pandas.Series.isin()
function is used to check whether the elements in Series contain specified values. It also returns a boolean Series whether each element in the Series matches an element in the passed sequence of values exactly.
For example, let’s say you wanted to select rows that contain a list (Courses_to_keep
) of values in a column Courses
. To accomplish this, you could write:
# Filtering a single column with pandas Isin.
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r3 Python 22000 35days 1200
For a better understanding of what’s actually going on here. Let’s run df.series.isin()
which returns a boolean array.
# To return a Boolean array
df2=df.Courses.isin(Courses_to_keep)
print(df2)
# Output:
# r1 True
# r2 False
# r3 True
# r4 False
# Name: Courses, dtype: bool
Filter Row By Multiple Columns With Pandas isin()
Let’s take a look at filter on multiple columns by .isin()
method. It adds a little bit of complexity to the equation.
# Filtering Multiple Columns With Pandas isin().
df2 = df[df[['Courses', 'Fee']].isin(['Spark',30000]).any(axis=1)]
print(df2)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r4 pandas 30000 50days 2000
Filter Using isin() Not Matching Condition
Similar to .isin()
method to SQL
’s IN
statement, we can use the Pandas unary operator (~
) to perform a NOT IN selection.
# Filtering Using pandas isin not matching condition.
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
r2 PySpark 25000 40days 2300
r4 pandas 30000 50days 2000
Filter Using IN operator and DataFrame.query()
The Pandas DataFrame.query() function is used to query the columns of a DataFrame with a boolean expression. In this method you can use in operator similar to SQL expression.
# Filtering Pandas DataFrame by df.query() method.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
print(df2)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Python 22000 35days 1200
NOTE: The query()
method is similar to SQL’s query statement, we can also use the Pandas unary operator (~
) to perform a NOT IN
selection.
# Filtering Pandas DataFrame by query() method perform NOT IN
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
print(df2)
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r4 pandas 30000 50days 2000
Filter Pandas DataFrame by Lambda Function
A lambda function is a small function containing a single expression. It can also act as anonymous functions where they don’t require any name.
# Filtering pandas DataFrame by lambda function.
Courses=["Spark","Python"]
criterion=lambda row:row['Courses']not in Courses
df=df[df.apply(criterion,axis=1)]
print(df)
Yields output same as above.
Complete Examples of Filter DataFrame Using IN
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Filtering a single column with pandas Isin.
Courses_to_keep=["Spark","Python"]
df2=df[df.Courses.isin(Courses_to_keep)]
# To return a Boolean array.
df2=df.Courses.isin(Courses_to_keep)
print(df2)
# Filtering Multiple Columns With Pandas Isin.
df2 = df[df[['Courses', 'Fee']].isin(['Spark', '20000']).any(axis=1)]
# Filtering Using Pandas isin Not matching condition.
Courses_to_keep=["Spark","Python"]
df2=df[~df.Courses.isin(Courses_to_keep)]
# Filtering Pandas DataFrame by df.query() method.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses in @Courses_to_keep")
# Filtering Pandas DataFrame by query() method perform NOT IN.
Courses_to_keep=["Spark","Python"]
df2=df.query("Courses not in @Courses_to_keep")
FAQ on Pandas Filter Rows Using IN Like SQL
To filter rows in a Pandas DataFrame based on a single column using IN
logic (like SQL), you can use the .isin()
method.
You can get a Boolean array instead of filtered rows by applying the .isin()
method directly to the column. This array indicates whether each row satisfies the condition.
To filter rows based on multiple columns in Pandas, you can use the .isin()
method in combination with .any(axis=1)
or .all(axis=1)
depending on whether any or all conditions must be satisfied.
To filter rows with a NOT IN condition in Pandas, you can use the ~
operator (bitwise NOT) with the .isin()
method. This negates the condition, returning rows that do not match the values in the list.
Performance depends on the size of the DataFrame and the filtering logic. For simple conditions, .isin()
is usually faster, but query()
provides better readability for complex filters.
Conclusion
In this article, you have learned about pd.Series.isin()
, DataFrame.query()
methods to filter/select rows using IN operator similar to SQL. Also, you have learned how to filter using a single column, how to filter multiple columns, and how to filter based on conditions not being true and lambda functions.
Related Articles
- Pandas filter by column value
- Pandas Series filter() Function
- Pandas Read TSV with Examples
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Pandas Read SQL Query or Table with Examples
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Filter Rows with NAN Value from DataFrame Column