pandas support operator chaining (df.query(condition).query(condition)
) by calling methods on objects (DataFrame object) sequentially one after another in order to filter rows. It is a programming style programmers prefers to reduce the number of variables and lines. Like any other framework or programming language, pandas supports operator chaining where you can use this to filter rows of DataFrame.
Traditionally operator chaining is used with groupby & aggregate in pandas, In this article, I will explain different ways of using operator chaining in pandas, for example how to filter rows on the output of another filter, using a boolean operator to apply multiple conditions e.t.c.
1. Quick Examples of Operator Chaining to Filter Rows in pandas
If you are in a hurry, below are some quick examples of how to filter rows of pandas DataFrame with operator chaining.
# Below are some quick examples.
df2=df[(df.Fee >= 23000) & (df.Fee <= 24000)]
df2=df[(df.Fee == 23000) & (df.Discount == 1000)]
def mask(df, key, value):
return df[df[key] == value]
pd.DataFrame.mask = mask
df2=df.mask('Fee',23000).mask('Discount',1000)
print(df2)
# Using DataFrame.query() to chain filter rows
df2 = df.query("Fee < 25000").query("Fee > 22000")
# Using query() method to filter rows operator chaining
df2 = df.query("Fee < 25000 and Fee > 22000")
df2=df.loc[lambda x: x.Fee >= 25000].loc[lambda x: x.Discount >= 2000]
df2 = df.loc[lambda x: x['Fee'] == 25000]
df2 = df.pipe(lambda x: x['Fee'] == 25000)
Now, let’s create a DataFrame with a few rows and columns and execute some examples to learn using operator chaining. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[22000,25000,23000,24000,26000,25000,20000,24000],
'Duration':['30day','50days','55days', '40days','60days','50days','55days','50days'],
'Discount':[1000,2300,1000,1200,2500,2000,2000,1500]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
Courses Fee Duration Discount
0 Spark 22000 30day 1000
1 PySpark 25000 50days 2300
2 Hadoop 23000 55days 1000
3 Python 24000 40days 1200
4 pandas 26000 60days 2500
5 PySpark 25000 50days 2000
6 Python 20000 55days 2000
7 pandas 24000 50days 1500
2. Filter Rows by Specific Values using Operator Chaining
# Filter Rows by Specific Values using Operator Chaining
courses=["Spark","Hadoop","Python","pandas"]
df2=df[df.Courses.isin(courses) & df.Duration.str.contains('5')]
print(df2)
Yields below output.
Courses Fee Duration Discount
2 Hadoop 23000 55days 1000
6 Python 20000 55days 2000
7 pandas 24000 50days 1500
Below is another example.
# Another example
df = pd.DataFrame(technologies)
df2=df[(df.Fee >= 23000) & (df.Fee <= 24000)]
print(df2)
Yields below output.
Courses Fee Duration Discount
2 Hadoop 23000 55days 1000
3 Python 24000 40days 1200
7 pandas 24000 50days 1500
3. Chaining DataFrame.query() to Filter Rows in pandas
pandas.DataFrame.query() method is recommended way to filter rows and you can chain these operators to apply multiple conditions. For example df.query(“Fee >= 23000”).query(“Fee <= 24000”) , you can also write the same statement as df.query("Fee >= 23000 and Fee <= 24000")
# Using DataFrame.query() to chain filter rows
df = pd.DataFrame(technologies)
df2 = df.query("Fee >= 23000").query("Fee <= 24000")
print(df2)
# Using query() method to filter rows operator chaining
df = pd.DataFrame(technologies)
df2 = df.query("Fee >= 23000 and Fee <= 24000")
print(df2)
Yields same output as above.
4. Using mask() to Filter Rows by Operator Chaining
Chaining mask() methods to filter rows in pandas DataFrame. The mask method works similar to if-then. For each element in the calling DataFrame, if cond
is False
the element is used; otherwise the corresponding element from the DataFrame other
is used.
# Using mask() to Filter Rows by Operator Chaining
def mask(df, key, value):
return df[df[key] == value]
pd.DataFrame.mask = mask
df = pd.DataFrame(technologies)
df2=df.mask('Fee',23000).mask('Discount',1000)
print(df2)
Yields below output..
Courses Fee Duration Discount
2 Hadoop 23000 55days 1000
5. Using mask() and Lambda Function with Operator Chaining
With mask() method, you can also use lambda function chain operators.
#Using mask and lambda function to filter
df = pd.DataFrame(technologies)
df2 = df.mask(lambda x: x['Courses'] == "pandas")
print(df2)
Yields below output.
Courses Fee Duration Discount
a Spark 22000.0 30day 1000.0
b PySpark 25000.0 50days 2300.0
c Hadoop 23000.0 55days 1000.0
d Python 24000.0 40days 1200.0
e NaN NaN NaN NaN
f PySpark 25000.0 50days 2000.0
g Python 20000.0 55days 2000.0
h NaN NaN NaN NaN
6. Using Operator Chaining with DataFrame.loc[]
You can also use loc[] to filter rows with operator chaining. Below are some examples.
#Chaining loc[] operator to filter rows
df = pd.DataFrame(technologies)
df2 = df.loc[lambda x: x['Fee'] == 25000]
print(df2)
df = pd.DataFrame(technologies)
df2=df.loc[lambda x: x.Fee >= 23000].loc[lambda x: x.Discount <= 24000]
print(df2)
I will leave this example to you to run and explore the output.
7. Using pipe() Method
Below is another way but this returns ‘TRUE/FALSE’ in the result.
# Using pipe() method
df = pd.DataFrame(technologies)
df2 = df.pipe(lambda x: x['Fee'] == 25000)
print(df2)
Conclusion
In this article, you have learned how to use operator chaining on DataFrame to filter rows by using DataFrame.query()
, DataFrame.loc[]
, and DataFrame.mask()
method. Operator chaining is a programmer’s choice to reduce multiple lines and combining multiple statements together.
Happy Learning !!
Related Articles
- Pandas Filter Rows using DataFrame.query() Method
- Upgrade Pandas Version to Latest or Specific Version
- How to Change Position of a Column in Pandas
- Replace NaN with Blank/Empty String on Pandas
- How to Shuffle Pandas DataFrame Rows Examples
- Pandas Filter DataFrame by Multiple Conditions
- Pandas Filter Rows by Conditions
- Pandas Filter DataFrame by Substring criteria
- Pandas Filter Rows with NAN Value from DataFrame Column