• Post author:
  • Post category:Pandas
  • Post last modified:November 21, 2024
  • Reading time:15 mins read
You are currently viewing Pandas Operator Chaining to Filter DataFrame Rows

Pandas support operator chaining (df.query(condition).query(condition)) by calling methods on objects (DataFrame object) sequentially one after another in order to filter rows. It is a programming style programmers prefers to reduce the number of variables and lines. Like any other framework or programming language, pandas supports operator chaining where you can use this to filter rows of DataFrame.

Advertisements

Traditionally operator chaining is used with groupby & aggregate in pandas, In this article, I will explain different ways of using operator chaining in pandas, for example how to filter rows on the output of another filter, using a boolean operator to apply multiple conditions e.t.c.

Key Points –

  • Method chaining allows for clean, readable, and concise code when performing multiple operations on a DataFrame.
  • Operator chaining can include filters, transformations, and other DataFrame methods like query(), filter(), and assign().
  • Each method in the chain should return a DataFrame or Series to continue the chain.
  • Filtering rows using boolean conditions can be done with methods like loc[], iloc[], or conditional indexing.
  • Logical operators (&, |, ~) are used for combining multiple conditions in the filtering process.

1. Quick Examples of Operator Chaining to Filter Rows in Pandas

If you are in a hurry, below are some quick examples of how to filter rows of pandas DataFrame with operator chaining.


# Quick examples of operator chaining to filter rows

# Operator Chaining to Filter Rows in pandas
df2=df[(df.Fee >= 23000) & (df.Fee <= 24000)]

df2=df[(df.Fee == 23000) & (df.Discount == 1000)]

def mask(df, key, value):
   return df[df[key] == value]

pd.DataFrame.mask = mask
df2=df.mask('Fee',23000).mask('Discount',1000)   
print(df2)

# Using DataFrame.query() to chain filter rows
df2 = df.query("Fee < 25000").query("Fee > 22000")

# Using query() method to filter rows operator chaining
df2 = df.query("Fee < 25000 and Fee > 22000")

df2=df.loc[lambda x: x.Fee >= 25000].loc[lambda x: x.Discount >= 2000]

df2 = df.loc[lambda x: x['Fee'] == 25000]

df2 = df.pipe(lambda x: x['Fee'] == 25000)

Now, let’s create a DataFrame with a few rows and columns and execute some examples to learn using operator chaining. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
    'Fee' :[22000,25000,23000,24000,26000,25000,20000,24000],
    'Duration':['30day','50days','55days', '40days','60days','50days','55days','50days'],
    'Discount':[1000,2300,1000,1200,2500,2000,2000,1500]
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000    30day      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   55days      1000
3   Python  24000   40days      1200
4   pandas  26000   60days      2500
5  PySpark  25000   50days      2000
6   Python  20000   55days      2000
7   pandas  24000   50days      1500

2. Filter Rows by Specific Values using Operator Chaining

To filter rows by specific values using operator chaining in Pandas.


# Filter Rows by Specific Values using Operator Chaining
courses=["Spark","Hadoop","Python","pandas"]
df2=df[df.Courses.isin(courses) & df.Duration.str.contains('5')]
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
2  Hadoop  23000   55days      1000
6  Python  20000   55days      2000
7  pandas  24000   50days      1500

Below is another example.


# Another example
df = pd.DataFrame(technologies)
df2=df[(df.Fee >= 23000) & (df.Fee <= 24000)]
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
2  Hadoop  23000   55days      1000
3  Python  24000   40days      1200
7  pandas  24000   50days      1500

3. Chaining DataFrame.query() to Filter Rows in pandas

pandas.DataFrame.query() method is recommended way to filter rows and you can chain these operators to apply multiple conditions. For example df.query(“Fee >= 23000”).query(“Fee <= 24000”) , you can also write the same statement as df.query("Fee >= 23000 and Fee <= 24000")


# Using DataFrame.query() to chain filter rows
df = pd.DataFrame(technologies)
df2 = df.query("Fee >= 23000").query("Fee <= 24000")
print(df2)

# Using query() method to filter rows operator chaining
df = pd.DataFrame(technologies)
df2 = df.query("Fee >= 23000 and Fee <= 24000")
print(df2)

Yields same output as above.

4. Using mask() to Filter Rows by Operator Chaining

Chaining mask() methods to filter rows in pandas DataFrame. The mask method works similar to if-then. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.


# Using mask() to Filter Rows by Operator Chaining
def mask(df, key, value):
   return df[df[key] == value]

pd.DataFrame.mask = mask
df = pd.DataFrame(technologies)
df2=df.mask('Fee',23000).mask('Discount',1000)   
print(df2)

Yields below output..


# Output:
  Courses    Fee Duration  Discount
2  Hadoop  23000   55days      1000

5. Using mask() and Lambda Function with Operator Chaining

With mask() method, you can also use lambda function chain operators.


# Using mask and lambda function to filter
df = pd.DataFrame(technologies)
df2 = df.mask(lambda x: x['Courses'] == "pandas")
print(df2)

Yields below output.


# Output:
   Courses      Fee Duration  Discount
a    Spark  22000.0    30day    1000.0
b  PySpark  25000.0   50days    2300.0
c   Hadoop  23000.0   55days    1000.0
d   Python  24000.0   40days    1200.0
e      NaN      NaN      NaN       NaN
f  PySpark  25000.0   50days    2000.0
g   Python  20000.0   55days    2000.0
h      NaN      NaN      NaN       NaN

6. Using Operator Chaining with DataFrame.loc[]

You can also use loc[] to filter rows with operator chaining. Below are some examples.


# Chaining loc[] operator to filter rows
df = pd.DataFrame(technologies)
df2 = df.loc[lambda x: x['Fee'] == 25000]
print(df2)

df = pd.DataFrame(technologies)
df2=df.loc[lambda x: x.Fee >= 23000].loc[lambda x: x.Discount <= 24000]
print(df2)

I will leave this example to you to run and explore the output.

7. Using pipe() Method

Below is another way but this returns ‘TRUE/FALSE’ in the result.


# Using pipe() method
df = pd.DataFrame(technologies)
df2 = df.pipe(lambda x: x['Fee'] == 25000)
print(df2)

FAQ on Pandas Operator Chaining to Filter DataFrame Rows

What is operator chaining in Pandas?

Operator chaining in Pandas refers to linking multiple operations together to perform data manipulation in a concise and readable manner. It allows you to apply multiple filters or transformations to a DataFrame without the need to store intermediate results in separate variables. This often involves using methods like .query(), .loc[], .isin(), .str.contains(), and more in sequence.

How do I filter a DataFrame using multiple conditions with operator chaining?

You can filter rows using multiple conditions by combining methods with the & (AND), | (OR), and ~ (NOT) operators.

How can I chain multiple filter conditions without parentheses?

You can use .query() to avoid excessive parentheses when chaining multiple filters. For instance,

Can I filter rows based on string conditions using operator chaining?

You can filter rows using string conditions. For example, if you want to filter rows where a column contains a specific substring, you can use .str.contains().

What if I want to use multiple chained operations on the filtered data?

You can chain multiple operations by using a backslash (\) for line continuation or by simply linking them together. Here’s an example using multiple chained operations.

Conclusion

In this article, you have learned how to use operator chaining on DataFrame to filter rows by using DataFrame.query(), DataFrame.loc[], and DataFrame.mask() method. Operator chaining is a programmer’s choice to reduce multiple lines and combining multiple statements together.

Happy Learning !!

References

Leave a Reply