• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:16 mins read
You are currently viewing Pandas.DataFrame.query() by Examples

Pandas DataFrame.query() method is used to query the rows based on the provided expression (single or multiple column conditions) and returns a new DataFrame. In case you want to update the existing referring DataFrame use inplace=True argument.

In this article, I will explain the syntax of the Pandas DataFrame query() method and several working examples like a query with multiple conditions and a query with a string containing to new few.

Related:

1. Quick Examples of pandas query()

If you are in a hurry, below are quick examples of how to use pandas.DataFrame.query() method.


# Below are the quick examples.

# Query Rows using DataFrame.query()
df2=df.query("Courses == 'Spark'")

# Using variable
value='Spark'
df2=df.query("Courses == @value")

# Inpace
df.query("Courses == 'Spark'",inplace=True)

# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")

If you are a learner, Let’s see with sample data then run through these examples, and explore the output to understand better. First, let’s create a pandas DataFrame from Dict.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

pandas dataframe query

Note that the above DataFrame also contains None and NaN values on the Duration column that I would be using in my examples below to select rows that have None & NaN values or select ignoring these values.

Related: You can drop the Pandas rows with NaN values.

3. Using DataFrame.query()

Following is the syntax of the DataFrame.query() method.


# Query() method syntax
DataFrame.query(expr, inplace=False, **kwargs)
  • expr – expression takes conditions to query rows
  • inplace – Defaults to False. When it is set to True, it updates the existing DataFrame, and query() method returns None.
  • **kwargs –  Keyword arguments that work with eval()

DataFrame.query() is used to filter rows based on one or multiple conditions in an expression.


# Query all rows with Courses equals 'Spark'
df2 = df.query("Courses == 'Spark'")
print("After filtering the rows based on condition:\n", df2)

Yields below output.

pandas dataframe query

In case you want to use a variable in the expression, use @ character.


# Query Rows by using Python variable
value='Spark'
df2 = df.query("Courses == @value")
print("After filtering the rows based on condition:\n", df2)

Yields the same output as above.

If you notice the above examples return a new DataFrame after filtering the rows. if you want to update the existing DataFrame use inplace=True


# Replace current esisting DataFrame
df.query("Courses == 'Spark'",inplace=True)
print("After filtering the rows based on condition:\n", df)

Yields the same output as above.

If you want to select based on column value not equals then use != operator.


# Not equals condition
df2 = df.query("Courses != 'Spark'")
print("After filtering the rows based on condition:\n", df2)

Yields below output.


# Output:
   Courses  Courses Fee Duration  Discount
1  PySpark        25000   50days      2300
2   Hadoop        23000   30days      1000
3   Python        24000     None      1200
4   Pandas        26000      NaN      2500

4. Select Rows Based on List of Column Values

If you have values in a Python list and want to select the rows based on the list of values, use in operator, it’s like checking a value contained in a list of string values.


# Query Rows by list of values
df2 = df.query("Courses in ('Spark','PySpark')")
print("After filtering the rows based on condition:\n", df2)

Yields below output.


# Output:
# After filtering the rows based on condition:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

You can also write with a list of values in a Python variable.


# Query Rows by list of values
values=['Spark','PySpark']
df2 = df.query("Courses in @values")
print("After filtering the rows based on condition:\n", df2)

Yields the same output as above.

Selecting rows that are not in a list of column values can be done using not-in operator.


# Query Rows not in list of values
values=['Spark','PySpark']
df2 = df.query("Courses not in @values")
print("After filtering the rows based on condition:\n", df)

Yields below output.


# Output:
# After filtering the rows based on condition:
   Courses    Fee Duration  Discount
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200
4  Pandas  26000      NaN      2500

If you have column names with special characters you can use column names surrounded by a tick ` character.


# Create DataFrame
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Courses Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

# Using columns with special characters
df2 = df.query("`Courses Fee` >= 23000")
print("After filtering the rows based on condition:\n", df2)

Yields below output.


# Output:
# Create DataFrame:
    Courses  Courses Fee Duration  Discount
0    Spark        22000   30days      1000
1  PySpark        25000   50days      2300
2   Hadoop        23000   30days      1000
3   Python        24000     None      1200
4   Pandas        26000      NaN      2500

# After filtering the rows based on condition:
    Courses  Courses Fee Duration  Discount
1  PySpark        25000   50days      2300
2   Hadoop        23000   30days      1000
3   Python        24000     None      1200
4   Pandas        26000      NaN      2500

5. Query with Multiple Conditions

In Pandas or any table-like structures, most of the time we would need to select the rows based on multiple conditions by using multiple columns, you can do that in Pandas DataFrame as below.


# Query by multiple conditions
df2 = df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")
print("After filtering the rows based on multiple conditions:\n", df2)

Yields below output. Alternatively, you can also use pandas loc with multiple conditions.


# Output:
# After filtering the rows based on multiple conditions:
  Courses  Courses Fee Duration  Discount
2  Hadoop        23000   30days      1000
3  Python        24000     None      1200

6. Query Rows using apply()

pandas.DataFrame.apply() method is used to apply the expression row-by-row and return the rows that match the values. The below example returns every match when Courses contains a list of specified string values.


# By using lambda function
df2 = df.apply(lambda row: row[df['Courses'].isin(['Spark','PySpark'])])
print("After filtering the rows based on condition:\n", df2)

Yields below output. A lambda expression is used with pandas to apply the function for each row.


# Output:
# After filtering the rows based on condition:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

8. Other Examples using df[] and loc[]


# Other examples you can try to query rows
df[df["Courses"] == 'Spark'] 
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]

# Select based on value contains
print(df[df['Courses'].str.contains("Spark")])

# Select after converting values
print(df[df['Courses'].str.lower().str.contains("spark")])

# Select startswith
print(df[df['Courses'].str.startswith("P")])

Frequently Asked Questions on Pandas DataFrame Query Examples

How do I filter rows based on a single condition in a Pandas DataFrame?

You can use the query() method to filter the based on a single condition. For example, query_result = df.query("column == 'value'")

How can I filter rows based on multiple conditions?

You can use the query() method to filter the based multiple conditions. For example, query_result = df.query("`column name1` >= 23000 and `Column name2` <= 24000")

How do I use the query() method for DataFrame querying?

In Pandas, the query() method is used for DataFrame querying based on a boolean expression. This method allows you to filter rows from a DataFrame that meet a specified condition.

Conclusion

In this article, I have explained multiple examples of how to query Pandas DataFrame Rows based on single and multiple conditions, from a list of values (checking column value exists in the list of string values), etc. Remember when you query DataFrame Rows, it always returns a new DataFrame with selected rows, in order to update existing df you have to use inplace=True. I hope this article helps you learn about Pandas.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply

This Post Has One Comment

  1. Anonymous

    Hi there, I liked this article. One point…

    Perhaps the implementation has changed, but I don’t think the pandas query likes spaces in the column names. I had a column named “Return Date” and I had to change it to “Return_Date” to avoid getting a key error.