• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:16 mins read
You are currently viewing pandas.DataFrame.where() Examples

pandas.DataFrame.where() function is similar to if-then/if else that is used to check the one or multiple conditions of an expression in DataFrame and replace with another value when the condition becomes False. By default, it replaces with NaN value and provides a param to replace with any custom value.

Note that where() method replaces all column values of a DataFrame row when the condition becomes False, other rows are untouched.

1. DataFrame.where() Syntax

Following is the syntax of the DataFrame.where() function. This function takes parameters cond, other, inplace, axis, level, errors, try_cast. Returns DataFrame. When inplace=True used, it returns None. When a caller is Series, it returns the Series object.


# DataFrame.where() syntax
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=NoDefault.no_default)

Parameters Explanation:

  • cond – Accepts one or more conditions. Takes eithere bool Series/DataFrame, array-like, or callable
  • other – Value to replace with. Takes either scalar, Series/DataFrame, or callable
  • inplace – If True, it updates the existing DataFrame and return None. Default set to False.
  • axis – The axis along which to apply the condition. By default, it is None, which means the condition is applied to both rows and columns. Takes int, default None
  • level – For DataFrames with multi-level index, this specifies the level(s) where the condition should be applied. Alignment leve.
  • errors – Specify whether to raise any error in event of failure. Accepts str, {‘raise’, ‘ignore’}, default ‘raise’
  • try_cast – If True, try to cast the result to the original data type of the DataFrame. The default is False. Cast teh result basck to input type. (Available since 1.3.0 version)

Let’s create a DataFrame and explore the where() function usage with examples.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Discount':[1500,1000,1200,800,1300],
    'Duration':['30days','50days','30days','35days','40days']
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)

Yields below output.

pandas where

2. Pandas where() Example

In pandas where() function behaves differently than SQL where clause, here it is used similar to if then/if else. It checks one or multiple conditions specified with cond param and replace with a other value when condition becomes False.


# Default example
df2=df.where(df.Fee > 23000)
print(df2)

Yields below output. Note that by defualt it replaces with numpy.NaN. You can drop rows with NaN using DataFrame.dropna() function.

pandas where

3. Update with Another Value

Now, let’s update with a custom value. The below example updates all rows of DataFrame with value ‘NA’ when condition Fee > 23000 becomes False.


# Use other param
df2=df.where(df.Fee > 23000,'NA')
print(df2)

Yields below output. Notice that rows that didn’t satisfy the condition are changed to ‘NA’.


# Output:
   Courses    Fee Discount Duration
0       NA     NA       NA       NA
1  PySpark  25000     1000   50days
2       NA     NA       NA       NA
3   Python  24000      800   35days
4  PySpark  26000     1300   40days

4. where() with Multiple Columns & Conditions

To create a DataFrame called df and then using the where() method with multiple conditions to replace values where both conditions are not met with ‘NA’. For instance, values in the ‘Courses’, ‘Fee’, ‘Discount’, and ‘Duration’ columns where both conditions (Fee > 23000 and Discount > 900) are not met are replaced with ‘NA’.


# Where on multiple columns & conditions
cond1 = df.Fee > 23000
cond2 = df.Discount > 900
df2 = df.where(cond1 & cond2, other='NA') 
print(df2)

Yields below output.


# Output:
   Courses    Fee Discount Duration
0       NA     NA       NA       NA
1  PySpark  25000     1000   50days
2       NA     NA       NA       NA
3       NA     NA       NA       NA
4  PySpark  26000     1300   40days

5. Update on Existing DataFrame

All the above examples return a new DataFrame object instead of updating the existing one. In order to update in place use inplace=True param. When used this param with the value true, where() function returns None.


# Updating on existing DataFrame
df.where(cond1 & cond2, 'NA', inplace=True) 
print(df)

This updates the existing DataFrame object. Yields the same output as above.

Frequently Asked Questions on Pandas DataFrame where() Function

What is the purpose of pandas.DataFrame.where()?

The pandas.DataFrame.where() method serves the purpose of selectively replacing values in a DataFrame based on a specified condition. It allows you to apply a condition to each element in the DataFrame, and when the condition is not met, the corresponding element is replaced with a specified value.

How can I use pandas.DataFrame.where() with a single condition?

You can use pandas.DataFrame.where() with a single condition by specifying the condition and, optionally, providing a value to replace the elements where the condition is not met.

Can I use pandas.DataFrame.where() with multiple conditions?

You can use pandas.DataFrame.where() with multiple conditions by combining the conditions using logical operators like & (AND) and | (OR).

What happens if I don’t specify the other parameter in pandas.DataFrame.where()?

If you don’t specify the other parameter in pandas.DataFrame.where(), the default behavior is that the values where the condition is not met will be replaced with NaN (Not a Number).

Can I modify the original DataFrame in place using pandas.DataFrame.where()?

You can modify the original DataFrame in place using the inplace parameter in pandas.DataFrame.where(). When inplace is set to True, the original DataFrame is modified, and the method returns None. If inplace is set to False (default behavior), a new DataFrame with the modifications is returned.

Can I modify the original DataFrame in place using pandas.DataFrame.where()?

You can modify the original DataFrame in place using the inplace parameter of the pandas.DataFrame.where() method. When you set inplace=True, the original DataFrame will be modified, and the method will return None. If inplace=False (or not specified, as it is by default), a new DataFrame with the modifications will be returned.

Conclusion

In this article, you have learned the syntax of the DataFrame.where() function, an explanation of the param, and how to use the function with examples. Note that this cannot be used similarly to SQL where clause instead it should be used as if-then or if-else way.

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium