pandas.DataFrame.where() Examples

pandas.DataFrame.where() function is similar to if-then/if else that is used to check the one or multiple conditions of an expression in DataFrame and replace with another value when the condition becomes False. By default, it replaces with NaN value and provides a param to replace with any custom value.

Note that where() method replaces all column values of a DataFrame row when the condition becomes False, other rows are untouched.

1. DataFrame.where() Syntax

Following is the syntax of the DataFrame.where() function. This function takes parameters cond, other, inplace, axis, level, errors, try_cast. Returns DataFrame. When inplace=True used, it returns None. When a caller is Series, it returns the Series object.


# DataFrame.where() syntax
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=NoDefault.no_default)

Parameters Explanation:

  • cond – Accepts one or more conditions. Takes eithere bool Series/DataFrame, array-like, or callable
  • other – Value to replace with. Takes either scalar, Series/DataFrame, or callable
  • inplace – If True, it updates the existing DataFrame and return None. Default set to False.
  • axis – rows/columns axis. Takes int, default None
  • level – Alignment leve.
  • errors – Specify whether to raise any error in event of failure. Accepts str, {‘raise’, ‘ignore’}, default ‘raise’
  • try_cast – Cast teh result basck to input type. (Available since 1.3.0 version)

Let’s create a DataFrame and explore the where() function usage with examples.


# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Discount':[1500,1000,1200,800,1300],
    'Duration':['30days','50days','30days','35days','40days']
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount Duration
0    Spark  22000      1500   30days
1  PySpark  25000      1000   50days
2    Spark  23000      1200   30days
3   Python  24000       800   35days
4  PySpark  26000      1300   40days

2. pandas where() Example

In pandas where() function behaves differently than SQL where clause, here it is used similar to if then/if else. It checks one or multiple conditions specified with cond param and replace with a other value when condition becomes False.


# Default example
df2=df.where(df.Fee > 23000)
print(df2)

Yields below output. Note that by defualt it replaces with numpy.NaN. You can drop rows with NaN using DataFrame.dropna() function.


# Output:
   Courses      Fee  Discount Duration
0      NaN      NaN       NaN      NaN
1  PySpark  25000.0    1000.0   50days
2      NaN      NaN       NaN      NaN
3   Python  24000.0     800.0   35days
4  PySpark  26000.0    1300.0   40days

3. Update with Another Value

Now, let’s update with a custom value. The below example updates all rows of DataFrame with value ‘NA’ when condition Fee > 23000 becomes False.


# Use other param
df2=df.where(df.Fee > 23000,'NA')
print(df2)

Yields below output. Notice that rows that didn’t satisfy the condition are changed to ‘NA’.


# Output:
   Courses    Fee Discount Duration
0       NA     NA       NA       NA
1  PySpark  25000     1000   50days
2       NA     NA       NA       NA
3   Python  24000      800   35days
4  PySpark  26000     1300   40days

4. where() with Multiple Columns & Conditions

The below example demonstrates how to specify the multiple conditions by using a conditional operator.


# Where on multiple columns & conditions
cond1 = df.Fee > 23000
cond2 = df.Discount > 900
df2 = df.where(cond1 & cond2, other='NA') 
print(df2)

Yields below output.


# Output:
   Courses    Fee Discount Duration
0       NA     NA       NA       NA
1  PySpark  25000     1000   50days
2       NA     NA       NA       NA
3       NA     NA       NA       NA
4  PySpark  26000     1300   40days

5. Update on Existing DataFrame

All the above examples return a new DataFrame object instead of updating the existing one. In order to update in place use inplace=True param. When used this param with the value true, where() function returns None.


# Updating on existing DataFrame
df.where(cond1 & cond2, 'NA', inplace=True) 
print(df)

This updates the existing DataFrame object. Yields same output as above.

Conclusion

In this article, you have learned the syntax of the DataFrame.where() function, explanation of the param, and how to use the function with examples. Note that this cannot be used similar to SQL where clause instead it should be used as if-then or if-else way.

References

Naveen (NNK)

I am Naveen (NNK) working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passion about sharing my knowledge in Apache Spark, Hive, PySpark, R etc.

Leave a Reply

You are currently viewing pandas.DataFrame.where() Examples