pandas.DataFrame.where()
function is similar to if-then/if else that is used to check the one or multiple conditions of an expression in DataFrame and replace with another value when the condition becomes False. By default, it replaces with NaN value and provides a param to replace with any custom value.
Note that where()
method replaces all column values of a DataFrame row when the condition becomes False, other rows are untouched.
Key Points –
DataFrame.where()
is used to replace elements in a DataFrame based on a condition.- It keeps the original values where the condition is
True
and replaces values where the condition isFalse
. - You can specify the replacement value using the
other
parameter. - The method supports applying conditions across entire DataFrames, Series, or single values.
where()
can work with multiple conditions when combined with logical operators.- The
inplace
parameter allows modifying the original DataFrame without creating a new one. - You can control whether
NaN
values should be included in the condition using theaxis
parameter.
1. DataFrame.where() Syntax
Following is the syntax of the DataFrame.where()
function. This function takes parameters cond
, other
, inplace
, axis
, level
, errors
, try_cast
. Returns DataFrame. When inplace=True
used, it returns None. When a caller is Series, it returns the Series object.
# DataFrame.where() syntax
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=NoDefault.no_default)
Parameters Explanation:
- cond – Accepts one or more conditions. Takes eithere bool Series/DataFrame, array-like, or callable
- other – Value to replace with. Takes either scalar, Series/DataFrame, or callable
- inplace – If True, it updates the existing DataFrame and return None. Default set to False.
- axis – The axis along which to apply the condition. By default, it is None, which means the condition is applied to both rows and columns. Takes int, default None
- level – For DataFrames with multi-level index, this specifies the level(s) where the condition should be applied. Alignment leve.
- errors – Specify whether to raise any error in event of failure. Accepts str, {‘raise’, ‘ignore’}, default ‘raise’
- try_cast – If True, try to cast the result to the original data type of the DataFrame. The default is False. Cast teh result basck to input type. (Available since 1.3.0 version)
Let’s create a DataFrame and explore the where() function usage with examples.
# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Discount':[1500,1000,1200,800,1300],
'Duration':['30days','50days','30days','35days','40days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)
Yields below output.
2. Pandas where() Example
In pandas where()
function behaves differently than SQL where clause, here it is used similar to if then/if else. It checks one or multiple conditions specified with cond
param and replace with a other
value when condition becomes False
.
# Default example
df2=df.where(df.Fee > 23000)
print(df2)
Yields below output. Note that by defualt it replaces with numpy.NaN
. You can drop rows with NaN using DataFrame.dropna() function.
3. Update with Another Value
Now, let’s update with a custom value. The below example updates all rows of DataFrame with value ‘NA’ when condition Fee > 23000
becomes False.
# Use other param
df2=df.where(df.Fee > 23000,'NA')
print(df2)
Yields below output. Notice that rows that didn’t satisfy the condition are changed to ‘NA’.
# Output:
Courses Fee Discount Duration
0 NA NA NA NA
1 PySpark 25000 1000 50days
2 NA NA NA NA
3 Python 24000 800 35days
4 PySpark 26000 1300 40days
4. where() with Multiple Columns & Conditions
To create a DataFrame called df
and then using the where()
method with multiple conditions to replace values where both conditions are not met with ‘NA’. For instance, values in the ‘Courses’, ‘Fee’, ‘Discount’, and ‘Duration’ columns where both conditions (Fee > 23000
and Discount > 900
) are not met are replaced with ‘NA’.
# Where on multiple columns & conditions
cond1 = df.Fee > 23000
cond2 = df.Discount > 900
df2 = df.where(cond1 & cond2, other='NA')
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 NA NA NA NA
1 PySpark 25000 1000 50days
2 NA NA NA NA
3 NA NA NA NA
4 PySpark 26000 1300 40days
5. Update on Existing DataFrame
All the above examples return a new DataFrame object instead of updating the existing one. In order to update in place use inplace=True
param. When used this param with the value true, where()
function returns None
.
# Updating on existing DataFrame
df.where(cond1 & cond2, 'NA', inplace=True)
print(df)
This updates the existing DataFrame object. Yields the same output as above.
Frequently Asked Questions on Pandas DataFrame where() Function
The pandas.DataFrame.where()
method serves the purpose of selectively replacing values in a DataFrame based on a specified condition. It allows you to apply a condition to each element in the DataFrame, and when the condition is not met, the corresponding element is replaced with a specified value.
You can use pandas.DataFrame.where()
with a single condition by specifying the condition and, optionally, providing a value to replace the elements where the condition is not met.
You can use pandas.DataFrame.where()
with multiple conditions by combining the conditions using logical operators like &
(AND) and |
(OR).
If you don’t specify the other
parameter in pandas.DataFrame.where()
, the default behavior is that the values where the condition is not met will be replaced with NaN
(Not a Number).
You can modify the original DataFrame in place using the inplace
parameter in pandas.DataFrame.where()
. When inplace
is set to True
, the original DataFrame is modified, and the method returns None
. If inplace
is set to False
(default behavior), a new DataFrame with the modifications is returned.
You can modify the original DataFrame in place using the inplace
parameter of the pandas.DataFrame.where() method. When you set inplace=True
, the original DataFrame will be modified, and the method will return None
. If inplace=False
(or not specified, as it is by default), a new DataFrame with the modifications will be returned.
Conclusion
In this article, you have learned the syntax of the DataFrame.where()
function, an explanation of the param, and how to use the function with examples. Note that this cannot be used similarly to SQL where clause instead it should be used as if-then or if-else way.
Related Articles
- Pandas Correlation of Columns
- Pandas Get Index from DataFrame?
- Pandas Series filter() Function
- Pandas DataFrame insert() Function
- Get Unique Rows in Pandas DataFrame
- Convert Pandas DataFrame to Series
- Pandas Drop Last Column From DataFrame
- Working with MultiIndex in pandas DataFrame
- How to Append Row to pandas DataFrame
- Pandas Replace Values based on Condition
- Pandas – Get Column Index For Column Name
- Pandas Create Conditional Column in DataFrame