pandas.DataFrame.where() function is similar to if-then/if else that is used to check the one or multiple conditions of an expression in DataFrame and replace with another value when the condition becomes False. By default, it replaces with NaN value and provides a param to replace with any custom value.
Note that where() method replaces all column values of a DataFrame row when the condition becomes False, other rows are untouched.
1. DataFrame.where() Syntax
Following is the syntax of the DataFrame.where()
function. This function takes parameters cond
, other
, inplace
, axis
, level
, errors
, try_cast
. Returns DataFrame. When inplace=True
used, it returns None. When a caller is Series, it returns the Series object.
# DataFrame.where() syntax
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=NoDefault.no_default)
Parameters Explanation:
- cond – Accepts one or more conditions. Takes eithere bool Series/DataFrame, array-like, or callable
- other – Value to replace with. Takes either scalar, Series/DataFrame, or callable
- inplace – If True, it updates the existing DataFrame and return None. Default set to False.
- axis – rows/columns axis. Takes int, default None
- level – Alignment leve.
- errors – Specify whether to raise any error in event of failure. Accepts str, {‘raise’, ‘ignore’}, default ‘raise’
- try_cast – Cast teh result basck to input type. (Available since 1.3.0 version)
Let’s create a DataFrame and explore the where() function usage with examples.
# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Discount':[1500,1000,1200,800,1300],
'Duration':['30days','50days','30days','35days','40days']
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Discount Duration
0 Spark 22000 1500 30days
1 PySpark 25000 1000 50days
2 Spark 23000 1200 30days
3 Python 24000 800 35days
4 PySpark 26000 1300 40days
2. pandas where() Example
In pandas where() function behaves differently than SQL where clause, here it is used similar to if then/if else. It checks one or multiple conditions specified with cond
param and replace with a other
value when condition becomes False
.
# Default example
df2=df.where(df.Fee > 23000)
print(df2)
Yields below output. Note that by defualt it replaces with numpy.NaN
. You can drop rows with NaN using DataFrame.dropna() function.
# Output:
Courses Fee Discount Duration
0 NaN NaN NaN NaN
1 PySpark 25000.0 1000.0 50days
2 NaN NaN NaN NaN
3 Python 24000.0 800.0 35days
4 PySpark 26000.0 1300.0 40days
3. Update with Another Value
Now, let’s update with a custom value. The below example updates all rows of DataFrame with value ‘NA’ when condition Fee > 23000
becomes False.
# Use other param
df2=df.where(df.Fee > 23000,'NA')
print(df2)
Yields below output. Notice that rows that didn’t satisfy the condition are changed to ‘NA’.
# Output:
Courses Fee Discount Duration
0 NA NA NA NA
1 PySpark 25000 1000 50days
2 NA NA NA NA
3 Python 24000 800 35days
4 PySpark 26000 1300 40days
4. where() with Multiple Columns & Conditions
The below example demonstrates how to specify the multiple conditions by using a conditional operator.
# Where on multiple columns & conditions
cond1 = df.Fee > 23000
cond2 = df.Discount > 900
df2 = df.where(cond1 & cond2, other='NA')
print(df2)
Yields below output.
# Output:
Courses Fee Discount Duration
0 NA NA NA NA
1 PySpark 25000 1000 50days
2 NA NA NA NA
3 NA NA NA NA
4 PySpark 26000 1300 40days
5. Update on Existing DataFrame
All the above examples return a new DataFrame object instead of updating the existing one. In order to update in place use inplace=True
param. When used this param with the value true, where()
function returns None
.
# Updating on existing DataFrame
df.where(cond1 & cond2, 'NA', inplace=True)
print(df)
This updates the existing DataFrame object. Yields same output as above.
Conclusion
In this article, you have learned the syntax of the DataFrame.where() function, explanation of the param, and how to use the function with examples. Note that this cannot be used similar to SQL where clause instead it should be used as if-then or if-else way.
Related Articles
- Pandas Create Conditional Column in DataFrame
- Pandas Replace Values based on Condition
- Pandas Get Index from DataFrame?
- Pandas Series filter() Function
- Get Unique Rows in Pandas DataFrame
- Convert Pandas DataFrame to Series
- Working with MultiIndex in pandas DataFrame
- How to Append Row to pandas DataFrame