In Pandas, the where() function replaces values where a condition is not satisfied and keeps the original values everywhere else. It provides a flexible way to apply a condition to each element of a Series and replace the values that do not meet it with a specified value or with the corresponding values from another Series.
In this article, I will explain the Pandas Series where() function, including its syntax, parameters, and usage, and show how to replace values within a Series based on certain conditions.
Key Points –
- The Series.where() method in Pandas is used for conditional replacement of data within a Series. It retains the original values where the specified condition is True and replaces values where the condition is False with a specified substitute.
- You can specify a scalar value, another Series, or a callable function as the replacement, offering a versatile way to customize the substitution based on your data and conditions.
- By default, where() returns a new Series with the specified replacements, leaving the original Series unchanged. If you set the inplace parameter to True, the original Series is modified in place, and None is returned.
- The operation is performed element-wise. A scalar replacement is applied at every position where the condition is False, while a Series passed as the condition or the replacement is aligned with the original Series on index labels. An array-like condition without an index must have the same shape as the Series.
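The difference between the default behaviour and inplace=True can be sketched as follows. This is a minimal illustration, not part of the original examples; the variable names (scores, capped, in_place) are made up for the sketch.

```python
import pandas as pd

scores = pd.Series([1, 5, 10, 15, 20])  # hypothetical sample data

# Default behaviour: a new Series is returned and `scores` is left untouched.
capped = scores.where(scores <= 10, 0)

# inplace=True: the Series the method is called on is modified directly.
in_place = scores.copy()
in_place.where(in_place <= 10, 0, inplace=True)

print(capped.tolist())    # [1, 5, 10, 0, 0]
print(scores.tolist())    # [1, 5, 10, 15, 20]
print(in_place.tolist())  # [1, 5, 10, 0, 0]
```

Working on a copy keeps the original data available for the later steps of an analysis, which is usually the safer choice.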
Series where() Introduction
Following is the syntax of Series.where() as of pandas 2.0.
# Syntax of Series.where()
Series.where(cond, other=nan, *, inplace=False, axis=None, level=None)
Parameters of Series where()
Following are the parameters of the Series.where() function.
cond – The condition to apply: a boolean Series, an array-like, or a callable. The values for which the condition is False are replaced with the corresponding values from other.
other – The replacement for elements where the condition is False: a scalar, a Series, or a callable. By default, it is NaN (Not a Number).
inplace – If True, the operation modifies the Series in place and returns None. If False (default), it returns a new Series with the values modified.
axis – Alignment axis, if needed. It is unused for a Series and is present for compatibility with DataFrame, so leave it as None.
level – Alignment level, if needed, for example when the index is a MultiIndex.
errors and try_cast – Older pandas releases also accepted errors='raise' and try_cast=False. Both were deprecated and then removed in pandas 2.0, so omit them in new code.
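As a small sketch of the cond and other parameters described above, the example below passes the condition as a plain boolean list and leaves other at its default. The Series s is hypothetical sample data.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # hypothetical sample data

# cond is a plain boolean list (array-like); it must match the Series length.
# other is omitted, so positions where cond is False become NaN.
result = s.where([True, False, True])

print(result.tolist())  # [1.0, nan, 3.0]
print(result.dtype)     # float64
```

The integer Series is upcast to float64 because NaN cannot be stored in an int64 Series.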
Return value
It returns a new Series that keeps the original values where the condition is True and holds the replacement values where it is False. With inplace=True, it returns None instead.
Create Pandas Series
You can create a Pandas Series from a Python list or a dictionary; the example below creates a Series from a list. To use Pandas, first import it using import pandas as pd.
# Create a Pandas Series
import pandas as pd
import numpy as np
data = pd.Series([1, 5, 10, 15, 20])
print("Original Series:\n", data)
Yields below output.
# Output:
# Original Series:
# 0 1
# 1 5
# 2 10
# 3 15
# 4 20
# dtype: int64
Using Pandas where() to Replace Series Values with NaN
You can use the Pandas where() function to replace the values in a Series with NaN values where the condition is not satisfied. If the condition is satisfied the values remain unchanged. For example,
# Replace values with NaN using where()
result = data.where(data >= 10, np.nan)
print(result)
Here, the where() method is applied to the data Series and replaces values where the condition is False with the specified value (np.nan in this case). The condition is data >= 10, so all values in the Series that are less than 10 are replaced with NaN. Because NaN is a floating-point value, the result is upcast from int64 to float64.
Yields the below output.
# Output:
# 0 NaN
# 1 NaN
# 2 10.0
# 3 15.0
# 4 20.0
# dtype: float64
As you can see, values less than 10 in the original Series are replaced with NaN in the result Series.
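If the goal is to filter rather than to keep placeholders, one possible follow-up is to drop the NaN entries afterwards. The sketch below assumes the same sample values as above; the names masked and filtered are illustrative only.

```python
import pandas as pd

data = pd.Series([1, 5, 10, 15, 20])

# where() keeps the original length and marks non-matching rows as NaN ...
masked = data.where(data >= 10)

# ... so dropna() afterwards leaves only the rows that met the condition.
filtered = masked.dropna()

print(filtered.index.tolist())  # [2, 3, 4]
print(filtered.tolist())        # [10.0, 15.0, 20.0]
```

The values stay float64 after dropna(). Plain boolean indexing, data[data >= 10], would return the same rows while keeping the int64 dtype.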
Replace Values using where() with a Specific Value
To replace values that fail a condition with a specific value, apply the where() function to the Series and pass the condition along with the replacement value. It replaces the values where the condition is False; otherwise, it retains the original values.
# Replace values with 100 using where()
result = data.where(data <= 15, 100)
print(result)
# Output:
# 0 1
# 1 5
# 2 10
# 3 15
# 4 100
# dtype: int64
From the above code, values in the data Series that are less than or equal to 15 remain unchanged, while values greater than 15 are replaced with 100.
Use where() with Multiple Conditions
You can replace values based on multiple conditions using the where() function by chaining calls, each with its own condition and replacement.
# Replace values based on multiple conditions
result = data.where(data >= 10, 0).where(data <= 20, 100)
print(result)
# Output:
# 0 0
# 1 0
# 2 10
# 3 15
# 4 20
# dtype: int64
In the above example, the first where() keeps values that are at least 10 and replaces values less than 10 with 0. The second where() keeps values that are at most 20 and replaces values greater than 20 with 100. Note that each condition describes the values to keep, not the values to replace. Values between 10 and 20 (inclusive) pass both conditions and remain unchanged. The sample data contains no value above 20, so only the first replacement has a visible effect here. The final result is a new Series (result) with values replaced according to the specified conditions.
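Several conditions can also be combined inside a single where() call using the logical operators & (and), | (or), and ~ (not). The following is a minimal sketch on the same sample values; the replacement values -1 and 0 are arbitrary choices for illustration.

```python
import pandas as pd

data = pd.Series([1, 5, 10, 15, 20])

# & (and): keep values in the range 5..15, replace the rest with -1
in_range = data.where((data >= 5) & (data <= 15), -1)

# | (or): keep values below 5 or above 15, replace the rest with 0
extremes = data.where((data < 5) | (data > 15), 0)

# ~ (not): keep everything except 10, which is replaced with 0
not_ten = data.where(~(data == 10), 0)

print(in_range.tolist())  # [-1, 5, 10, 15, -1]
print(extremes.tolist())  # [1, 0, 0, 0, 20]
print(not_ten.tolist())   # [1, 5, 0, 15, 20]
```

Each comparison needs its own parentheses because & and | bind more tightly than comparison operators in Python.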
Pandas Series where() with Lambda function
Alternatively, you can pass a lambda function to where(). Both the cond and other parameters accept a callable. When cond is a callable, it is called with the Series and must return a boolean Series; the example below uses a lambda as the condition.
# Replace values using where() and lambda
result = data.where(lambda x: x % 2 == 0, other=data**2)
print(result)
# Output:
# 0 1
# 1 25
# 2 10
# 3 225
# 4 20
# dtype: int64
In the above example, the lambda function is passed as the condition and checks whether each value in the data Series is even (x % 2 == 0). If the condition is True, the original value is kept; otherwise, the value is replaced with the corresponding value from data**2, that is, its square. As a result, only the odd values are replaced with their squares.
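A callable can also be supplied as the other parameter, in which case pandas calls it with the Series to compute the replacement values. The sketch below is one way to get the same result as the example above; the lambda argument name s is arbitrary.

```python
import pandas as pd

data = pd.Series([1, 5, 10, 15, 20])

# other is a callable: it is called with the Series, and the values it
# returns are used wherever the condition is False.
result = data.where(data % 2 == 0, other=lambda s: s ** 2)

print(result.tolist())  # [1, 25, 10, 225, 20]
```

This form is handy in method chains, where the intermediate Series has no name that could be referenced directly.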
Replace Values using Another Series
Similarly, you can replace values in a Pandas Series with values from another Series using the where() function. First, create two Series, then apply where() to the first Series and pass the second as the replacement. Where the condition is False, values of the first Series are replaced with the corresponding values of the second Series; otherwise, the values of the first Series remain unchanged.
# Create Pandas Series
import pandas as pd
import numpy as np
data = pd.Series([1, 2, 3, 4, 5])
data1 = pd.Series([10, 20, 30, 40, 50])
# Replace values using another series
result = data.where(data < 3, data1)
print(result)
# Output:
# 0 1
# 1 2
# 2 30
# 3 40
# 4 50
# dtype: int64
In the above example, the where() function keeps the values in data where the condition data < 3 is True. Where the condition is False, the corresponding values from data1 are used. As a result, values in data that are 3 or greater are replaced with values from data1.
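Note that "corresponding" here means matching index labels rather than positions. The sketch below uses hypothetical label-indexed data, with the replacement Series deliberately stored in a different order.

```python
import pandas as pd

# Hypothetical data: same labels, different order.
data = pd.Series([1, 2, 3], index=["x", "y", "z"])
data1 = pd.Series([30, 20, 10], index=["z", "y", "x"])

# Replacement values are matched by label, not by position.
result = data.where(data < 2, data1)

# 'x' keeps its own value 1; 'y' and 'z' take 20 and 30 from data1.
print(result)
```

If the two Series do not share the same labels, it is worth checking the result for unexpected NaN values.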
Frequently Asked Questions on Pandas Series where() Function
The where() function is used to replace values in a Series based on a specified condition. Values meeting the condition remain unchanged, and the others are replaced with a specified value or with values from another Series.
The where() function works by evaluating a condition on each element of the Series. If the condition is True, the original value is retained; otherwise, it is replaced with a specified value or the corresponding value from another Series.
One common use of the where() function is to replace values with NaN. You can achieve this by specifying np.nan as the replacement value, or by omitting the replacement since NaN is the default, for the values that do not meet the specified condition.
You can replace values based on multiple conditions using the where() function by combining these conditions using logical operators like & (and), | (or), and ~ (not).
It is possible to replace values with a function using the where() function in Pandas. You can provide a callable (such as a function or a lambda function) as the other parameter. It is called with the Series, which lets you compute replacement values dynamically.
You can replace values in a Pandas Series with corresponding values from another Series using the where() function by passing that Series as the other parameter.
Conclusion
In this article, I have explained how the where() function in Pandas conditionally replaces values in a Series. Whether replacing values with a constant, using a function, or replacing values with another Series, where() allows for flexible and efficient data manipulation.
Happy Learning !!