In pandas, the mask() method is used to replace values in a DataFrame or Series where a specified condition is True. It essentially allows you to mask or hide specific data based on a condition and replace it with another value.
In this article, I will explain the Pandas DataFrame mask() method, covering its syntax, parameters, and usage. I will also demonstrate how to use it to replace values based on a condition, which is a common application for masking specific elements in a DataFrame or Series.
Key Points –
- The mask() method replaces values in a DataFrame where a specified condition is True.
- By default, mask() replaces the values where the condition is met with NaN.
- You can specify a custom replacement value with the other parameter instead of the default NaN.
- The mask() method has an inplace parameter which, if set to True, modifies the DataFrame in place and returns None instead of a new object.
- The axis and level parameters control how cond and other are aligned with the DataFrame when alignment is needed; level applies to MultiIndex data.
Pandas DataFrame mask() Introduction
Let’s look at the syntax of the mask() method.
# Syntax of DataFrame mask()
# Note: the errors parameter was deprecated in pandas 1.5 and removed in pandas 2.0
DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise')
Parameters of the DataFrame mask()
Following are the parameters of the DataFrame mask() method.
- cond – A boolean Series/DataFrame, array-like, or callable, typically of the same shape as the DataFrame being masked. Values in the DataFrame where the condition is True will be replaced.
- other – The value to replace entries that match the condition. It can be a scalar, a Series/DataFrame, or a callable. By default, this is NaN.
- inplace – If True, performs the operation in place and returns None. Defaults to False.
- axis – The alignment axis, used when cond or other has to be aligned with the DataFrame (0 for the index, 1 for the columns). Defaults to None.
- level – For MultiIndex DataFrames, specifies the level used for alignment. Defaults to None.
- errors – Intended to control whether errors are raised if the operation cannot be applied. Defaults to 'raise'. This parameter was deprecated in pandas 1.5 and removed in pandas 2.0, so omit it on current versions.
Return Value
It returns a new DataFrame with values replaced according to the condition specified, or None if inplace=True.
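As a small illustration of the return value, the sketch below shows mask() returning a new object of the caller's type and leaving the original untouched. The names s, df_small, masked_s, and masked_df are made up for this illustration and do not appear elsewhere in the article.

```python
import pandas as pd

# Illustrative data; these names are hypothetical
s = pd.Series([1, 20, 3], name="x")
df_small = pd.DataFrame({"A": [1, 20], "B": [30, 4]})

# Series.mask returns a new Series; DataFrame.mask returns a new DataFrame
masked_s = s.mask(s > 10, other=0)
masked_df = df_small.mask(df_small > 10, other=0)

print(type(masked_s).__name__, masked_s.tolist())
print(type(masked_df).__name__, masked_df.values.tolist())

# The original Series is unchanged
print(s.tolist())
```

Because the originals are not modified, the result has to be assigned to a variable (or chained) to be of any use.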
Usage of Pandas DataFrame mask() Method
The mask() method in pandas is used for the conditional replacement of values in a DataFrame.
Now, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are A and B.
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 12, 4, 7]
})
print("Original DataFrame:\n", df)
Yields below output.
# Output:
# Original DataFrame:
#     A   B
# 0   5   2
# 1   8  12
# 2  13   4
# 3  15   7
Here is an example demonstrating the basic usage of the mask() method in Pandas to replace values in a DataFrame based on a specified condition.
# Mask values greater than 10
df2 = df.mask(df > 10)
print("Masked DataFrame (values > 10 replaced with NaN):\n", df2)
In the above example, any values greater than 10 in the DataFrame are replaced with NaN.
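One side effect of the default replacement is worth seeing in code: NaN is a floating-point value, so integer columns that receive it are converted to floats. The following sketch, which rebuilds the same DataFrame so it can run on its own, checks the dtypes before and after masking.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})

# Default replacement value is NaN
df2 = df.mask(df > 10)

# Column dtypes before and after masking
print(df.dtypes)
print(df2.dtypes)

# Number of masked (NaN) cells per column
print(df2.isna().sum())
```

If the integer dtype matters, passing an integer to other avoids the conversion.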
Custom Replacement Value
Alternatively, to use a custom replacement value with the mask() method, you can specify the other parameter.
# Mask values greater than 10 and replace with 0
df2 = df.mask(df > 10, other=0)
print("Masked values > 10 replaced with 0:\n", df2)
# Output:
# Masked values > 10 replaced with 0:
#    A  B
# 0  5  2
# 1  8  0
# 2  0  4
# 3  0  7
In the above example, any values in the DataFrame greater than 10 are replaced with 0 instead of NaN.
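The other parameter is not limited to scalars. As a minimal sketch, assuming the same sample data, the block below passes a same-shaped DataFrame and then a callable as the replacement. The names cond, capped, and negated are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})
cond = df > 10

# other as a same-shaped DataFrame: each masked cell takes the value
# found at the same position in other
capped = df.mask(cond, other=df - 10)

# other as a callable: it receives the DataFrame and returns the replacements
negated = df.mask(cond, other=lambda d: -d)

print(capped)
print(negated)
```

Here 13 and 15 in column A become 3 and 5 in capped, and -13 and -15 in negated; values that do not satisfy the condition are left as they were.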
Using inplace Parameter
With the inplace parameter, the mask() method modifies the original DataFrame directly instead of returning a new one. Setting inplace=True applies the mask and updates the DataFrame in place.
Here’s how you can use the inplace parameter to mask values greater than 10 and replace them with -6.
# Mask values greater than 10 and replace with -6, modifying in place
df.mask(df > 10, other=-6, inplace=True)
print("Modified DataFrame (values > 10 replaced with -6):\n", df)
# Output:
# Modified DataFrame (values > 10 replaced with -6):
#    A  B
# 0  5  2
# 1  8 -6
# 2 -6  4
# 3 -6  7
In the above example, the inplace=True parameter modifies the original DataFrame df directly, replacing any values greater than 10 with -6.
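A common pitfall with inplace=True is assigning the result back to a variable, which stores None. The sketch below works on a copy (df_inplace, a name introduced here for illustration) so the original stays available, and compares the in-place result with the ordinary reassignment form.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})

# Work on a copy so the original stays available for comparison
df_inplace = df.copy()
result = df_inplace.mask(df_inplace > 10, other=-6, inplace=True)

# With inplace=True the method returns None
print(result)

# Equivalent form without inplace: assign the returned DataFrame
df_new = df.mask(df > 10, other=-6)
same = bool((df_new == df_inplace).all().all())
print(same)
```

Both forms produce the same values, so the reassignment form is usually the safer choice in method chains.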
Applying Mask on Specific Axis
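Similarly, the mask() method accepts an axis parameter. It sets the alignment axis: it tells pandas whether to align cond or other with the index (axis=0) or with the columns (axis=1) when they do not already match the DataFrame. When cond is a boolean DataFrame of the same shape as df, as in the two examples that follow, axis does not change the result.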
Masking Along Columns (axis=1)
You can use the mask() method in Pandas to replace values greater than 5 across columns (axis=1) with NaN.
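# Recreate the original DataFrame, because the in-place example above modified df
df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})
# Mask values greater than 5 along columns axis
df2 = df.mask(df > 5, axis=1)
print("Masked values > 5 replaced with NaN along columns:\n", df2)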
# Output:
# Masked values > 5 replaced with NaN along columns:
#      A    B
# 0  5.0  2.0
# 1  NaN  NaN
# 2  NaN  4.0
# 3  NaN  NaN
In the above example, values greater than 5 are replaced with NaN. Because the condition df > 5 already has the same shape and labels as df, every value is checked individually and axis=1 has no effect on the result.
Masking Along Rows (axis=0)
You can also use the mask() method in Pandas to replace values greater than 5 across rows (axis=0) with NaN.
# Mask values greater than 5 along rows axis
df2 = df.mask(df > 5, axis=0)
print("Masked values > 5 replaced with NaN along rows:\n", df2)
# Output:
# Masked values > 5 replaced with NaN along rows:
#      A    B
# 0  5.0  2.0
# 1  NaN  NaN
# 2  NaN  4.0
# 3  NaN  NaN
In the above example, values greater than 5 are replaced with NaN, and the output is identical to the axis=1 case. The condition is already aligned with df, so the choice of axis makes no difference here.
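The axis parameter does change the result when other is a Series that has to be broadcast across the DataFrame. The following is a minimal sketch of that case, assuming the original sample data; row_fill and col_fill are hypothetical names introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})
cond = df > 10

# One replacement value per row, aligned with the index (axis=0)
row_fill = pd.Series([-1, -2, -3, -4], index=df.index)
by_row = df.mask(cond, other=row_fill, axis=0)

# One replacement value per column, aligned with the column labels (axis=1)
col_fill = pd.Series({'A': 100, 'B': 200})
by_col = df.mask(cond, other=col_fill, axis=1)

print(by_row)
print(by_col)
```

With axis=0 a masked cell takes the replacement for its row label, while with axis=1 it takes the replacement for its column label.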
Using a Callable Condition
Finally, you can also use the mask() method with a callable condition in Pandas. The callable receives the DataFrame and must return a boolean result of the same shape. For instance, a condition function can be used to identify even values in the DataFrame. The mask() method will then replace those even values with NaN.
# Define a callable condition
def condition(x):
    return x % 2 == 0

# Mask values based on the callable condition
df2 = df.mask(condition)
print("Masked DataFrame (even values replaced with NaN):\n", df2)
# Output:
# Masked DataFrame (even values replaced with NaN):
#       A    B
# 0   5.0  NaN
# 1   NaN  NaN
# 2  13.0  NaN
# 3  15.0  7.0
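Callable conditions are most useful in method chains, where the intermediate DataFrame has no name to build a condition from. The sketch below illustrates this with a hypothetical column C added through assign(); the lambda passed to mask() receives that intermediate DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})

# The lambda passed to mask() receives the DataFrame produced by assign(),
# so the condition also covers the new column C
result = (
    df.assign(C=lambda d: d['A'] + d['B'])
      .mask(lambda d: d > 15, other=0)
)

print(result)
```

Only column C contains values above 15 (20, 17, and 22), so those are the cells replaced with 0.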
Frequently Asked Questions on Pandas DataFrame mask() Method
The mask() method in Pandas is used to replace values in a DataFrame or Series where a specified condition is true. It allows for conditional data replacement, enabling you to mask or hide certain values based on the condition.
You can replace masked values with a custom value instead of NaN by using the other parameter in the mask() method.
You can pass the axis parameter to choose the alignment axis (0 for the index, 1 for the columns) that pandas uses when cond or other has to be aligned with the DataFrame.
The mask() method works with MultiIndex DataFrames, and the level parameter specifies the index level used for that alignment.
The mask() method replaces values where the condition is True, whereas the where() method replaces values where the condition is False. They are essentially complementary methods used for conditional replacement in Pandas.
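The complementary relationship between mask() and where() can be checked directly: masking with a condition should give the same result as where() with the negated condition. A minimal sketch, assuming the sample data used throughout this article:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13, 15], 'B': [2, 12, 4, 7]})
cond = df > 10

# mask() replaces where cond is True; where() replaces where cond is False
masked = df.mask(cond, other=0)
kept = df.where(~cond, other=0)
print(masked.equals(kept))

# The same holds for the default NaN replacement
print(df.mask(cond).equals(df.where(~cond)))
```

Which of the two to use is mostly a matter of readability: pick the one whose condition reads naturally without a negation.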
Conclusion
In conclusion, the mask() method in Pandas is a versatile tool for conditional data manipulation, allowing you to replace values in a DataFrame or Series based on specific conditions. You can replace values with NaN by default or specify a custom replacement value using the other parameter. Additionally, you can modify the DataFrame in place with the inplace parameter, control alignment with the axis parameter, and use callable conditions for more complex scenarios.
Happy Learning!!