In pandas, the clip() method is used to trim values at specified boundaries. This method is particularly useful for limiting the values in a DataFrame to a specified range, helping to handle outliers or to normalize data within a certain range.
In this article, I will explain the Pandas DataFrame clip() method by using its syntax, parameters, and usage, and how to return a DataFrame where values outside the specified threshold limits are replaced.
Key Points –
- The clip() method is used to limit the values in a DataFrame to a specified lower and upper boundary, effectively capping values within the provided range.
- It takes lower and upper parameters to define the minimum and maximum threshold values. These can be specified as scalars or array-like values.
- The inplace parameter, when set to True, allows the operation to be performed directly on the original DataFrame, modifying it without creating a new DataFrame.
- The axis parameter can be used to apply the clipping operation along a specific axis (0 or 'index' for rows, 1 or 'columns' for columns), which is especially useful when lower and upper are array-like.
Syntax of Pandas DataFrame clip() Method
Following is the syntax of the pandas DataFrame.clip() function.
# Syntax of DataFrame.clip() function
DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Parameters of the clip()
Following are the parameters of the clip() method.
- lower – float or array-like, optional. Minimum threshold value. Values below this will be replaced with this value.
- upper – float or array-like, optional. Maximum threshold value. Values above this will be replaced with this value.
- axis – {0 or 'index', 1 or 'columns'}, default None. Align with the axis if using array-like lower or upper thresholds.
- inplace – bool, default False. If True, performs operation in-place, and returns None.
- *args – positional arguments, optional. Additional arguments passed to the function (currently unused).
- **kwargs – keyword arguments, optional. Additional keyword arguments passed to the function (currently unused).
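To make the interaction between array-like thresholds and the axis parameter more concrete, here is a minimal sketch. The names row_lower and row_upper, and the bounds they hold, are hypothetical values chosen only for illustration; the data is the same sample used later in this article.

```python
import pandas as pd

# Same sample data as the article's examples
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Hypothetical per-row bounds: one lower/upper pair for each row label
row_lower = pd.Series([0, 0, 1, 1, 5])
row_upper = row_lower + 5

# axis=0 aligns the thresholds with the index, so row i is clipped to
# [row_lower[i], row_upper[i]] in every column
per_row = df.clip(lower=row_lower, upper=row_upper, axis=0)
print(per_row)
```

With these assumed bounds, each row gets its own range, for example the last row is clipped to the range 5 to 10 in both columns.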
Return Value
It returns a DataFrame with values outside the threshold values replaced, or None if inplace=True.
Usage of Pandas DataFrame clip() Method
The pandas.DataFrame.clip() method is used to constrain the values in a DataFrame to fall within a specified range, defined by lower and upper bounds. This is useful for handling outliers or ensuring that values remain within certain limits.
Now, let's create a Pandas DataFrame from a Python dictionary, where the columns are A and B.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'A': [2, -5, 3, -8, 14],
'B': [4, -6, 0, 9, -3]
})
print("Original DataFrame:\n",df)
# Output:
# Original DataFrame:
# A B
# 0 2 4
# 1 -5 -6
# 2 3 0
# 3 -8 9
# 4 14 -3
To use the clip() method with an upper threshold, you can specify the upper parameter. In the example below, any value greater than 10 is replaced with 10. With this sample data, only the value 14 in column A is affected, because column B has no values above 10.
# Clip values above 10
upper_threshold = 10
df2 = df.clip(upper=upper_threshold)
print("Clipped DataFrame (Upper Threshold 10):\n", df2)
# Output:
# Clipped DataFrame (Upper Threshold 10):
# A B
# 0 2 4
# 1 -5 -6
# 2 3 0
# 3 -8 9
# 4 10 -3
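One way to think about what an upper threshold does is as a conditional replacement: keep each value that is at or below the limit, and substitute the limit otherwise. The sketch below compares clip() with an equivalent expression written with DataFrame.where(). It is meant only to illustrate the semantics, not to describe how pandas implements clip() internally.

```python
import pandas as pd

# Same sample data as the article's examples
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Clip with an upper threshold of 10
clipped = df.clip(upper=10)

# Equivalent formulation: keep values where the condition holds,
# otherwise substitute the threshold
manual = df.where(df <= 10, 10)

# Element-wise comparison of the two results
same = (clipped == manual).all().all()
print(same)
```

Both approaches give the same values here; clip() is simply the shorter and more readable way to express it.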
Use DataFrame.clip() Method with a Lower Threshold
Alternatively, to use the DataFrame.clip() method with a lower threshold, you can specify the lower parameter. This will replace any values in the DataFrame that are less than the specified lower threshold with that threshold value.
# Clip values below 1
df2 = df.clip(lower=1)
print("Clipped DataFrame (Lower Threshold 1):\n", df2)
# Output:
# Clipped DataFrame (Lower Threshold 1):
# A B
# 0 2 4
# 1 1 1
# 2 3 1
# 3 1 9
# 4 14 1
Here, you can see that values in columns A and B that were less than 1 have been replaced with 1.
Use DataFrame.clip() Method with a Lower and Upper Threshold
To use the DataFrame.clip() method with both a lower and an upper threshold, specify both the lower and upper parameters. This will replace any values in the DataFrame that are less than the lower threshold with the lower threshold value and any values that are greater than the upper threshold with the upper threshold value.
# Clip values below 2 and above 8
df2 = df.clip(lower=2, upper=8)
print("Clipped DataFrame (Lower Threshold 2 and Upper Threshold 8):\n", df2)
# Output:
# Clipped DataFrame (Lower Threshold 2 and Upper Threshold 8):
# A B
# 0 2 4
# 1 2 2
# 2 3 2
# 3 2 8
# 4 8 2
Here, you can see that values in columns A and B that were less than 2 have been replaced with 2, and values that were greater than 8 have been replaced with 8.
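The examples so far clip every column at once. If only one column should be limited, a possible approach is to call clip() on that column (a Series) and assign the result back. The sketch below works on a copy named df_single, a hypothetical name used here so the original df stays unchanged.

```python
import pandas as pd

# Same sample data as the article's examples
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Work on a copy so the original DataFrame is left as it was
df_single = df.copy()

# Series.clip() accepts the same lower/upper arguments;
# only column A is clipped, column B is left untouched
df_single['A'] = df_single['A'].clip(lower=2, upper=8)
print(df_single)
```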
In-Place Clipping with Lower and Upper Thresholds
To perform in-place clipping on a DataFrame with both lower and upper thresholds, you can use the clip() method with the lower, upper, and inplace parameters set. This will modify the original DataFrame directly without creating a new one.
# Apply the clip method in-place with both thresholds
df.clip(lower=2, upper=8, inplace=True)
print("In-Place Clipped DataFrame:\n", df)
# Output:
# In-Place Clipped DataFrame:
# A B
# 0 2 4
# 1 2 2
# 2 3 2
# 3 2 8
# 4 8 2
Here, the values in columns A and B that were less than 2 have been replaced with 2, and values greater than 8 have been replaced with 8 in the original DataFrame df.
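The difference between the default behavior and inplace=True is easy to miss, so here is a small sketch contrasting the two. The variable names original, new_df, untouched, and result are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Same sample data as the article's examples
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})
original = df.copy()

# Default (inplace=False): a new DataFrame is returned
new_df = df.clip(lower=2, upper=8)
untouched = df.equals(original)  # df itself has not changed yet

# inplace=True: df is modified and the call returns None
result = df.clip(lower=2, upper=8, inplace=True)

print(untouched)
print(result)
print(df)
```

A common mistake is writing df = df.clip(..., inplace=True), which would rebind df to None, so it is safer to use one style or the other and not mix them.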
Clipping with NaN Values
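Similarly, when clipping a DataFrame that contains NaN values, the clip() method will leave NaN values unchanged. This is because clip() treats NaN as a missing value rather than as a number to compare against the bounds, so it is never replaced by the lower or upper threshold. Note that columns containing NaN are stored as floats, which is why the output below shows values such as 2.0.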
import pandas as pd
import numpy as np
# Sample DataFrame with NaN values
df = pd.DataFrame({
'A': [2, -5, np.nan, -8, 14],
'B': [4, -6, 0, np.nan, -3]
})
# Clip values below 0 and above 10
df2 = df.clip(lower=0, upper=10)
print("Clipped DataFrame with NaN values:\n",df2)
# Output:
# Clipped DataFrame with NaN values:
# A B
# 0 2.0 4.0
# 1 0.0 0.0
# 2 NaN 0.0
# 3 0.0 NaN
# 4 10.0 0.0
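To confirm that the missing values stay where they were, you can compare the NaN positions before and after clipping. The sketch below does that, and then shows one possible follow-up step, filling the remaining NaN values after clipping. The fill value 0 is an arbitrary assumption for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

# Sample DataFrame with NaN values (same data as above)
df = pd.DataFrame({
    'A': [2, -5, np.nan, -8, 14],
    'B': [4, -6, 0, np.nan, -3]
})

clipped = df.clip(lower=0, upper=10)

# The NaN positions are identical before and after clipping
same_nan_positions = clipped.isna().equals(df.isna())
print(same_nan_positions)

# Optional follow-up: replace the remaining NaN values
# (0 is an arbitrary, hypothetical fill value)
filled = clipped.fillna(0)
print(filled)
```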
Frequently Asked Questions on Pandas DataFrame clip() Method
The clip() method is used to limit the values in a DataFrame to be within a specified range, defined by lower and upper boundaries. Values below the lower boundary are set to the lower boundary, and values above the upper boundary are set to the upper boundary.
To apply different thresholds to different columns or rows, you can pass array-like values or a Series as the lower and upper parameters.
NaN values are not affected by clipping. They remain unchanged in the DataFrame after clipping.
To control how array-like thresholds are aligned, set the axis parameter. Use axis=0 to match the thresholds to the rows (index), and axis=1 to match them to the columns.
If inplace=True, the method modifies the original DataFrame directly and returns None instead of a new DataFrame.
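The per-column thresholds mentioned above can be sketched as follows. The Series col_lower and col_upper, and the bounds they contain, are hypothetical values chosen for illustration; they are indexed by column name so that axis=1 can align them with the columns of the DataFrame.

```python
import pandas as pd

# Same sample data as the article's examples
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Hypothetical per-column bounds, keyed by column label
col_lower = pd.Series({'A': 0, 'B': -2})
col_upper = pd.Series({'A': 10, 'B': 5})

# axis=1 aligns the thresholds with the columns, so column A is
# clipped to [0, 10] and column B to [-2, 5]
per_col = df.clip(lower=col_lower, upper=col_upper, axis=1)
print(per_col)
```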
Conclusion
In this article, I have explained the Pandas DataFrame clip() method by using its syntax, parameters, usage, and how to return a DataFrame of the same type as the calling object, with values outside the clipping boundaries replaced, or None if inplace=True is specified.
Happy Learning!!