In Pandas, the clip() function is used to limit the values in a Series within a specified range. It’s particularly useful when you want to cap or floor the values of a Series to certain minimum and maximum values.
In this article, I will explain the Series.clip() function, covering its syntax, parameters, and usage. By default, it returns a new Series with values clipped to the specified range. If inplace=True is specified, it modifies the existing Series in place and returns None.
Key Points –
- The clip() function in Pandas is used to limit the values in a Series within a specified range.
- It provides flexibility in handling outliers or extreme values by enabling you to cap or floor them without manually iterating over the Series.
- Values below the lower bound are replaced with the lower bound, and values above the upper bound are replaced with the upper bound.
- The inplace parameter can be used to operate in place, modifying the original Series, if set to True.
- If inplace is not specified or set to False, the function returns a new Series with clipped values.
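To make the "without manually iterating" point concrete, here is a minimal sketch that compares a hand-written loop with a single clip() call. The variable names (values, looped, vectorized) and the bounds 0 and 30 are my own choices for illustration.

```python
import pandas as pd

values = pd.Series([-10, 20, 30, 40, 50])
lower, upper = 0, 30  # illustrative bounds, chosen for this sketch

# Manual approach: visit every element and cap/floor it by hand
looped = pd.Series(
    [min(max(v, lower), upper) for v in values],
    index=values.index,
)

# Vectorized approach: one call, no explicit loop
vectorized = values.clip(lower=lower, upper=upper)

print(vectorized.tolist())                     # [0, 20, 30, 30, 30]
print(vectorized.tolist() == looped.tolist())  # True
```

Both approaches give the same values; the clip() version is shorter and keeps the original index without extra work.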
Series clip() Introduction
Following is the syntax of the pandas Series.clip() function.
# Syntax of Series.clip() function
Series.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Parameters of the Series.clip()
Following are the parameters of the Series.clip() function.
lower – Scalar or array-like, optional. This parameter specifies the lower bound for the values in the Series. If a value in the Series is less than this lower bound, it will be replaced by the lower bound. If set to None, clipping is not performed on the lower end.
upper – Scalar or array-like, optional. This parameter specifies the upper bound for the values in the Series. If a value in the Series is greater than this upper bound, it will be replaced by the upper bound. If set to None, clipping is not performed on the upper end.
axis – It specifies the axis along which array-like bounds are aligned with the data. The default is None. A Series has only one axis, so the only meaningful values are axis=0 or axis='index'; axis=1 or axis='columns' applies only when clipping a DataFrame.
inplace – This is a boolean parameter. If set to True, the operation is performed in place, modifying the original Series. If set to False (default), a new Series with clipped values is returned.
*args, **kwargs – Additional arguments accepted for compatibility with NumPy. They have no effect on the result.
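Since lower and upper also accept array-like values, each element can get its own bounds. The sketch below assumes two bound Series (per_row_lower and per_row_upper, names made up for this example) that share the index of the data, so each position is clipped against its own pair of limits.

```python
import pandas as pd

data = pd.Series([-10, 20, 30, 40, 50])

# Hypothetical per-element bounds, aligned with `data` by index
per_row_lower = pd.Series([0, 0, 35, 35, 35])
per_row_upper = pd.Series([5, 25, 40, 45, 45])

# Each value is clipped against the bounds at the same index label
bounded = data.clip(lower=per_row_lower, upper=per_row_upper)

print(bounded.tolist())  # [0, 20, 35, 40, 45]
```

Here -10 is raised to 0, 30 is raised to 35, and 50 is lowered to 45, while 20 and 40 already sit inside their own ranges.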
Return Value
It returns a new Series object with the values clipped according to the specified lower and upper bounds. If the inplace parameter is set to True, the function will modify the original Series in place and return None.
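As a quick check of the default behavior, the following sketch shows that clip() hands back a separate Series and leaves the input untouched. The names original and clipped are used only for this illustration.

```python
import pandas as pd

original = pd.Series([-10, 20, 30, 40, 50])

# Default call (inplace=False): the result is a new object
clipped = original.clip(lower=0, upper=30)

print(clipped is original)  # False
print(original.tolist())    # [-10, 20, 30, 40, 50]
print(clipped.tolist())     # [0, 20, 30, 30, 30]
```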
Clipping Values Below a Lower Bound
Clipping values below a lower bound means setting any values in the series that are less than the specified lower bound to be equal to that lower bound.
Let’s create the Series using Python lists.
import pandas as pd
# Create a sample Series
series = pd.Series([-10, 20, 30, 40, 50])
print("Original Series:\n",series)
This yields the output below.
# Output:
# Original Series:
# 0 -10
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int64
To clip values below a lower bound using the clip() function in Pandas, pass the bound through the lower parameter. Any value below it is set to that bound.
# Clip values below a lower bound
clipped_series = series.clip(lower=0)
print("Clipped Series (values below 0 clipped to 0):\n", clipped_series)
In the above example, the clip() function sets any values in series that are less than 0 to be equal to 0. As a result, the negative value -10 is replaced by 0, and all other values remain unchanged. This example yields the output below.
# Output:
# Clipped Series (values below 0 clipped to 0):
# 0 0
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int64
Clipping Values Above an Upper Bound
Clipping values above an upper bound means setting any values in the Series that are greater than the specified upper bound to be equal to the upper bound.
# Clip values above an upper bound of 30
clipped_series = series.clip(upper=30)
print("Clipped Series (values above 30 clipped to 30):\n", clipped_series)
# Output:
# Clipped Series (values above 30 clipped to 30):
# 0 -10
# 1 20
# 2 30
# 3 30
# 4 30
# dtype: int64
In the above example, the values in the Series that are greater than 30 (40 and 50) have been clipped to 30, which is the specified upper bound.
Clipping Values with Different Lower and Upper Bounds
To clip values at both ends with the clip() function in Pandas, specify both the lower and upper parameters.
# Clip values within the range of 10 to 40
clipped_series = series.clip(lower=10, upper=40)
print("Clipped Series (values clipped to the range of 10 to 40):\n", clipped_series)
# Output:
# Clipped Series (values clipped to the range of 10 to 40):
# 0 10
# 1 20
# 2 30
# 3 40
# 4 40
# dtype: int64
In this example, any values below 10 in the original series have been clipped to 10, and any values above 40 have been clipped to 40 in the clipped series.
Clipping Values Inplace
You can perform in-place clipping by setting the inplace parameter to True when using the clip() function. In the example below, inplace=True applies the clipping operation directly to the original Series, series. As a result, the original Series is modified in place, and its values are clipped to fall within the range of 0 to 30.
# Clip values to be within the range of 0 to 30 inplace
series.clip(lower=0, upper=30, inplace=True)
print("Clipped Series (values clipped to the range of 0 to 30):\n", series)
# Output:
# Clipped Series (values clipped to the range of 0 to 30):
# 0 0
# 1 20
# 2 30
# 3 30
# 4 30
# dtype: int64
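One practical consequence of in-place clipping is that every name bound to the same Series object sees the change, whereas reassigning the result of clip() only rebinds one name. The sketch below illustrates the difference; alias and other_alias are names introduced here purely for demonstration.

```python
import pandas as pd

# In-place: the object itself changes, so the alias sees it too
series = pd.Series([-10, 20, 30, 40, 50])
alias = series  # second name for the same Series object
series.clip(lower=0, upper=30, inplace=True)
print(alias.tolist())  # [0, 20, 30, 30, 30]

# Reassignment: `other` now points to a new Series,
# while `other_alias` still refers to the unclipped original
other = pd.Series([-10, 20, 30, 40, 50])
other_alias = other
other = other.clip(lower=0, upper=30)
print(other.tolist())        # [0, 20, 30, 30, 30]
print(other_alias.tolist())  # [-10, 20, 30, 40, 50]
```

If other parts of your code hold a reference to the same Series, reassignment is usually the less surprising choice.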
Clipping Values with NaNs
When clipping values in a Pandas Series that contains NaNs (missing values), the NaNs are preserved in the resulting Series. Here’s how you can clip values with NaNs using the clip() function.
import pandas as pd
import numpy as np
# Create a sample Series with NaNs
series = pd.Series([-10, 20, np.nan, 40, 50])
# Clip values to be within the range of 0 to 30, NaNs will remain unchanged
clipped_series = series.clip(lower=0, upper=30)
print("Clipped Series (values clipped to the range of 0 to 30):\n", clipped_series)
# Output:
# Clipped Series (values clipped to the range of 0 to 30):
# 0 0.0
# 1 20.0
# 2 NaN
# 3 30.0
# 4 30.0
# dtype: float64
In the above example, the NaN value in the original series remains unchanged after clipping. Any values below 0 are clipped to 0, and any values above 30 are clipped to 30 in the resulting series.
Clipping with NaN Replacement
Similarly, you can clip values in a Pandas Series and replace NaNs (missing values) with a specified value by chaining the clip() function with the fillna() function.
# Clip values to be within the range of 0 to 30
# And replace NaNs with a value of -1
clipped_series = series.clip(lower=0, upper=30).fillna(-1)
print("Clipped Series (values clipped to the range of 0 to 30 with NaNs replaced):\n", clipped_series)
# Output:
# Clipped Series (values clipped to the range of 0 to 30 with NaNs replaced):
# 0 0.0
# 1 20.0
# 2 -1.0
# 3 30.0
# 4 30.0
# dtype: float64
In the above example, the clip() function is used to clip values to be within the range of 0 to 30. Then, the fillna() function is used to replace any NaNs with the specified value of -1. Finally, the clipped Series with NaNs replaced is printed.
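The order of the two calls matters when the fill value lies outside the clipping range. The sketch below reuses the same data and the -1 placeholder from the example above; the variable names clip_then_fill and fill_then_clip are mine.

```python
import numpy as np
import pandas as pd

series = pd.Series([-10, 20, np.nan, 40, 50])

# Clip first, then fill: the -1 placeholder survives untouched
clip_then_fill = series.clip(lower=0, upper=30).fillna(-1)
print(clip_then_fill.tolist())  # [0.0, 20.0, -1.0, 30.0, 30.0]

# Fill first, then clip: the -1 placeholder is itself clipped up to 0
fill_then_clip = series.fillna(-1).clip(lower=0, upper=30)
print(fill_then_clip.tolist())  # [0.0, 20.0, 0.0, 30.0, 30.0]
```

If the placeholder is meant to stay recognizable as a marker for missing data, clip before filling.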
Frequently Asked Questions on Pandas Series.clip() Function
The clip() function in Pandas Series serves the purpose of limiting or “clipping” the values within a specified range. It allows you to cap or floor the values of a Series to certain minimum and maximum values. This function is particularly useful for data preprocessing tasks, handling outliers, or ensuring that values remain within a specific range for analysis or visualization purposes.
Values outside the specified range are replaced with the nearest bound. For example, if a value is below the lower bound, it is replaced with the lower bound; if it’s above the upper bound, it is replaced with the upper bound.
You can use the clip() function in Pandas to handle outliers in your data. Outliers are data points that significantly differ from the rest of the dataset and can skew statistical analysis or machine learning models. By capping or flooring the values of a Series using the clip() function, you can effectively mitigate the impact of outliers on your analysis.
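One common way to pick the bounds for outlier handling is to derive them from the data itself. The sketch below is only an illustration under my own assumptions: it uses the 5th and 95th percentiles from Series.quantile() as cut-offs, and the sample data (readings) is made up. Choose thresholds that suit your dataset.

```python
import pandas as pd

# Hypothetical measurements with one extreme value
readings = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Assumed cut-offs: 5th and 95th percentiles of the data
low = readings.quantile(0.05)
high = readings.quantile(0.95)

# Values outside [low, high] are pulled back to the nearest bound
capped = readings.clip(lower=low, upper=high)

print("Bounds:", low, high)
print(capped)
```

The extreme reading of 100.0 is pulled down to the upper cut-off, while the values in the middle of the distribution are left as they are.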
The inplace parameter, when set to True, modifies the original Series in place and returns None. If set to False (default), it returns a new Series with clipped values.
It is possible to clip values only above or below a certain threshold by setting either the lower or upper parameter in the clip() function.
NaNs are preserved in the resulting Series when using the clip() function. They are not affected by the clipping operation unless explicitly handled through methods like fillna().
Conclusion
In this article, I have explained the clip() function in Pandas. It is used to limit the values in a Series to a specified range and ensures that all values fall within the provided lower and upper bounds.
Happy Learning!!