Pandas DataFrame clip() Method

Post last modified: July 31, 2024

In pandas, the clip() method trims values at specified boundaries: any value below the lower bound is set to the lower bound, and any value above the upper bound is set to the upper bound. It is useful for capping outliers or for keeping the values in a DataFrame within a known range.

In this article, I will explain the Pandas DataFrame clip() method, covering its syntax, parameters, and usage, and show how it returns a DataFrame in which values outside the specified thresholds are replaced by those thresholds.

Key Points –

  • The clip() method is used to limit the values in a DataFrame to a specified lower and upper boundary, effectively capping values within the provided range.
  • It takes lower and upper parameters to define the minimum and maximum threshold values. These can be specified as scalars or array-like values.
  • The inplace parameter, when set to True, allows the operation to be performed directly on the original DataFrame, modifying it without creating a new DataFrame.
  • The axis parameter controls how array-like lower and upper values are aligned with the DataFrame: 0 or 'index' matches the thresholds to rows, and 1 or 'columns' matches them to columns.

Syntax of Pandas DataFrame clip() Method

Following is the syntax of the pandas DataFrame.clip() method.


# Syntax of the DataFrame.clip() method
DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)

Parameters of the clip() Method

Following are the parameters of the clip() method.

  • lower – float or array-like, optional. Minimum threshold value. Values below this will be replaced with this value.
  • upper – float or array-like, optional. Maximum threshold value. Values above this will be replaced with this value.
  • axis – {0 or 'index', 1 or 'columns'}, default None. Aligns the DataFrame with lower and upper along the given axis when the thresholds are array-like.
  • inplace – bool, default False. If True, performs the operation in place and returns None.
  • *args – positional arguments, optional. Additional arguments passed to the function (currently unused).
  • **kwargs – keyword arguments, optional. Additional keyword arguments passed to the function (currently unused).
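To illustrate how an array-like threshold interacts with axis, here is a minimal sketch. The row_lower Series is a hypothetical set of per-row lower bounds invented for this illustration; it is not part of the original example data.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Hypothetical per-row lower bounds: one value per row, aligned on the index
row_lower = pd.Series([0, -2, 5, -10, 1])

# axis=0 aligns the Series with the row index, so each row gets its own bound
clipped = df.clip(lower=row_lower, axis=0)
print(clipped)

# Column A becomes [2, -2, 5, -8, 14]
# Column B becomes [4, -2, 5, 9, 1]
```

Each row is compared against its own lower bound, so row 2 (bound 5) raises both 3 and 0 to 5, while row 3 (bound -10) is left untouched.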

Return Value

It returns a new DataFrame in which values outside the thresholds are replaced by the nearest threshold, or None if inplace=True.
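As a quick check of the return value, the following sketch (using a small DataFrame made up for this illustration) shows that the default call hands back a new DataFrame and leaves the caller unchanged.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, -5, 14], 'B': [4, -6, 9]})

# Default behaviour: a new DataFrame comes back
result = df.clip(lower=0, upper=10)

print(type(result).__name__)   # DataFrame
print(result is df)            # False, a separate object is returned
print(result['A'].tolist())    # [2, 0, 10]
print(df['A'].tolist())        # [2, -5, 14], the original is unchanged
```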

Usage of Pandas DataFrame clip() Method

The pandas.DataFrame.clip() method is used to constrain the values in a DataFrame to fall within a specified range, defined by lower and upper bounds. This is useful for handling outliers or ensuring that values remain within certain limits.

Now, let's create a pandas DataFrame from a Python dictionary, with columns A and B.


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

print("Original DataFrame:\n", df)

# Output:
# Original DataFrame:
#      A  B
# 0   2  4
# 1  -5 -6
# 2   3  0
# 3  -8  9
# 4  14 -3

To use the clip() method with an upper threshold, specify the upper parameter. Here, the only value greater than 10 (the 14 in column A) is replaced with 10. Column B has no values above 10, so it is unchanged.


# Clip values above 10
df2 = df.clip(upper=10)
print("Clipped DataFrame (Upper Threshold 10):\n", df2)

# Output:
# Clipped DataFrame (Upper Threshold 10):
#      A  B
# 0   2  4
# 1  -5 -6
# 2   3  0
# 3  -8  9
# 4  10 -3

Use DataFrame.clip() Method with a Lower Threshold

Alternatively, to use the DataFrame.clip() method with a lower threshold, you can specify the lower parameter. This will replace any values in the DataFrame that are less than the specified lower threshold with that threshold value.


# Clip values below 1
df2 = df.clip(lower=1)
print("Clipped DataFrame (Lower Threshold 1):\n", df2)

# Output:
# Clipped DataFrame (Lower Threshold 1):
#      A  B
# 0   2  4
# 1   1  1
# 2   3  1
# 3   1  9
# 4  14  1

Here, you can see that values in columns A and B that were less than 1 have been replaced with 1.

Use DataFrame.clip() Method with a Lower and Upper Threshold

To use the DataFrame.clip() method with both a lower and an upper threshold, specify both the lower and upper parameters. Values less than the lower threshold are replaced with the lower threshold, and values greater than the upper threshold are replaced with the upper threshold.


# Clip values below 2 and above 8
df2 = df.clip(lower=2, upper=8)
print("Clipped DataFrame (Lower Threshold 2 and Upper Threshold 8):\n", df2)

# Output:
# Clipped DataFrame (Lower Threshold 2 and Upper Threshold 8):
#     A  B
# 0  2  4
# 1  2  2
# 2  3  2
# 3  2  8
# 4  8  2

Here, you can see that values in columns A and B that were less than 2 have been replaced with 2, and values that were greater than 8 have been replaced with 8.
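When clipping is used to cap outliers, it can help to know how many values were actually changed. The sketch below is one possible way to count them by comparing the original and clipped DataFrames; the variable names are chosen for this illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

clipped = df.clip(lower=2, upper=8)

# True wherever clipping changed a value
changed = df.ne(clipped)

# Number of changed values per column: 3 in A, 4 in B
print(changed.sum())
```

Note that this comparison assumes the data contains no NaN values, because NaN never compares equal to itself.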

In-Place Clipping with Lower and Upper Thresholds

To perform in-place clipping on a DataFrame with both lower and upper thresholds, you can use the clip() method with the lower, upper, and inplace parameters set. This will modify the original DataFrame directly without creating a new one.


# Apply the clip method in-place with both thresholds
df.clip(lower=2, upper=8, inplace=True)
print("In-Place Clipped DataFrame:\n", df)

# Output:
# In-Place Clipped DataFrame:
#     A  B
# 0  2  4
# 1  2  2
# 2  3  2
# 3  2  8
# 4  8  2

Here, the values in columns A and B that were less than 2 have been replaced with 2, and values greater than 8 have been replaced with 8 in the original DataFrame df.
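Because in-place clipping overwrites the original values and returns None, assigning the result back to the variable would lose the DataFrame. The following sketch, using a small DataFrame made up for this illustration, shows the return value and one way to keep a copy of the original data.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, -5, 14], 'B': [4, -6, 9]})

# Keep a copy if the original values are still needed later
original = df.copy()

# With inplace=True the call returns None, so do not assign the result to df
returned = df.clip(lower=2, upper=8, inplace=True)

print(returned)                  # None
print(df['A'].tolist())          # [2, 2, 8]
print(original['A'].tolist())    # [2, -5, 14]
```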

Clipping with NaN Values

When a DataFrame contains NaN values, the clip() method leaves them unchanged. Comparisons with NaN always evaluate to False, so a NaN is never treated as being below the lower threshold or above the upper threshold.


import pandas as pd
import numpy as np

# Sample DataFrame with NaN values
df = pd.DataFrame({
    'A': [2, -5, np.nan, -8, 14],
    'B': [4, -6, 0, np.nan, -3]
})

# Clip values below 0 and above 10
df2 = df.clip(lower=0, upper=10)
print("Clipped DataFrame with NaN values:\n", df2)

# Output:
# Clipped DataFrame with NaN values:
#        A    B
# 0   2.0  4.0
# 1   0.0  0.0
# 2   NaN  0.0
# 3   0.0  NaN
# 4  10.0  0.0
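To confirm that the missing values stay where they were, the sketch below compares the NaN positions before and after clipping. It reuses the same sample data; the check itself is an illustration, not something clip() requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2, -5, np.nan, -8, 14],
    'B': [4, -6, 0, np.nan, -3]
})

clipped = df.clip(lower=0, upper=10)

# NaN sits in the same positions before and after clipping
print(df.isna().equals(clipped.isna()))   # True

# Columns that hold NaN are stored as floats, which is why the
# clipped output shows 2.0, 0.0, and so on rather than integers
print(clipped.dtypes)
```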

Frequently Asked Questions on Pandas DataFrame clip() Method

What does the clip() method do?

The clip() method is used to limit the values in a DataFrame to be within a specified range, defined by lower and upper boundaries. Values below the lower boundary are set to the lower boundary, and values above the upper boundary are set to the upper boundary.

Can I set different thresholds for different columns or rows?

Yes. Pass array-like values or a Series as the lower and upper parameters, together with the axis parameter, to specify different thresholds for different columns or rows.
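As a minimal sketch of per-column thresholds, the example below passes two Series indexed by column label. The bounds themselves are hypothetical values picked for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, -5, 3, -8, 14],
    'B': [4, -6, 0, 9, -3]
})

# Hypothetical per-column bounds, indexed by column label
lower = pd.Series({'A': 0, 'B': -5})
upper = pd.Series({'A': 10, 'B': 5})

# axis=1 aligns each Series with the column labels
clipped = df.clip(lower=lower, upper=upper, axis=1)
print(clipped)

# Column A is limited to [0, 10]:  [2, 0, 3, 0, 10]
# Column B is limited to [-5, 5]:  [4, -5, 0, 5, -3]
```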

How does clipping handle NaN values?

NaN values are not affected by clipping. They remain unchanged in the DataFrame after clipping.

Can I clip values along a specific axis?

Yes, by setting the axis parameter when lower or upper is array-like. Use axis=0 to align the thresholds with the row index (one threshold per row), and axis=1 to align them with the column labels (one threshold per column).

What happens if I set the inplace parameter to True?

If inplace=True, the method modifies the original DataFrame directly and returns None instead of a new DataFrame.

Conclusion

In this article, I have explained the Pandas DataFrame clip() method, covering its syntax, parameters, and usage. It returns an object of the same type as the caller with values outside the clipping boundaries replaced, or None if inplace=True is specified.

Happy Learning!!
