Pandas Replace Values based on Condition

You can replace all values or selected values in a column of pandas DataFrame based on condition by using DataFrame.loc[], np.where() and DataFrame.mask() methods.

In this article, I will explain how to change all values in columns based on the condition in pandas DataFrame with different methods of simples examples.

1. Quick Examples to Replace Values in Column Based on Condition

If, you are in hurry below are some quick examples to replace column values based on the condition in pandas DataFrame.


# Below are some quick examples.

# Replace values of columns by using DataFrame.loc[] property.
df.loc[df['Fee'] > 22000, 'Fee'] = 1

# Replace values of Given column by using np.where() function.
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])

# By checking multiple conditions
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])

# Using DataFrame.mask() function.
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)

Now, let’s create a Pandas DataFrame with a few rows and columns and execute some examples to update all or selected values with other values in a column. Our DataFrame contains column names CoursesFeeDuration, and Discount.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Replace Values of Columns by Using DataFrame.loc[]

You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame.loc[] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.

In the below example, I am replacing the values of Fee column to 15000 only for the rows where the condition of Fee column value is greater than 22000.

Note that this replaces the values on existing DataFrame object.


# Replace values of columns by using DataFrame.loc[] property.
df.loc[df['Fee'] > 22000, 'Fee'] = 15000

Yields below output.


    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  15000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  15000   50days      2000

3. Replace Values of Column by Numpy.where()

Another method to replace values of columns based on condition by using numpy.where() function. The where() function returns the indices of elements in an input array where the given condition is satisfied. Here, NumPy is a very popular library used for calculations with 2d and 3d arrays.


# Replace values of Given column by using np.where() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where(df['Fee'] >= 22000, 15000, df['Fee'])
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  15000   40days      2300
r3   Python  15000   35days      1200
r4   pandas  15000   50days      2000

4. Replace Values By Checking Multiple Conditions

Let’s use the same approach and change column value when multiple conditions satisfied.


# By checking multiple conditions
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  14000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

5. Using DataFrame.mask() Function

Now let’s use DataFrame.mask() method to update values based on conditions. The mask() method replaces the values of the rows where the condition evaluates to True. Now using this masking condition we are going to change all the values greater than 22000 to 15000 in the Fee column.


# Using DataFrame.mask() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'].mask(df['Fee'] >= 22000 ,15000, inplace=True)
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  15000   40days      2300
r3   Python  15000   35days      1200
r4   pandas  15000   50days      2000

6. Complete Examples to Replace Values of Columns in Pandas


# Below are complete examples to replace values of columns in pandas.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Replace values of columns by using DataFrame.loc[] property.
df2=df.loc[df['Fee'] > 22000, 'Fee'] = 1
print(df2)

# Using DataFrame.astype() Replace values based on condition.
df['Fee'] = (df['Fee'] > 22000).astype(int)
print(df)

# Replace values of Given column by using np.where() function.
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])
print(df)

# Using DataFrame.mask() function.
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)
print(df)

Conclusion

In this article, you have learned how to replace values of all columns or selected columns in pandas DataFrame based on condition by using DataFrame.loc[], np.where() and DataFrame.mask() methods with detailed examples.

References

Leave a Reply

You are currently viewing Pandas Replace Values based on Condition