• Post author:
  • Post category:Pandas
  • Post last modified:May 6, 2024
  • Reading time:15 mins read
You are currently viewing Pandas Replace Values based on Condition

You can replace all values or selected values in a column of pandas DataFrame based on condition by using DataFrame.loc[], np.where() and DataFrame.mask() methods.

Advertisements

In this article, I will explain how to change all values in columns based on the condition in Pandas DataFrame with different methods of simple examples.

Key Points –

  • Pandas offers several methods, including loc[], np.where(), and mask(), to replace values in DataFrame columns based on specified conditions.
  • The replace() function in Pandas can be used to replace values with other values according to criteria.
  • Conditions can be formulated using logical operators such as ==, !=, <, >, etc.
  • Replacement values can be scalar values, lists, dictionaries, or even functions.
  • Condition-based value replacement is crucial for data preprocessing, cleaning, and transformation tasks, ensuring data integrity and consistency.

Quick Examples to Replace Values in the Column Based on Condition

If, you are in a hurry below are some quick examples to replace column values based on the condition in Pandas DataFrame.


# Quick examples to replace values 

# Example 1: Replace values of columns by using DataFrame.loc[] property
df.loc[df['Fee'] > 22000, 'Fee'] = 1

# Example 2: Replace values of Given column by using np.where() function
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])

# Example 3: By checking multiple conditions
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])

# Example 4: Using DataFrame.mask() function
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)

To run some examples of replacing values in the column based on conditions in pandas, let’s create a Pandas DataFrame.


# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)

Yields below output.

pandas replace based condition

Replace Values of Columns by Using DataFrame.loc[]

You can replace values of all or selected columns based on the condition of pandas DataFrame by using df.loc[] property. Use this property to access a group of rows and columns by label(s) or a boolean array. It can also manipulate the values of pandas DataFrame.

In the below example, I am replacing the values of Fee column to 15000 only for the rows where the condition of Fee column value is greater than 22000.

Note that this replaces the values on the existing DataFrame object.


# Replace values of columns by using DataFrame.loc[] property.
df.loc[df['Fee'] > 22000, 'Fee'] = 15000
print("Replace the column values based on condition:\n", df)

Yields below output.

pandas replace based condition

Replace Values of Column by Numpy.where()

Alternatively, you can use the <a href="https://sparkbyexamples.com/python/numpy-where-function/">numpy.where()</a> function to replace values of columns based on condition. This function returns the indices of elements in an input array where the given condition is satisfied. Here, NumPy is a very popular library used for calculations with 2D and 3D arrays. Before going to the NumPy function we need to import the numpy module as np.


# Replace values of given column by using np.where() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where(df['Fee'] >= 22000, 15000, df['Fee'])
print("Replace the column values based on condition:\n", df)

Yields below output.


# Output:
# Replace the column values based on condition:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  15000   40days      2300
r3   Python  15000   35days      1200
r4   pandas  15000   50days      2000

Replace Values By Checking Multiple Conditions

Let’s use the same approach and change the column value when multiple conditions are satisfied.


# By checking multiple conditions
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])
print("Replace the column values based on multiple conditions:\n", df)

Yields below output.


# Output:
# Replace the column values based on multiple conditions:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  14000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

Using DataFrame.mask() Function

Now let’s use DataFrame.mask() method to update values based on conditions. The mask() method replaces the values of the rows where the condition evaluates to True. Now using this masking condition we are going to change all the values greater than 22000 to 15000 in the Fee column.


# Using DataFrame.mask() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'].mask(df['Fee'] >= 22000 ,15000, inplace=True)
print("Replace the column values based on condition:\n", df)

Yields below output.


# Output:
# Replace the column values based on condition:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  15000   40days      2300
r3   Python  15000   35days      1200
r4   pandas  15000   50days      2000

Complete Examples to Replace Values of Columns in Pandas


# Below are complete examples to replace values of columns in pandas.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Replace values of columns by using DataFrame.loc[] property.
df2=df.loc[df['Fee'] > 22000, 'Fee'] = 1
print(df2)

# Using DataFrame.astype() Replace values based on condition.
df['Fee'] = (df['Fee'] > 22000).astype(int)
print(df)

# Replace values of Given column by using np.where() function.
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])
print(df)

# Using DataFrame.mask() function.
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)
print(df)

Frequently Asked Questions on Replace Pandas Column Values

What is the purpose of replacing Pandas column values?

Replacing Pandas column values allows for data manipulation, cleaning, and transformation, ensuring data integrity and consistency for downstream analysis.

Can I replace values in multiple columns simultaneously?

You can replace values in multiple columns simultaneously by applying the replacement methods to each column or using methods like apply() or list comprehensions for more complex scenarios.

How can I replace NaN (missing) values in a DataFrame or Series based on a condition?

You can use the df.fillna() method to replace NaN values based on a condition. For example, df['specified_column'].fillna(0, inplace=True)

What methods does Pandas offer for replacing column values?

Pandas provide various methods such as replace(), loc[], np.where(), and mask() for condition-based value replacement.

How can I replace values in a Pandas DataFrame based on a condition?

You can use the df.loc[] indexer along with a boolean condition to replace values in a DataFrame.

Is it possible to replace values based on a combination of conditions?

You can use logical operators (e.g., & for AND, | for OR) to combine multiple conditions and replace values based on the resulting combined condition.

Conclusion

In this article, I have explained how to replace values of all columns or selected columns in pandas DataFrame based on condition by using DataFrame.loc[], np.where(), and DataFrame.mask() methods with detailed examples.

References