You can replace all values or selected values in a column of pandas DataFrame based on condition by using DataFrame.loc[]
, np.where()
and DataFrame.mask()
methods.
In this article, I will explain how to change all values in columns based on the condition in Pandas DataFrame with different methods of simple examples.
1. Quick Examples to Replace Values in the Column Based on Condition
If, you are in a hurry below are some quick examples to replace column values based on the condition in Pandas DataFrame.
# Below are some quick examples.
# Example 1: Replace values of columns by using DataFrame.loc[] property.
df.loc[df['Fee'] > 22000, 'Fee'] = 1
# Example 2: Replace values of Given column by using np.where() function.
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])
# Example 3: By checking multiple conditions
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])
# Example 4: Using DataFrame.mask() function.
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)
Now, let’s create a Pandas DataFrame with a few rows and columns and execute some examples to update all or selected values with other values in a column. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)
Yields below output.
2. Replace Values of Columns by Using DataFrame.loc[]
You can replace values of all or selected columns based on the condition of pandas DataFrame by using df.loc[
] property. Use this property to access a group of rows and columns by label(s) or a boolean array. It can also manipulate the values of pandas DataFrame.
In the below example, I am replacing the values of Fee
column to 15000
only for the rows where the condition of Fee
column value is greater than 22000
.
Note that this replaces the values on the existing DataFrame object.
# Replace values of columns by using DataFrame.loc[] property.
df.loc[df['Fee'] > 22000, 'Fee'] = 15000
print("Replace the column values based on condition:\n", df)
Yields below output.
3. Replace Values of Column by Numpy.where()
Alternatively, you can use the <a href="https://sparkbyexamples.com/python/numpy-where-function/">numpy.where()</a>
function to replace values of columns based on condition. This function returns the indices of elements in an input array where the given condition is satisfied. Here, NumPy is a very popular library used for calculations with 2D and 3D arrays. Before going to the NumPy function we need to import the numpy module as np.
# Replace values of given column by using np.where() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where(df['Fee'] >= 22000, 15000, df['Fee'])
print("Replace the column values based on condition:\n", df)
Yields below output.
# Output:
# Replace the column values based on condition:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 15000 40days 2300
r3 Python 15000 35days 1200
r4 pandas 15000 50days 2000
4. Replace Values By Checking Multiple Conditions
Let’s use the same approach and change the column value when multiple conditions are satisfied.
# By checking multiple conditions
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'] = np.where((df['Fee'] >= 22000) & (df['Courses'] == 'PySpark'), 14000, df['Fee'])
print("Replace the column values based on multiple conditions:\n", df)
Yields below output.
# Output:
# Replace the column values based on multiple conditions:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 14000 40days 2300
r3 Python 22000 35days 1200
r4 pandas 30000 50days 2000
5. Using DataFrame.mask() Function
Now let’s use DataFrame.mask()
method to update values based on conditions. The mask()
method replaces the values of the rows where the condition evaluates to True. Now using this masking condition we are going to change all the values greater than 22000
to 15000 in the Fee
column.
# Using DataFrame.mask() function.
df = pd.DataFrame(technologies,index=index_labels)
df['Fee'].mask(df['Fee'] >= 22000 ,15000, inplace=True)
print("Replace the column values based on condition:\n", df)
Yields below output.
# Output:
# Replace the column values based on condition:
Courses Fee Duration Discount
r1 Spark 20000 30days 1000
r2 PySpark 15000 40days 2300
r3 Python 15000 35days 1200
r4 pandas 15000 50days 2000
6. Complete Examples to Replace Values of Columns in Pandas
# Below are complete examples to replace values of columns in pandas.
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Replace values of columns by using DataFrame.loc[] property.
df2=df.loc[df['Fee'] > 22000, 'Fee'] = 1
print(df2)
# Using DataFrame.astype() Replace values based on condition.
df['Fee'] = (df['Fee'] > 22000).astype(int)
print(df)
# Replace values of Given column by using np.where() function.
df['Fee'] = np.where(df['Fee'] > 22000, 1, df['Fee'])
print(df)
# Using DataFrame.mask() function.
df['Fee'].mask(df['Fee'] >=22000 ,'0', inplace=True)
print(df)
Frequently Asked Questions on Replace Pandas Column Values
You can use the df.loc[]
indexer along with a boolean condition to replace values in a DataFrame.
You can use the df.loc[]
indexer with a Series to replace values based on a condition, similar to the DataFrame.
You can use logical operators like |
and &
operators to specify multiple conditions.
You can use the df.fillna()
method to replace NaN values based on a condition. For example, df['specified_column'].fillna(0, inplace=True)
Yes, you can use a custom function with apply
and conditionally apply it to replace values. For example, you can define a function to apply to a DataFrame or Series.
Conclusion
In this article, you have learned how to replace values of all columns or selected columns in pandas DataFrame based on condition by using DataFrame.loc[]
, np.where()
, and DataFrame.mask()
methods with detailed examples.
Related Articles
- Create New Pandas DataFrame By Selecting Specific Columns
- Remove Duplicate Columns From Pandas DataFrame
- Pandas Convert Datetime to Date Column
- Convert Integer to Datetime Type in Pandas
- Convert Pandas DataFrame to NumPy Array
- Pandas Replace Blank Values (empty) with NaN
- Pandas Replace substring in DataFrame
- Pandas DataFrame replace() with examples
- Pandas Replace substring in DataFrame
- How to Replace String in Pandas DataFrame
- Pandas – Replace NaN Values with Zero in a Column