The pandas.DataFrame.replace() function replaces values in a DataFrame, substituting one value for another across all columns. It is a powerful tool for data cleaning and transformation. This method takes to_replace, value, inplace, limit, regex, and method as parameters and returns a new DataFrame. When inplace=True is used, it updates the existing DataFrame object and returns None.
This function replaces column values of type str, regex, list, dict, Series, int, and float with specified values. In this article, I will explain the syntax of the Pandas DataFrame replace() method and its usage with examples.
It is especially useful because it can also match the values to replace with a regex (regular expression).
Key Points –
- The replace() function allows replacing values in a DataFrame across all columns or specific ones.
- Works with strings, numbers, lists, dictionaries, Series, and regex patterns to define replacements.
- By default, replace() returns a new DataFrame, but using inplace=True modifies the original DataFrame.
- Enables replacing values based on regular expressions by setting the regex=True parameter.
- The limit parameter sets the maximum size of the gap to forward fill or backward fill when a fill method is used.
- replace() can replace NaN or missing values in DataFrames, enhancing data cleaning flexibility.
- The method argument selects how matched values are filled when no value is given, such as forward fill (ffill) or backward fill (bfill).
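As a small illustration of the regex point above, the sketch below contrasts an exact-value replacement with a regex replacement. The sample DataFrame and the variable names are made up for this example and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python"],
    "Duration": ["30days", "50days", "35days"],
})

# Without regex, only cells whose whole value equals 'Spark' are replaced,
# so 'PySpark' is left alone.
exact = df.replace("Spark", "Apache Spark")

# With regex=True, the pattern is applied inside each string cell,
# so the trailing 'days' is rewritten wherever it occurs.
by_regex = df.replace(r"days$", " days", regex=True)

print(exact)
print(by_regex)
```

Both calls return new DataFrames, so df itself keeps its original values.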
Related: You can replace the Pandas values based on condition.
1. replace() Syntax
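Below is the syntax of the replace() method. With regex=True, it can also replace a substring within column values. Note that the limit and method parameters are deprecated since pandas 2.1, so the signature in newer releases may differ from the one shown here.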
# Syntax of replace() method
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
- to_replace – Takes str, regex, list, dict, Series, int, float, or None
- value – scalar, dict, list, str, regex, default None
- inplace – bool, default False
- limit – int, default None
- regex – bool or same types as to_replace, default False
- method – {'pad', 'ffill', 'bfill', None}
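To make the inplace parameter concrete, here is a minimal sketch comparing the default behavior with inplace=True. The two-row DataFrame is a made-up example, and the return value of the in-place call is deliberately not used.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]})

# Default (inplace=False): df is left untouched and a new DataFrame is returned.
new_df = df.replace(22000, 23000)
fee_before = df["Fee"].tolist()

# inplace=True: df itself is modified, so there is nothing to assign.
df.replace("PySpark", "Python with Spark", inplace=True)

print(new_df)
print(df)
```

Assigning the result of the default call, as in new_df = df.replace(...), is usually the easier pattern to follow when chaining several cleaning steps.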
2. Pandas replace() Examples
The pandas replace() method finds a value in a DataFrame and replaces it with another value across all columns and rows.
Let's create a DataFrame from a Python dictionary with columns like 'Courses', 'Fee', and 'Duration'.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    # np.nan marks the last Duration as a real missing value
    'Duration':['30days','50days','30days','35days',np.nan]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
This creates a DataFrame with five rows and three columns.
Now apply the replace() method to the DataFrame with the value to find and its replacement. For example, df.replace('Spark','Apache Spark') replaces every cell whose value is exactly 'Spark' with the string 'Apache Spark' in the entire DataFrame. Cells that only contain that text as part of a longer string, such as 'PySpark', are left unchanged.
# Replace column value
df2 = df.replace('Spark','Apache Spark')
print("After replacing a value with another value:\n", df2)
Here, the replace() method replaces 'Spark' with 'Apache Spark' across the entire DataFrame and returns a new object. Use the inplace=True param to update the existing DataFrame object instead. This form is ideal for replacing one string with another.
To replace NaN values, use the DataFrame.fillna() function, for example to replace NaN with an empty/blank string.
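As a quick sketch of the fillna() approach mentioned above, the example below fills a missing Duration with an empty string. The two-row DataFrame is a made-up sample, not the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one real missing value (np.nan).
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Duration": ["30days", np.nan],
})

# fillna() returns a new DataFrame with missing values replaced by ''.
filled = df.fillna("")

print(filled)
```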
3. Replace Values in a Specific Column
If you want to replace values in a specific column of a pandas DataFrame, first select the column you want to update, then use the replace() method to replace its values with another value.
# Replace Values in a specific Column
df['Courses'] = df['Courses'].replace('Spark','Apache Spark')
print("After replacing a value with another value:\n", df)
Yields the same output as above.
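An alternative that avoids assigning back to the column is to pass a nested dictionary of the form {column: {old: new}}. The sketch below assumes a made-up DataFrame with a second, hypothetical 'Topic' column so the column scoping is visible.

```python
import pandas as pd

# Hypothetical sample data; 'Topic' exists only to show that it is not touched.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python"],
    "Topic": ["Spark", "Hadoop", "Spark"],
})

# The outer key names the column, the inner dict maps old values to new ones.
scoped = df.replace({"Courses": {"Spark": "Apache Spark"}})

print(scoped)
```

Only the 'Courses' column changes here; 'Spark' in the 'Topic' column stays as it is.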
4. Replace with Multiple Values
Now, let’s see how to find multiple values from a list and replace them with other values in a list.
# Replace multiple values
df2 = df.replace(['Spark','PySpark'],['Apache Spark', 'Apache PySpark'])
print("After replacing multiple values:\n", df2)
Yields below output.
# Output:
# After replacing multiple values:
Courses Fee Duration
0 Apache Spark 22000 30days
1 Apache PySpark 25000 50days
2 Apache Spark 23000 30days
3 Python 24000 35days
4 Apache PySpark 26000 NaN
You can also replace multiple values with the same value.
# Replace multiple values with the same value
df2 = df.replace(['30days','35days'],'40days')
print("After replacing multiple values with the same value:\n", df2)
Yields below output.
# Output:
# After replacing multiple values with the same value:
Courses Fee Duration
0 Apache Spark 22000 40days
1 PySpark 25000 50days
2 Apache Spark 23000 40days
3 Python 24000 40days
4 PySpark 26000 NaN
5. Replace with Dictionary
You can also replace column values in a Pandas DataFrame with dictionaries by using the replace() function. Passing one dictionary as to_replace and another as value lets you specify, per column, which value to look for and which value to replace it with.
# Replace on multiple columns
df2 = df.replace({'Courses': 'Apache Spark', 'Duration': '35days'},
{'Courses': 'Spark', 'Duration': '40days'})
print("After replacing values on multiple columns:\n", df2)
Yields below output. The code replaces the value 'Apache Spark' (set in section 3) in the 'Courses' column with 'Spark' and the value '35days' in the 'Duration' column with '40days'.
# Output:
# After replacing values on multiple columns:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Python 24000 40days
4 PySpark 26000 NaN
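A single flat dictionary can also be used as an old-to-new mapping that applies across every column. The following is a minimal sketch on made-up data; the mapping name and the '1 month' value are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python"],
    "Duration": ["30days", "50days", "30days"],
})

# Each key is a value to find and each dict value is its replacement.
mapping = {"Spark": "Apache Spark", "30days": "1 month"}
mapped = df.replace(mapping)

print(mapped)
```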
Frequently Asked Questions on Pandas Replace
The replace() method in Pandas is used to replace values in a DataFrame or Series with other values.
You can use the replace() method by specifying the value you want to replace and the new value you want to assign. For example, df.replace(old_value, new_value).
You can replace multiple values at once by providing a dictionary of replacements.
To replace NaN or missing values, you can use the replace() method with np.nan. For example, df['column_name'].replace(np.nan, new_value). Note that the string 'NaN' is an ordinary string rather than a missing value, so it has to be matched as a string.
By default, the replace() method does not modify the original DataFrame or Series. To make in-place replacements, you need to specify inplace=True.
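The difference between a real missing value and the string 'NaN' can be seen in a short sketch. The Series below is a made-up example, and dtype=object is set explicitly as an assumption to keep the behavior the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical Series holding one real missing value and one 'NaN' string.
durations = pd.Series(["30days", np.nan, "NaN"], dtype=object)

# Matching np.nan replaces only the real missing value.
no_missing = durations.replace(np.nan, "unknown")

# The literal string 'NaN' has to be matched as a string.
no_text_nan = durations.replace("NaN", "unknown")

print(no_missing)
print(no_text_nan)
```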
Conclusion
In this article, you have learned the syntax, parameters, and usage of the Pandas replace() method, and how to replace column values matched by a string, regex, list, dictionary, Series, or number with another value.