Pandas DataFrame replace() – by Examples

Last modified: March 27, 2024

The pandas.DataFrame.replace() function replaces values in a DataFrame: every cell that matches to_replace, in any column, is swapped for value. It is a powerful tool for data cleaning and transformation. The method takes to_replace, value, inplace, limit, regex, and method as parameters and, by default, returns a new DataFrame. With inplace=True, it updates the existing DataFrame object and returns None.


This function replaces values specified as a str, regex, list, dict, Series, int, or float with the values you provide. In this article, I will explain the Pandas DataFrame replace() method syntax and its usage with examples.

It is one of the most useful and powerful functions because it can also match values with a regex (regular expression).
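As a minimal sketch of regex matching (the three-row DataFrame here is made up for illustration), note that an unanchored pattern also matches inside longer strings, while anchoring it with ^ and $ restricts the match to whole cell values:

```python
import pandas as pd

# Small illustrative DataFrame (not the article's full sample data).
df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Python']})

# Unanchored pattern: 'Spark' is also found inside 'PySpark'.
loose = df.replace('Spark', 'Apache Spark', regex=True)

# Anchored pattern: only cells that are exactly 'Spark' match.
strict = df.replace(r'^Spark$', 'Apache Spark', regex=True)

print(loose['Courses'].tolist())   # ['Apache Spark', 'PyApache Spark', 'Python']
print(strict['Courses'].tolist())  # ['Apache Spark', 'PySpark', 'Python']
```

Anchoring is usually the safer choice when the goal is to replace complete values rather than fragments of them.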

Related: You can replace the Pandas values based on condition.

1. replace() Syntax

Below is the syntax of the replace() method. With regex=True, it can also replace a substring within column values.


# Syntax of replace() method
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
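  • to_replace – str, regex, list, dict, Series, int, float, or None. The value(s) to find.
  • value – scalar, dict, list, str, regex, default None. The replacement value(s).
  • inplace – bool, default False. If True, updates the existing object instead of returning a new one.
  • limit – int, default None. Maximum size gap to forward or backward fill. Deprecated since pandas 2.1.
  • regex – bool or same types as to_replace, default False. Whether to interpret to_replace and/or value as regular expressions.
  • method – {‘pad’, ‘ffill’, ‘bfill’, None}. The fill method used when to_replace is a scalar, list, or tuple and value is None. Deprecated since pandas 2.1.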
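The regex parameter can be used in two ways, sketched below on a small made-up DataFrame: as a flag (regex=True) that makes to_replace a pattern, or as the pattern itself with to_replace left at its default of None.

```python
import pandas as pd

# Small illustrative DataFrame (not the article's full sample data).
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Duration': ['30days', '50days'],
})

# Plain replace() compares whole cell values, so 'days' matches nothing.
whole_cell = df.replace('days', ' days')

# regex=True: to_replace is treated as a pattern and matched inside strings.
as_flag = df.replace('days', ' days', regex=True)

# The pattern can also be passed through the regex parameter directly.
as_pattern = df.replace(regex=r'days$', value=' days')

print(as_flag['Duration'].tolist())     # ['30 days', '50 days']
print(as_pattern['Duration'].tolist())  # ['30 days', '50 days']
```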

2. Pandas replace() Examples

The pandas replace() method finds a value in a DataFrame and replaces it with another value across all columns and rows.

Let’s create a DataFrame from a Python dictionary with columns like 'Courses', 'Fee', and 'Duration'.


# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Spark","Python","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days','35days',np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

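Yields below output.


# Output:
# Create DataFrame:
   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000   35days
4  PySpark  26000      NaN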

Apply the replace() method to the DataFrame with the value to find and its replacement. For example, df.replace('Spark', 'Apache Spark') replaces all occurrences of the string 'Spark' with the string 'Apache Spark' in the entire DataFrame.


# Replace column value
df2 = df.replace('Spark','Apache Spark')
print("After replacing a value with another value:\n", df2)

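Yields below output. The replace() method has replaced 'Spark' with 'Apache Spark' across the entire DataFrame and returned a new object. Only cells whose whole value equals 'Spark' are changed, so 'PySpark' is left as is. Use the inplace=True param to update the existing DataFrame object instead.


# Output:
# After replacing a value with another value:
        Courses    Fee Duration
0  Apache Spark  22000   30days
1       PySpark  25000   50days
2  Apache Spark  23000   30days
3        Python  24000   35days
4       PySpark  26000      NaN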

To replace NaN (missing) values, use the DataFrame.fillna() function, for example to replace NaN with an empty/blank string.
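A short sketch of the difference between a real missing value (np.nan) and the text 'NaN', using a one-column DataFrame invented for this illustration. fillna() only affects real missing values, while the string 'NaN' has to be replaced like any other string:

```python
import numpy as np
import pandas as pd

# One real missing value (np.nan) and one string that merely reads 'NaN'.
df = pd.DataFrame({'Duration': ['30days', np.nan, 'NaN']})

# fillna() fills only the real missing value.
filled = df.fillna('')

# replace() can also target the missing-value marker.
replaced = df.replace(np.nan, 'unknown')

# The text 'NaN' is ordinary data and is matched as a string.
text_fixed = df.replace('NaN', 'unknown')

print(filled['Duration'].tolist())    # ['30days', '', 'NaN']
print(replaced['Duration'].tolist())  # ['30days', 'unknown', 'NaN']
print(text_fixed['Duration'].isna().tolist())  # [False, True, False]
```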

3. Replace Values in a Specific Column

If you want to replace values in a specific column of a pandas DataFrame, first select the column you want to update and then use the replace() method to replace its values with another value.


# Replace Values in a specific Column
df['Courses'] = df['Courses'].replace('Spark','Apache Spark')
print("After replacing a value with another value:\n", df)

Yields the same output as above.
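To see why selecting the column matters, here is a small sketch in which the same value appears in two columns. The 'Prerequisite' column is hypothetical and added only for this illustration; assign() is used so the original DataFrame is left unchanged.

```python
import pandas as pd

# 'Prerequisite' is a hypothetical column used only for this illustration.
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Prerequisite': ['Python', 'Spark'],
})

# Replace only within 'Courses'; assign() returns a new DataFrame.
scoped = df.assign(Courses=df['Courses'].replace('Spark', 'Apache Spark'))

print(scoped['Courses'].tolist())       # ['Apache Spark', 'PySpark']
print(scoped['Prerequisite'].tolist())  # ['Python', 'Spark']
```

Calling df.replace('Spark', 'Apache Spark') on the whole DataFrame would have changed the 'Prerequisite' column as well.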

4. Replace with Multiple Values

Now, let’s see how to find multiple values from a list and replace them with other values in a list.


# Replace multiple values
df2 = df.replace(['Spark','PySpark'],['Apache Spark', 'Apache PySpark'])
print("After replacing multiple values with other values:\n", df2)

Yields below output.


# Output:
# After replacing multiple values with other values:
          Courses    Fee Duration
0    Apache Spark  22000   30days
1  Apache PySpark  25000   50days
2    Apache Spark  23000   30days
3          Python  24000   35days
4  Apache PySpark  26000      NaN

You can also replace multiple values with the same single value.


# Replace multiple values with the same value
df2 = df.replace(['30days','35days'],'40days')
print("After replacing multiple values with the same value:\n", df2)

Yields below output.


# Output:
# After replacing multiple values with the same value:
        Courses    Fee Duration
0  Apache Spark  22000   40days
1       PySpark  25000   50days
2  Apache Spark  23000   40days
3        Python  24000   40days
4       PySpark  26000      NaN
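When both arguments are lists, the values are paired by position, so the two lists need the same length. The sketch below, on a small made-up DataFrame, shows the pairing and the ValueError that pandas raises for mismatched lengths:

```python
import pandas as pd

# Small illustrative DataFrame (not the article's full sample data).
df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Python']})

# Paired by position: 'Spark' -> 'Apache Spark', 'PySpark' -> 'Apache PySpark'.
paired = df.replace(['Spark', 'PySpark'], ['Apache Spark', 'Apache PySpark'])
print(paired['Courses'].tolist())  # ['Apache Spark', 'Apache PySpark', 'Python']

# A replacement list of a different length is rejected.
try:
    df.replace(['Spark', 'PySpark'], ['Apache Spark'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError
```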

5. Replace with Dictionary

You can also replace column values in a Pandas DataFrame with dictionaries by using the replace() function. In the example below, the first dictionary maps each column name to the value to find in that column, and the second maps each column name to its replacement value.


# Replace on multiple columns
df2 = df.replace({'Courses': 'Apache Spark', 'Duration': '35days'}, 
                 {'Courses': 'Spark', 'Duration': '40days'})
print("After replacing values on multiple columns:\n", df2)

Yields below output. The code replaces the value 'Apache Spark' in the 'Courses' column with 'Spark' and the value '35days' in the 'Duration' column with '40days'.


# Output:
# After replacing values on multiple columns:
   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000   40days
4  PySpark  26000      NaN
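Two other dictionary forms are also accepted by replace(). The sketch below, on a small made-up DataFrame, shows a flat dictionary of {old: new} pairs that is searched in every column, and a nested dictionary of {column: {old: new}} that is searched only in the named columns:

```python
import pandas as pd

# Small illustrative DataFrame (not the article's full sample data).
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Python'],
    'Duration': ['30days', '50days', '35days'],
})

# Flat dict: {old: new}, searched in every column.
flat = df.replace({'Spark': 'Apache Spark', '35days': '40days'})

# Nested dict: {column: {old: new}}, searched only in the named column.
nested = df.replace({
    'Courses': {'Spark': 'Apache Spark'},
    'Duration': {'35days': '40days'},
})

print(flat['Courses'].tolist())     # ['Apache Spark', 'PySpark', 'Python']
print(nested['Duration'].tolist())  # ['30days', '50days', '40days']
```

The nested form is the more explicit of the two, since a value that happens to appear in another column is not touched.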

Frequently Asked Questions on Pandas Replace

What is the purpose of the replace() method in Pandas?

The replace() method in Pandas is used to replace values in a DataFrame or Series with other values.

How do I use the replace() method to replace values in a DataFrame or Series?

You can use the replace() method by specifying the value you want to replace and the new value you want to assign. For example, df.replace(old_value, new_value).

How can I use replace() to replace multiple values at once?

You can replace multiple values at once by providing lists of values or a dictionary of replacements.

How do I replace NaN or missing values using replace()?

To replace NaN or missing values, you can use the replace() method with np.nan. For example, df['column_name'].replace(np.nan, new_value). The string 'NaN' is not equivalent to np.nan; it is ordinary text and has to be matched as a string.

Does the replace() method modify the original DataFrame or Series by default?

No, by default, the replace() method does not modify the original DataFrame or Series. To make in-place replacements, you need to specify inplace=True.
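A minimal sketch of both behaviours on a two-row DataFrame made up for illustration. The return value of the in-place call is captured only to show that it should not be used as the result:

```python
import pandas as pd

# Small illustrative DataFrame (not the article's full sample data).
df = pd.DataFrame({'Courses': ['Spark', 'PySpark']})

# Default: a new DataFrame is returned and df is left alone.
df2 = df.replace('Spark', 'Apache Spark')
print(df['Courses'].tolist())   # ['Spark', 'PySpark']
print(df2['Courses'].tolist())  # ['Apache Spark', 'PySpark']

# inplace=True: df itself is updated; as described above, the call is
# expected to return None, so do not assign its result back to df.
result = df.replace('PySpark', 'Apache PySpark', inplace=True)
print(df['Courses'].tolist())   # ['Spark', 'Apache PySpark']
```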

Conclusion

In this article, you have learned the syntax, parameters, and usage of the Pandas replace() method, and how to replace column values matched by a string, regex, list, dictionary, Series, number, etc. with another value.


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.