How to Replace String in Pandas DataFrame

Post last modified: May 15, 2024

In pandas, you can replace a string in a DataFrame column with either the DataFrame.replace() function or the Series.str.replace() method. Both can also be called inside apply() with a lambda expression.


In this article, I will explain how to replace strings in a Pandas DataFrame. Pandas offers several methods for replacing strings within DataFrame columns; the primary one is the replace() function.

Key Points –

  • Use the str.replace() method in pandas to replace strings in DataFrame columns efficiently.
  • Specify the string to be replaced and its replacement within the method parameters.
  • Employ regular expressions for handling complex string replacement patterns effectively.
  • Assign the result back to the original DataFrame or to a new variable to retain the changes, because these methods return a new object by default.
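The last key point is easy to miss, so here is a minimal sketch of it. The two-row DataFrame is made up for illustration and is not the article's data set.

```python
import pandas as pd

# Illustrative data only (not the article's DataFrame)
df = pd.DataFrame({"Courses": ["Spark", "Java Language"]})

# str.replace() returns a new Series; df is unchanged until you assign it back.
df["Courses"].str.replace("Language", "Lang")
before = df["Courses"].tolist()

# Assigning the result back is what actually updates the column.
df["Courses"] = df["Courses"].str.replace("Language", "Lang")
after = df["Courses"].tolist()

print(before)
print(after)
```

The first call is discarded, so `before` still holds the original strings; only the assigned call changes the column.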

Quick Examples to Replace String

Below are some quick examples of how to replace a string in a Pandas DataFrame.


# Quick examples to replace string

# Example 1: Replace string 
# Using DataFrame.replace() method
df2 = df.replace('Py','Python with ', regex=True)

# Example 2: Replace strings in multiple columns 
# Using a dict per column with regex=True
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)

# Example 3: Replace pattern of string 
# Using regular expression
df2=df.replace(regex=['Language'],value='Lang')

# Example 4: By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')

# Example 5: Replace String 
# Using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))

To run the string replacement examples, let's first create a Pandas DataFrame from a dictionary.


# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)


Pandas Replace String Example

You can use the DataFrame.replace() function to replace strings within a Pandas DataFrame. Without regex=True it matches whole cell values, replaces each match with the new value, and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True.


# Replace string using DataFrame.replace() method.
df2 = df.replace('PySpark','Python with Spark')
print("After replacing the string values of a single column:\n", df2)

In the above example, you call the DataFrame.replace() method on the DataFrame df, which has the columns Courses, Fee, and Duration, to replace PySpark with Python with Spark. The method checks every column, but only cells in the Courses column are equal to PySpark, so only that column changes.
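To make the whole-value matching and the inplace=True behaviour concrete, here is a small sketch. It assumes a cut-down, two-row version of the DataFrame, used only for illustration.

```python
import pandas as pd

# Cut-down illustrative data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
})

# Without regex=True, only cells exactly equal to 'Py' match, so nothing changes.
exact = df.replace("Py", "Python with ")

# With regex=True, 'Py' is treated as a pattern and matched inside each string.
partial = df.replace("Py", "Python with ", regex=True)

print(exact["Courses"].tolist())
print(partial["Courses"].tolist())

# inplace=True modifies df itself instead of returning a new DataFrame.
df.replace("PySpark", "Python with Spark", inplace=True)
print(df["Courses"].tolist())
```

The numeric Fee column is left untouched in all three calls.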

Replace Multiple Strings

Now let's see how to replace strings in multiple columns. In this example, I will also show how to replace part of a string by using the regex=True param. To update multiple string columns, pass two dicts keyed by column name: the first maps each column to the pattern to find, and the second maps each column to its replacement. The below example replaces Py with Python with in the Courses column and days with Days in the Duration column.


# Replace pattern of string using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print("After replacing the string values of multiple columns:\n", df2)

Yields the output below.


# Output:
# After replacing the string values of multiple columns
             Courses    Fee Duration
0              Spark  22000  30 Days
1  Python with Spark  25000  50 Days
2              Spark  23000  30 Days
3      Java Language  24000  60 Days
4  Python with Spark  26000  35 Days
5       PHP Language  27000  30 Days
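The same per-column replacement can also be written as a single nested dict, which keeps each column's pattern next to its replacement. This is a sketch on a reduced DataFrame that I made up for illustration; the nested form is documented for DataFrame.replace(), and the value argument is omitted when using it.

```python
import pandas as pd

# Reduced illustrative data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Duration": ["30days", "50days"],
})

# Outer keys are column names; inner dicts map pattern -> replacement.
df2 = df.replace(
    {"Courses": {"Py": "Python with "}, "Duration": {"days": " Days"}},
    regex=True,
)

print(df2)
```

Which form to use is a matter of taste: the two-dict form mirrors the to_replace/value arguments, while the nested form reads more naturally when each column has several patterns.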

Replace Pattern of String

Alternatively, using regular expressions you can replace matching strings with other strings within a Pandas DataFrame. The below example finds the string Language and replaces it with Lang.


# Replace pattern of string using regular expression
df2=df.replace(regex=['Language'],value='Lang')
print("After replacing the string values of a single column:\n", df2)

Yields the output below.


# Output:
# After replacing the string values of a single column:
     Courses    Fee Duration
0      Spark  22000   30days
1    PySpark  25000   50days
2      Spark  23000   30days
3  Java Lang  24000   60days
4    PySpark  26000   35days
5   PHP Lang  27000   30days
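The pattern above is a plain word, but the regex argument accepts any regular expression. As a hedged sketch, the example below uses an anchored pattern so that Language is removed only at the end of a value. The third row ("Language Basics") is invented to show a non-match.

```python
import pandas as pd

# Illustrative data; 'Language Basics' is a made-up row to show a non-match
df = pd.DataFrame({
    "Courses": ["Spark", "Java Language", "Language Basics"],
    "Fee": [22000, 24000, 20000],
})

# r'\s*Language$' matches 'Language' (plus any spaces before it)
# only when it appears at the end of the string.
df2 = df.replace(regex=r"\s*Language$", value="")

print(df2["Courses"].tolist())
```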

Use str.replace() Function

To use str.replace() on a DataFrame, you would first access the column containing the strings you want to replace using square brackets ([]), then apply the str.replace() method.


# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print("After replacing the string values of a single column:\n", df)

In the above example, you call the str.replace() method directly on the Courses column of df to replace Language with Lang, then assign the result back to the same column so that df itself is updated.
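str.replace() also takes a regex parameter that controls whether the pattern is read literally or as a regular expression. The sketch below passes it explicitly in both cases, since the default has differed between pandas versions. The Series values are illustrative.

```python
import pandas as pd

# Illustrative values
s = pd.Series(["30days", "50days", "v1.5"])

# regex=False: the pattern is a literal string, so '.' matches only a dot.
literal = s.str.replace(".", "_", regex=False)

# regex=True: capture the digits and reinsert them with a space before 'days'.
spaced = s.str.replace(r"(\d+)days", r"\1 days", regex=True)

print(literal.tolist())
print(spaced.tolist())
```

Passing regex explicitly keeps the behaviour the same regardless of the installed pandas version.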

Using apply() Function Along with Lambda

Similarly, you can use DataFrame.apply() with a lambda expression to replace strings. The apply() method in Pandas applies a function along one of the DataFrame axes. By default, axis=0, which means the function is called once per column and receives that column as a Series.


# Replace String using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))
print("After replacing the string values of a single column:\n", df2)

Yields the output below.


# Output:
# After replacing the string values of a single column:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3          Java Lang  24000   60days
4  Python with Spark  26000   35days
5           PHP Lang  27000   30days
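Because apply() visits every column, one variation is to restrict it to the text columns and use the .str accessor inside the lambda. This is only a sketch: the text_cols list is a hypothetical selection chosen by hand, and the data is a reduced version of the DataFrame.

```python
import pandas as pd

# Reduced illustrative data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Duration": ["30days", "50days"],
})

text_cols = ["Courses", "Duration"]  # hypothetical, hand-picked text columns

df2 = df.copy()
# With the default axis=0, the lambda receives each selected column as a Series.
df2[text_cols] = df2[text_cols].apply(
    lambda col: col.str.replace("days", " Days", regex=False)
)

print(df2)
```

Selecting the columns first avoids calling .str on the numeric Fee column, which would raise an error.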

Complete Example


# Create a pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

# Replace string using DataFrame.replace() method
df2 = df.replace('Py','Python with ', regex=True)
print(df2)

# Replace pattern of string using regular expression
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)

# Replace pattern of string using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)

# Replace String using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))
print(df2)

FAQs on Replace String

What is the purpose of replacing strings in a Pandas DataFrame?

Replacing strings in a DataFrame is often necessary for data cleaning and preprocessing tasks. It allows you to standardize values, correct errors, or prepare the data for analysis.

Can I apply a custom function to replace strings in a DataFrame column?

You can apply a custom function to replace strings in a DataFrame column using the apply() method.
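As a minimal sketch of that answer, the example below passes a named function to Series.apply(). The helper tidy_duration is hypothetical and exists only for this illustration.

```python
import pandas as pd

def tidy_duration(value: str) -> str:
    """Hypothetical helper: turn '30days' into '30 days'."""
    # This is Python's built-in str.replace(), called once per cell.
    return value.replace("days", " days")

# Illustrative data
df = pd.DataFrame({"Duration": ["30days", "50days"]})
df["Duration"] = df["Duration"].apply(tidy_duration)

print(df["Duration"].tolist())
```

A custom function is most useful when the logic goes beyond a single pattern; for simple substitutions, str.replace() on the column is usually shorter.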

Can I replace multiple strings in a DataFrame column simultaneously?

Yes. You can replace multiple strings in a DataFrame column simultaneously by calling the replace() method with a dictionary that maps each old string to its replacement.

Can I replace multiple strings at once in a DataFrame?

Yes. You can replace multiple strings simultaneously by passing two lists of equal length to the .replace() method: the strings to replace and their corresponding replacement values.
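A short sketch of the list form, on made-up data. The lists are matched by position, and without regex=True each entry must equal the whole cell value.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Java Language"]})

# 'Spark' -> 'Apache Spark', 'PySpark' -> 'Python with Spark' (matched by position)
df2 = df.replace(["Spark", "PySpark"], ["Apache Spark", "Python with Spark"])

print(df2["Courses"].tolist())
```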

Conclusion

In this article, I have explained how to replace strings in a Pandas DataFrame column by using DataFrame.replace(), str.replace(), and apply() with a lambda function.
