  • Post last modified:March 27, 2024
Pandas Replace Substring in DataFrame

You can replace a substring in a column of a pandas DataFrame using DataFrame.replace() with regex=True, Series.str.replace(), or DataFrame.apply() with a lambda function. In this article, I will explain each approach with multiple examples.

  • Replace a substring with another substring in Pandas.
  • Replace a pattern of a substring with another substring using regular expression.

Related: You can replace the string in Pandas DataFrame.

1. Quick Examples to Replace Substring

If you are in a hurry, below are some quick examples of how to replace a substring in a column of a pandas DataFrame.


# Below are some quick examples.

# Example 1: Replace substring
df2 = df.replace('Py', 'Python with ', regex=True)

# Example 2: Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
    {'Courses': 'Python with', 'Duration': ' Days'}, regex=True)

# Example 3: Replace pattern of substring using regular expression
df2 = df.replace(regex=['Language'], value='Lang')

# Example 4: Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')

# Example 5: Replace substring using apply() function with lambda
df2 = df.apply(lambda x: x.replace(
    {'Py': 'Python with', 'Language': 'Lang'},
    regex=True))

Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.


# Create a pandas DataFrame.
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

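Yields below output.


# Output:
# Create DataFrame:
         Courses    Fee Duration
0          Spark  22000   30days
1        PySpark  25000   50days
2          Spark  23000   30days
3  Java Language  24000   60days
4        PySpark  26000   35days
5   PHP Language  27000   30days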

2. Replace Substring Using replace() Method

You can replace a substring in a pandas DataFrame column by using the DataFrame.replace() method. By default, this method replaces a cell only when its entire value matches the given string. Pass regex=True to match and replace a substring within the value.


# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print("After replacing the substring with another substring:\n", df2)

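Yields below output. The above example replaces every occurrence of the substring Py with "Python with " (note the trailing space) in the Courses column, so PySpark becomes Python with Spark.


# Output:
# After replacing the substring with another substring:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3      Java Language  24000   60days
4  Python with Spark  26000   35days
5       PHP Language  27000   30days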

This method returns a new DataFrame after replacing the substring. Use inplace=True to modify the existing DataFrame object instead.
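
To make the difference between the default exact match and regex=True concrete, here is a minimal sketch. The small DataFrame named demo is a hypothetical sample created only for this illustration, not the one used in the rest of the article.

```python
import pandas as pd

# Hypothetical two-row sample used only for this illustration.
demo = pd.DataFrame({'Courses': ['Py', 'PySpark']})

# Default behaviour: only cells equal to the whole string 'Py' are replaced.
exact = demo.replace('Py', 'Python')

# regex=True: every occurrence of 'Py' inside a cell is replaced.
partial = demo.replace('Py', 'Python', regex=True)

print(exact['Courses'].tolist())    # ['Python', 'PySpark']
print(partial['Courses'].tolist())  # ['Python', 'PythonSpark']
print(demo['Courses'].tolist())     # ['Py', 'PySpark'] (original is unchanged)
```

Because neither call passes inplace=True, demo keeps its original values and each result is a new DataFrame.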

3. Replace Multiple Substrings

You can also use this method to replace different substrings in several string columns at once. To do this, pass one dict that maps column names to the substrings to find, and a second dict that maps the same column names to their replacements.


# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print("After replacing the multiple substrings:\n", df2)

Yields below output.


# Output:
# After replacing the multiple substrings:
            Courses    Fee Duration
0             Spark  22000  30 Days
1  Python withSpark  25000  50 Days
2             Spark  23000  30 Days
3     Java Language  24000  60 Days
4  Python withSpark  26000  35 Days
5      PHP Language  27000  30 Days
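
The same column-by-column replacement can also be written with a single nested dict of the form {column: {pattern: replacement}}. The sketch below assumes a hypothetical sample named demo in which the text days appears in both columns, to show that each pattern only touches the column it is listed under.

```python
import pandas as pd

# Hypothetical sample: 'days' appears in both columns on purpose.
demo = pd.DataFrame({
    'Courses': ['PySpark 30days', 'Spark'],
    'Duration': ['30days', '50days'],
})

# Nested dict: {column: {pattern: replacement}}.
# The separate value argument is not passed in this form.
result = demo.replace(
    {'Courses': {'Py': 'Python with '}, 'Duration': {'days': ' Days'}},
    regex=True,
)

print(result['Courses'].tolist())   # ['Python with Spark 30days', 'Spark']
print(result['Duration'].tolist())  # ['30 Days', '50 Days']
```

Note that 30days inside the Courses column is left alone, because the days pattern is scoped to the Duration column.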

4. Replace Pattern of Substring Using Regular Expression

Using a regular expression, you can replace the matching substring with another substring. The below example finds the substring Language and replaces it with the substring Lang.


# Replace pattern of Substring using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')
print("After replacing the substring:\n", df2)

Yields below output.


# Output:
# After replacing the substring:
     Courses    Fee Duration
0      Spark  22000   30days
1    PySpark  25000   50days
2      Spark  23000   30days
3  Java Lang  24000   60days
4    PySpark  26000   35days
5   PHP Lang  27000   30days
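
The pattern above is a plain word, so it behaves like a literal substring. As a rough sketch of what a real pattern adds, the example below uses an anchored regular expression so that Language is replaced only at the end of a value. The sample data and the pattern are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample: 'Language' appears at the start of one value.
demo = pd.DataFrame(
    {'Courses': ['Java Language', 'Language Arts', 'PHP Language']}
)

# r'\s+Language$' matches 'Language' together with the whitespace before it,
# but only when it sits at the very end of the string.
result = demo.replace(regex=r'\s+Language$', value=' Lang')

print(result['Courses'].tolist())
# ['Java Lang', 'Language Arts', 'PHP Lang']
```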

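5. Replace Substring Using str.replace()

Alternatively, you can use Series.str.replace() to replace a substring in a single column. Unlike DataFrame.replace(), which looks for exact matches unless you pass regex=True, str.replace() always works on substrings. Its regex parameter only controls whether the pattern is treated as a regular expression or as literal text (the default is regex=False since pandas 2.0).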


# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print("After replacing the substring:\n", df)

Yields the same output as above. Note that this replaces the values in the Courses column of the existing DataFrame object.
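
To illustrate how the regex parameter of str.replace() changes the matching, here is a small sketch on a hypothetical Series named s. It contrasts a literal replacement with a case-insensitive regular expression replacement.

```python
import pandas as pd

# Hypothetical sample with mixed case and regex metacharacters ('+').
s = pd.Series(['Java Language', 'PHP language', 'C++ Language'])

# Literal replacement: regex=False treats the pattern as plain text,
# so characters such as '+' need no escaping.
literal = s.str.replace('C++', 'Cpp', regex=False)

# Regex replacement: case=False makes the match case-insensitive.
insensitive = s.str.replace('language', 'Lang', case=False, regex=True)

print(literal.tolist())      # ['Java Language', 'PHP language', 'Cpp Language']
print(insensitive.tolist())  # ['Java Lang', 'PHP Lang', 'C++ Lang']
```

The case argument is combined with regex=True here because case-insensitive matching is a regular expression feature.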

6. Replace Substring Using apply() function with lambda

In this section, you will see how to replace a substring using DataFrame.apply() and a lambda function. The apply() method applies a function along one of the axes of the DataFrame; by default, it passes each column to the function as a Series. The below example replaces multiple substrings.


# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
print(df2)

Yields below output.


# Output:
            Courses    Fee Duration
0             Spark  22000   30days
1  Python withSpark  25000   50days
2             Spark  23000   30days
3         Java Lang  24000   60days
4  Python withSpark  26000   35days
5          PHP Lang  27000   30days
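
Since the lambda simply calls replace() on every column, passing the same dict straight to DataFrame.replace() should give the same result for this kind of data. The sketch below compares the two on a shortened, hypothetical copy of the sample DataFrame; the name mapping is introduced here for illustration.

```python
import pandas as pd

# Shortened, hypothetical copy of the article's sample data.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Fee': [22000, 25000, 24000],
})

# Patterns to find and their replacements.
mapping = {'Py': 'Python with', 'Language': 'Lang'}

# Column-by-column replacement through apply() and a lambda.
via_apply = df.apply(lambda col: col.replace(mapping, regex=True))

# The same dict passed directly to DataFrame.replace().
direct = df.replace(mapping, regex=True)

print(via_apply['Courses'].tolist())  # ['Spark', 'Python withSpark', 'Java Lang']
print(direct['Courses'].tolist())     # ['Spark', 'Python withSpark', 'Java Lang']
```

The apply() form becomes more useful when the function needs per-column logic that a single replace() call cannot express.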

7. Complete Example of pandas Replace Substring


# Create a pandas DataFrame.
import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print(df2)

# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print(df2)

# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)

# Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)

# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace(
    {'Py':'Python with', 'Language':'Lang'},
    regex=True))
print(df2)

Conclusion

In this article, you have learned how to replace a substring in a pandas DataFrame by using DataFrame.replace() with regex=True, Series.str.replace(), and DataFrame.apply() with a lambda function, with some examples.
