How to Replace String in pandas DataFrame

You can replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), or apply() with a lambda function. In this article, I will explain how to replace strings in DataFrame columns with multiple examples.

  • Replace a string with another string in pandas.
  • Replace a pattern of string with another string using regular expression.

1. Quick Examples to Replace String in DataFrame

If you are in a hurry, below are some examples of how to replace a string in a pandas DataFrame.


# Below are some quick examples.

# Replace string using DataFrame.replace() method.
df2 = df.replace('Py','Python with ', regex=True)

# Replace substrings per column using dicts and regex=True.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)

# Replace pattern of string using regular expression.
df2=df.replace(regex=['Language'],value='Lang')

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')

# Replace String using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))

Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.


# Create a pandas DataFrame.
import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
         Courses    Fee Duration
0          Spark  22000   30days
1        PySpark  25000   50days
2          Spark  23000   30days
3  Java Language  24000   60days
4        PySpark  26000   35days
5   PHP Language  27000   30days

2. pandas Replace String Example

You can replace a string in a pandas DataFrame with another string by using the DataFrame.replace() method. This method returns a new DataFrame in which every cell equal to the given value is replaced. Without regex=True, the whole cell value must match, not just part of it. To modify the existing DataFrame instead, pass inplace=True.


# Replace string using DataFrame.replace() method.
df2 = df.replace('PySpark','Python with Spark')
print(df2)

Yields below output. This example replaces the string PySpark with Python with Spark.


# Output:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3      Java Language  24000   60days
4  Python with Spark  26000   35days
5       PHP Language  27000   30days
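The whole-value matching and the inplace=True behavior described above can be checked with a minimal sketch. The two-row DataFrame here is a cut-down stand-in for the article's data, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'PySpark'], 'Fee': [22000, 25000]})

# Without regex=True, 'Py' must equal the whole cell value, so nothing changes.
unchanged = df.replace('Py', 'Python with ')
print(unchanged['Courses'].tolist())

# A full-value match is replaced in the returned copy; df itself is untouched.
df2 = df.replace('PySpark', 'Python with Spark')
print(df['Courses'].tolist())
print(df2['Courses'].tolist())

# inplace=True modifies df itself instead of returning a new DataFrame.
df.replace('PySpark', 'Python with Spark', inplace=True)
print(df['Courses'].tolist())
```

The first call leaves 'PySpark' as it is, which is why the partial replacements later in this article all pass regex=True.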

3. Replace Multiple Strings

Now let’s see how to replace strings in multiple columns. This example also shows how to replace part of a string by using the regex=True param. To update multiple string columns, pass two dicts keyed by column name: the first maps each column to the pattern to find, and the second maps each column to its replacement. The below example replaces 'Py' with 'Python with ' in the Courses column and 'days' with ' Days' in the Duration column.


# Replace pattern of string using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)

Yields below output.


# Output:
             Courses    Fee Duration
0              Spark  22000  30 Days
1  Python with Spark  25000  50 Days
2              Spark  23000  30 Days
3      Java Language  24000  60 Days
4  Python with Spark  26000  35 Days
5       PHP Language  27000  30 Days
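The same per-column replacement can also be written as one nested dict, which keeps each pattern next to its replacement. This is a sketch on a smaller stand-in DataFrame; it relies on the nested-dict form of DataFrame.replace(), where outer keys are column names and no separate value argument is passed.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Duration': ['30days', '50days'],
})

# Outer keys are column names; each inner dict maps pattern -> replacement.
df2 = df.replace(
    {'Courses': {'Py': 'Python with '}, 'Duration': {'days': ' Days'}},
    regex=True,
)
print(df2)
```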

4. Replace Pattern of String Using Regular Expression

Using a regular expression, you can replace the matching part of a string with another string in a pandas DataFrame. The below example finds the string Language and replaces it with Lang.


# Replace pattern of string using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)

Yields below output.


# Output:
     Courses    Fee Duration
0      Spark  22000   30days
1    PySpark  25000   50days
2      Spark  23000   30days
3  Java Lang  24000   60days
4    PySpark  26000   35days
5   PHP Lang  27000   30days
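A plain word like Language works as a pattern, but regular expressions also let you anchor the match. The sketch below, on a smaller stand-in DataFrame, uses ^ to match only at the start of a value and $ to match only at the end; the dict passed to regex= maps each pattern to its replacement.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Java Language'],
    'Fee': [22000, 25000, 24000],
})

# '^Py' matches "Py" only at the start of a value;
# 'Language$' matches "Language" only at the end of a value.
df2 = df.replace(regex={r'^Py': 'Python with ', r'Language$': 'Lang'})
print(df2)
```

The numeric Fee column is left as it is, since the patterns only apply to string values.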

5. Using str.replace() on DataFrame

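Alternatively, use Series.str.replace() on a single column. DataFrame.replace() looks for exact matches of the whole cell value unless you pass a regex pattern and param regex=True, whereas str.replace() replaces every occurrence of the substring inside each value. Since pandas 2.0, str.replace() treats the pattern as literal text by default; pass regex=True to use a regular expression.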


# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)

Yields the same output as above. Note that this assigns the result back to the Courses column, so it changes the existing DataFrame object.
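To make the literal versus regex behavior of str.replace() explicit, here is a short sketch that passes the regex param in both calls. The Series values are made up for illustration and are not part of the article's DataFrame.

```python
import pandas as pd

s = pd.Series(['Java Language', 'C++ Language'])

# regex=False treats the pattern as literal text, so '+' needs no escaping.
literal = s.str.replace('++', 'pp', regex=False)
print(literal.tolist())

# regex=True treats the pattern as a regular expression:
# whitespace followed by "Language" at the end of the value is removed.
pattern = s.str.replace(r'\s+Language$', '', regex=True)
print(pattern.tolist())
```

Passing regex explicitly avoids depending on the default, which differs between pandas versions.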

6. Replace String Using apply() function with lambda

In this section, you can find out how to replace strings using DataFrame.apply() with a lambda expression. The apply() method applies a function along one axis of the DataFrame. With the default axis=0, the function is called once per column and receives that column as a Series, so x.replace() in the example below is Series.replace().


# Replace String using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))
print(df2)

Yields below output.


# Output:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3          Java Lang  24000   60days
4  Python with Spark  26000   35days
5           PHP Lang  27000   30days
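As a small sketch of what apply() passes to the lambda, the first call below returns each column's name, which shows that the function receives whole columns. The second call is an alternative for a single text column, using Series.map() with Python's built-in str.replace(); the two-row DataFrame is a stand-in for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['PySpark', 'Java Language'],
    'Fee': [25000, 24000],
})

# With the default axis=0, apply() calls the function once per column,
# so x.name is a column label here, not a row index.
names = df.apply(lambda x: x.name)
print(names.tolist())

# For one text column, Series.map() with Python's own str.replace() also works.
courses = df['Courses'].map(lambda s: s.replace('Language', 'Lang'))
print(courses.tolist())
```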

7. Complete Example of Replace String in DataFrame


# Create a pandas DataFrame.
import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

# Replace string using DataFrame.replace() method.
df2 = df.replace('Py','Python with ', regex=True)
print(df2)

# Replace substrings per column using dicts and regex=True.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'}, 
    {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)

# Replace pattern of string using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)

# Replace String using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with ', 'Language':'Lang'}, regex=True))
print(df2)

8. Conclusion

In this article, you have learned how to replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), and apply() with a lambda function, with examples of each.
