You can replace substrings in a pandas DataFrame column using the DataFrame.replace() method with regex=True, the Series.str.replace() method, or apply() with a lambda function. These methods let you replace substrings within the entries of a pandas DataFrame, either across the entire DataFrame or within specific columns. In this article, I will explain how to replace a substring in a DataFrame column with multiple examples.
Key Points –
- Replace a substring with another substring in Pandas.
- Replace a pattern of a substring with another substring using regular expression.
- Specify the column containing the target substrings within the DataFrame.
- Use the str.replace() method in Pandas to replace substrings within DataFrame columns.
- Provide the substring to be replaced and the replacement string as arguments to the str.replace() method.
Related: You can replace the string in Pandas DataFrame.
Quick Examples to Replace Substring
Below are quick examples of replacing a substring in a column of a pandas DataFrame.
# Quick examples to replace substring
# Example 1: Replace substring
df2 = df.replace('Py','Python with ', regex=True)
# Example 2: Replace substring in a single column
df2 = df.copy()
df2['Courses'] = df2['Courses'].replace('Py','Python with ', regex=True)
# Example 3: Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
# Example 4: Replace pattern of Substring
# Using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
# Example 5: Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
# Example 6: Replace SubString
# Using apply() function with lambda
df2 = df.apply(lambda x: x.replace(
{'Py':'Python with', 'Language':'Lang'},
regex=True))
To run the examples of replacing substrings, let’s first create a pandas DataFrame from a dictionary.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
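Yields below output.
# Output:
# Create DataFrame:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Java Language 24000 60days
4 PySpark 26000 35days
5 PHP Language 27000 30days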
Replace Substring Using replace()
To replace substrings in a DataFrame, use the DataFrame.replace() function. By default, this function replaces a cell only when its whole value exactly matches the given string. Pass regex=True to treat the first argument as a regular expression and replace matching substrings inside each cell.
# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print("After replacing the substring with another substring:\n", df2)
# Replace substring 'Py' with 'Python with ' in the Courses column only
df2 = df.copy()
df2['Courses'] = df2['Courses'].replace('Py', 'Python with ', regex=True)
print("After replacing the substring with another substring:\n", df2)
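The above examples replace the substring Py with Python with (including a trailing space), so PySpark becomes Python with Spark. The first call searches every column of the DataFrame; the second touches only the Courses column.
By default, replace() returns a new object after replacing the substring. Use inplace=True to modify the existing DataFrame object instead.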
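To make the difference between exact matching and regex=True concrete, here is a minimal sketch. It uses a small hypothetical DataFrame (not the one from this article) so that one cell is exactly Py and another merely contains it.

```python
import pandas as pd

# Hypothetical sample data: one cell equals 'Py', another only contains it.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Py"],
    'Fee': [22000, 25000, 23000],
})

# Default behavior: only cells whose whole value is 'Py' are replaced.
exact = df.replace('Py', 'Python')

# regex=True: 'Py' is treated as a pattern and replaced inside each cell.
partial = df.replace('Py', 'Python', regex=True)

# inplace=True modifies the object it is called on (a copy here).
in_place = df.copy()
in_place.replace('Py', 'Python', regex=True, inplace=True)

print(exact['Courses'].tolist())    # ['Spark', 'PySpark', 'Python']
print(partial['Courses'].tolist())  # ['Spark', 'PythonSpark', 'Python']
```

Note that the original df is left untouched by the first two calls, which is why the results are assigned to new variables.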
Replace Multiple Substrings
You can also replace different substrings in different string columns with a single call. To do this, pass two dicts keyed by column name: the first maps each column to the substring to find, and the second maps each column to its replacement.
# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print("After replacing the multiple substrings:\n", df2)
Yields below output.
# Output:
# After replacing the multiple substrings:
Courses Fee Duration
0 Spark 22000 30 Days
1 Python withSpark 25000 50 Days
2 Spark 23000 30 Days
3 Java Language 24000 60 Days
4 Python withSpark 26000 35 Days
5 PHP Language 27000 30 Days
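The same per-column replacement can also be written as a single nested dict, which keeps each pattern next to its replacement. The sketch below assumes a shortened version of the sample data and compares both forms.

```python
import pandas as pd

# Shortened sample data for illustration.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Fee': [22000, 25000, 24000],
    'Duration': ['30days', '50days', '60days'],
})

# Two dicts: column -> pattern, and column -> replacement.
two_dicts = df.replace({'Courses': 'Py', 'Duration': 'days'},
                       {'Courses': 'Python with', 'Duration': ' Days'},
                       regex=True)

# Nested dict: column -> {pattern: replacement}.
nested = df.replace({'Courses': {'Py': 'Python with'},
                     'Duration': {'days': ' Days'}},
                    regex=True)

print(nested)
```

Both calls should produce the same DataFrame; which form to use is mostly a matter of readability.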
Replace Pattern of Substring
To replace a substring that matches a regular expression pattern, you can pass the pattern (or a list of patterns) to the regex parameter of DataFrame.replace() and the replacement to the value parameter.
# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print("After replacing the substring:\n", df2)
This code replaces the substring Language with Lang wherever it occurs in the DataFrame; here only the Courses column contains it. Passing the patterns through the regex parameter enables regular expression matching in the replace() method, so regex=True is not needed.
# Output:
# After replacing the substring:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Java Lang 24000 60days
4 PySpark 26000 35days
5 PHP Lang 27000 30days
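The pattern above is a plain word, so it does not show what regular expressions add. As a small sketch, the patterns below are illustrative choices (not from the examples above) that use anchors and whitespace classes, passed as a dict of pattern to replacement.

```python
import pandas as pd

# Shortened sample data for illustration.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language", "PHP Language"],
    'Fee': [22000, 25000, 24000, 27000],
})

# r'\s+Language$' matches ' Language' only at the end of a value;
# r'^Py' matches 'Py' only at the start of a value.
df2 = df.replace(regex={r'\s+Language$': '', r'^Py': ''})

print(df2['Courses'].tolist())  # ['Spark', 'Spark', 'Java', 'PHP']
```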
Using str.replace() Function
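Alternatively, to replace a substring in a single DataFrame column, you can use the Series.str.replace() method. Unlike DataFrame.replace(), str.replace() always replaces matching substrings rather than whole cell values. Its regex parameter only controls whether the pattern is treated as a regular expression or as a literal string; the default is regex=False since pandas 2.0.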
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print("After replacing the substring:\n", df)
In this program, df['Courses'].str.replace('Language','Lang') replaces the substring Language with Lang in every value of the Courses column. Because the result is assigned back to df['Courses'], this updates the Courses column in the existing DataFrame object.
# Output:
# After replacing the substring:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Java Lang 24000 60days
4 PySpark 26000 35days
5 PHP Lang 27000 30days
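Since the default of the regex parameter differs between pandas versions, it can help to pass it explicitly. The following sketch uses a hypothetical Series to contrast a literal replacement with a pattern-based one.

```python
import pandas as pd

# Hypothetical values; 'C++' contains regex metacharacters.
s = pd.Series(["Java Language", "C++ Language", "PHP"])

# regex=False: the pattern is a literal string, so '+' has no special meaning.
literal = s.str.replace('C++', 'Cpp', regex=False)

# regex=True: the pattern is a regular expression.
pattern = s.str.replace(r'\s+Language$', '', regex=True)

print(literal.tolist())  # ['Java Language', 'Cpp Language', 'PHP']
print(pattern.tolist())  # ['Java', 'C++', 'PHP']
```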
Replace Substring Using apply() Function with Lambda
In this section, you can find out how to replace substrings using DataFrame.apply() with a lambda function. The apply() function in pandas applies a function along one axis of the DataFrame, either rows or columns; by default it passes each column to the function as a Series. The below example replaces multiple substrings by calling Series.replace() with a dict of patterns on every column.
# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
print(df2)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000 30days
1 Python withSpark 25000 50days
2 Spark 23000 30days
3 Java Lang 24000 60days
4 Python withSpark 26000 35days
5 PHP Lang 27000 30days
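The example above runs the lambda on every column, including the numeric Fee column. If you only want to touch certain columns, one option is to apply the lambda to a selected subset. This is a sketch under that assumption; the cols list is an illustrative choice.

```python
import pandas as pd

# Shortened sample data for illustration.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Fee': [22000, 25000, 24000],
    'Duration': ['30days', '50days', '60days'],
})

# Columns to process; chosen by hand for this example.
cols = ['Courses', 'Duration']

df2 = df.copy()
# apply() passes each selected column to the lambda as a Series.
df2[cols] = df2[cols].apply(
    lambda col: col.str.replace('Language', 'Lang', regex=False))

print(df2)
```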
Complete Example of Pandas Replace Substring
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print(df2)
# Replace substring in a single column
df2 = df.copy()
df2['Courses'] = df2['Courses'].replace('Py','Python with ', regex=True)
print(df2)
# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print(df2)
# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)
# Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)
# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace(
{'Py':'Python with', 'Language':'Lang'},
regex=True))
print(df2)
FAQ on Replace Substring
You can replace a substring in a column of a DataFrame using various methods such as str.replace(), apply() with a lambda function, or the replace() method.
You can replace substrings conditionally by combining pandas’ str.replace() method with a boolean condition, for example by selecting the matching rows with DataFrame.loc[] and applying str.replace() only to them.
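As a minimal sketch of a conditional replacement, the code below shortens Language to Lang only in rows where Fee exceeds a threshold. The threshold of 25000 and the sample data are assumptions made for illustration.

```python
import pandas as pd

# Shortened sample data for illustration.
df = pd.DataFrame({
    'Courses': ["Spark", "Java Language", "PHP Language"],
    'Fee': [22000, 24000, 27000],
})

# Boolean mask selecting the rows to change (hypothetical condition).
mask = df['Fee'] > 25000

# Replace the substring only in the selected rows of the Courses column.
df.loc[mask, 'Courses'] = df.loc[mask, 'Courses'].str.replace(
    'Language', 'Lang', regex=False)

print(df['Courses'].tolist())  # ['Spark', 'Java Language', 'PHP Lang']
```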
The str.replace() method is used specifically for string columns and replaces substrings within each string element of the column. On the other hand, the replace() method is more general and can be used to replace values in any type of column, not just strings.
You can replace substrings using regular expressions in pandas by setting the regex parameter to True in the str.replace() method, or by passing regex=True or a regex pattern to the replace() method.
To replace substrings efficiently in a large DataFrame, it’s recommended to use column-wise operations such as str.replace(), or the replace() method with a regex pattern if needed, rather than looping over rows in Python code.
Conclusion
In conclusion, replacing substrings within a pandas DataFrame is a common operation in data preprocessing and cleaning tasks. Throughout this article, we have explored several ways to accomplish this: DataFrame.replace() with regex=True, Series.str.replace(), and apply() with a lambda function.