You can replace a substring in a column of a pandas DataFrame by using DataFrame.replace(), Series.str.replace(), or DataFrame.apply() with a lambda function. In this article, I will explain how to replace a substring in a DataFrame column with multiple examples.
- Replace a substring with another substring in pandas.
- Replace a pattern of substring with another substring using regular expression.
1. Quick Examples to Replace Substring
If you are in a hurry, below are some quick examples of how to replace a substring in a column of a pandas DataFrame.
# Below are some quick examples.
# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
# Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace(
{'Py':'Python with', 'Language':'Lang'},
regex=True))
Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
         Courses    Fee  Duration
0          Spark  22000    30days
1        PySpark  25000    50days
2          Spark  23000    30days
3  Java Language  24000    60days
4        PySpark  26000    35days
5   PHP Language  27000    30days
2. Replace Substring Using replace() Method
You can replace a substring in a pandas DataFrame column by using the DataFrame.replace() method. By default, this method only replaces cells whose entire value matches exactly. Pass regex=True to treat the search value as a pattern and replace matching substrings.
# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print(df2)
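Yields below output. The above example replaced the substring Py with the string 'Python with ' (note the trailing space) in the Courses column. Because no column is specified, every string column is searched; here only Courses contains Py.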
              Courses    Fee  Duration
0               Spark  22000    30days
1   Python with Spark  25000    50days
2               Spark  23000    30days
3       Java Language  24000    60days
4   Python with Spark  26000    35days
5        PHP Language  27000    30days
This method returns a new DataFrame after replacing the substring. Use inplace=True to modify the existing DataFrame object instead.
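To make the difference between the default exact match and regex=True concrete, here is a minimal sketch. The small DataFrame and the variable names exact and partial are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical data: one cell is exactly 'Py', another only contains it.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Py"]})

# Default behaviour: only a cell whose whole value equals 'Py' is replaced.
exact = df.replace("Py", "Python")

# regex=True: 'Py' is replaced wherever it occurs inside a string.
partial = df.replace("Py", "Python", regex=True)

print(exact["Courses"].tolist())    # ['Spark', 'PySpark', 'Python']
print(partial["Courses"].tolist())  # ['Spark', 'PythonSpark', 'Python']

# Both calls returned new objects, so the original is unchanged.
print(df["Courses"].tolist())       # ['Spark', 'PySpark', 'Py']
```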
3. Replace Multiple Substrings
Let’s see how to replace substrings in multiple columns. To do this, pass one dict that maps column names to the substrings to find and a second dict that maps the same column names to their replacements.
# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print(df2)
Yields below output.
             Courses    Fee  Duration
0              Spark  22000   30 Days
1   Python withSpark  25000   50 Days
2              Spark  23000   30 Days
3      Java Language  24000   60 Days
4   Python withSpark  26000   35 Days
5       PHP Language  27000   30 Days
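If the replacements do not need to be limited to particular columns, a single dict passed through the regex keyword can map each pattern to its replacement. The sketch below uses a shortened, made-up version of the DataFrame. Unlike the two-dict form above, these patterns are searched in every string column.

```python
import pandas as pd

# Shortened, illustrative version of the DataFrame used in this article.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java Language"],
    "Duration": ["30days", "50days", "60days"],
})

# Each key is a pattern and each value is its replacement.
# The patterns are applied to all string columns, not to one named column.
df2 = df.replace(regex={"Py": "Python with ", "days": " Days"})

print(df2)
```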
4. Replace Pattern of Substring Using Regular Expression
Using a regular expression, you can replace the matching substring with another string. The below example finds the string Language and replaces it with Lang.
# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)
Yields below output.
     Courses    Fee  Duration
0      Spark  22000    30days
1    PySpark  25000    50days
2      Spark  23000    30days
3  Java Lang  24000    60days
4    PySpark  26000    35days
5   PHP Lang  27000    30days
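The pattern above is a plain word, but the same call accepts real regular expression syntax. The sketch below is an illustration on a made-up two-row DataFrame. It assumes the replacement string is handled the way Python's re.sub handles it, so a backreference such as \1 inserts the captured group.

```python
import pandas as pd

# Illustrative two-row DataFrame, not the one used in the article.
df = pd.DataFrame({
    "Courses": ["Java Language", "PHP Language"],
    "Duration": ["60days", "30days"],
})

# Anchor the pattern to the end of the string and drop the trailing word.
trimmed = df.replace(regex=r"\s+Language$", value="")

# Capture the digits and re-insert them with a space before 'days'.
spaced = df.replace(regex=r"(\d+)days", value=r"\1 days")

print(trimmed["Courses"].tolist())
print(spaced["Duration"].tolist())
```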
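5. Replace Substring Using str.replace() on a DataFrame Column
Alternatively, use Series.str.replace() to replace a substring in a single column. It replaces every occurrence of the pattern inside each string. In pandas 2.0 and later, the pattern is treated as a literal string unless you pass regex=True.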
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)
Yields the same output as above. Note that this replaces the values in the Courses column of the existing DataFrame object.
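Since the default for the regex parameter has differed between pandas versions, it can help to pass it explicitly. The following sketch uses a made-up Series to contrast a literal replacement with a pattern-based one.

```python
import pandas as pd

# Hypothetical values; '1.5days' contains a regex metacharacter ('.').
s = pd.Series(["30days", "50days", "1.5days"])

# regex=False: '.' is a literal dot, not "any character".
literal = s.str.replace(".", ",", regex=False)

# regex=True: capture the last digit and insert a space before 'days'.
spaced = s.str.replace(r"(\d)days$", r"\1 days", regex=True)

print(literal.tolist())  # ['30days', '50days', '1,5days']
print(spaced.tolist())   # ['30 days', '50 days', '1.5 days']
```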
6. Replace Substring Using apply() function with lambda
In this section, you will see how to replace substrings using DataFrame.apply() with a lambda function. The apply() method applies a function along one axis of the DataFrame; by default, the function receives each column as a Series. The below example replaces multiple substrings.
# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
print(df2)
Yields below output.
             Courses    Fee  Duration
0              Spark  22000    30days
1   Python withSpark  25000    50days
2              Spark  23000    30days
3          Java Lang  24000    60days
4   Python withSpark  26000    35days
5           PHP Lang  27000    30days
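In the example above, the lambda receives a whole column and calls the pandas replace() method on it. A lambda can also work on one value at a time by calling apply() on a single column, in which case it uses Python's built-in str.replace(). The sketch below shows this on a made-up DataFrame. It assumes every value in the column is a string, because a missing value (NaN) has no replace() method and would raise an error.

```python
import pandas as pd

# Illustrative DataFrame; every value in 'Courses' is a string.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java Language"],
    "Fee": [22000, 25000, 24000],
})

# Series.apply() passes each cell to the lambda, so s is a plain Python str
# and s.replace() is the built-in string method, not the pandas one.
df["Courses"] = df["Courses"].apply(lambda s: s.replace("Language", "Lang"))

print(df["Courses"].tolist())  # ['Spark', 'PySpark', 'Java Lang']
```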
7. Complete Example of pandas Replace Substring
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace substring
df2 = df.replace('Py','Python with ', regex=True)
print(df2)
# Replace multiple substrings
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print(df2)
# Replace pattern of Substring using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)
# Using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)
# Replace SubString using apply() function with lambda.
df2 = df.apply(lambda x: x.replace(
{'Py':'Python with', 'Language':'Lang'},
regex=True))
print(df2)
Conclusion
In this article, you have learned how to replace a substring in a pandas DataFrame by using DataFrame.replace(), Series.str.replace(), and DataFrame.apply() with a lambda function, with some examples.