In pandas, you can replace a string in a DataFrame column with the DataFrame.replace() method, the Series.str.replace() method, or DataFrame.apply() with a lambda function.
In this article, I will explain how to replace strings in a Pandas DataFrame column. Pandas offers several methods for this; the primary one is the replace() function.
Key Points –
- Use the replace() function to substitute specific strings with other strings within DataFrame columns.
- Pass the target string(s) and the replacement string(s) as arguments to replace().
- Pass regex=True to match regular expressions and handle complex replacement patterns.
- replace() returns a new DataFrame, so assign the result back to the original variable or to a new one to keep the changes.
- Alternatively, pass inplace=True to modify the existing DataFrame directly.
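As a small sketch of the last two points, the following shows that replace() leaves the original DataFrame untouched unless you reassign the result or pass inplace=True. The sample data and the names df2 and df3 are made up for this illustration and are not the article's DataFrame.

```python
import pandas as pd

# Hypothetical sample data for this illustration only.
df = pd.DataFrame({"Courses": ["Spark", "PySpark"],
                   "Duration": ["30days", "50days"]})

# replace() returns a new DataFrame; df keeps its original values.
df2 = df.replace("PySpark", "Python with Spark")

# inplace=True modifies the DataFrame it is called on.
df3 = df.copy()
df3.replace("PySpark", "Python with Spark", inplace=True)

print(df["Courses"].tolist())   # original, unchanged
print(df2["Courses"].tolist())  # new DataFrame with the replacement
print(df3["Courses"].tolist())  # copy modified in place
```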
Quick Examples of Replacing String
Following are quick examples of replacing a string.
# Quick examples to replace string
# Example 1: Replace part of a string
# Using DataFrame.replace() method with regex=True
df2 = df.replace('Py', 'Python with ', regex=True)
# Example 2: Replace strings in multiple columns
# Using per-column dicts and regular expressions.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
# Example 3: Replace pattern of string
# Using the regex parameter
df2 = df.replace(regex=['Language'], value='Lang')
# Example 4: By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
# Example 5: Replace String
# Using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
Now, let’s create a DataFrame with a few rows and columns, run these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
# Create a pandas DataFrame.
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
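Yields below output.
# Output:
# Create DataFrame:
         Courses    Fee Duration
0          Spark  22000   30days
1        PySpark  25000   50days
2          Spark  23000   30days
3  Java Language  24000   60days
4        PySpark  26000   35days
5   PHP Language  27000   30days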
Pandas Replace String Example
You can use the DataFrame.replace() function to replace strings within a Pandas DataFrame. It replaces each occurrence of the specified value with another value and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True.
# Replace string using DataFrame.replace() method.
df2 = df.replace('PySpark','Python with Spark')
print("After replacing the string values of a single column:\n", df2)
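In the above example, you create a DataFrame df with the columns Courses, Fee, and Duration, then use the DataFrame.replace() method to replace PySpark with Python with Spark. Without regex=True, only cells whose entire value equals PySpark are replaced, which here are all in the Courses column. This example yields the below output.
# Output:
# After replacing the string values of a single column:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3      Java Language  24000   60days
4  Python with Spark  26000   35days
5       PHP Language  27000   30days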
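The difference between matching a whole cell and matching part of a string can be seen in a minimal sketch. The one-column DataFrame and the variable names exact and partial are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical one-column sample for this illustration only.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Java Language"]})

# Without regex=True the search string must equal the entire cell value,
# so "Py" matches nothing here.
exact = df.replace("Py", "Python with ")

# With regex=True the pattern is searched inside each string.
partial = df.replace("Py", "Python with ", regex=True)

print(exact["Courses"].tolist())
print(partial["Courses"].tolist())
```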
Replace Multiple Strings
Now let’s see how to replace strings in multiple columns. This example also shows how to replace part of a string by using the regex=True param. To update multiple string columns, pass two dicts keyed by column name: the first maps each column to the string to find, and the second maps each column to its replacement. The below example replaces Py with 'Python with ' in the Courses column and days with ' Days' in the Duration column.
# Replace pattern of string using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print("After replacing the string values of multiple columns:\n", df2)
Yields below output.
# Output:
# After replacing the string values of multiple columns
             Courses    Fee Duration
0              Spark  22000  30 Days
1  Python with Spark  25000  50 Days
2              Spark  23000  30 Days
3      Java Language  24000  60 Days
4  Python with Spark  26000  35 Days
5       PHP Language  27000  30 Days
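The same per-column replacement can also be written as a single nested dict, which some readers find easier to scan. This is only a sketch of an alternative spelling, using a shortened, hypothetical copy of the sample data. The outer keys are column names and the inner dicts map each pattern to its replacement.

```python
import pandas as pd

# Shortened, hypothetical copy of the article's data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Duration": ["30days", "50days"],
})

# Nested dict: {column: {pattern: replacement}}.
# No separate value argument is passed when using this form.
df2 = df.replace(
    {"Courses": {"Py": "Python with "}, "Duration": {"days": " Days"}},
    regex=True,
)

print(df2)
```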
Using Regular Expression
Alternatively, you can pass the pattern through the regex parameter to replace matching strings within a Pandas DataFrame. The below example finds the string Language and replaces it with Lang.
# Replace pattern of string using regular expression
df2=df.replace(regex=['Language'],value='Lang')
print("After replacing the string values of a single column:\n", df2)
Yields below output.
# Output:
# After replacing the string values of a single column:
     Courses    Fee Duration
0      Spark  22000   30days
1    PySpark  25000   50days
2      Spark  23000   30days
3  Java Lang  24000   60days
4    PySpark  26000   35days
5   PHP Lang  27000   30days
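Because the pattern is a regular expression, it can also use anchors and capture groups. The sketch below is an illustration on hypothetical sample data, not part of the original examples: it anchors Py to the start of the string and uses a backreference to insert a space before days.

```python
import pandas as pd

# Hypothetical sample data for this illustration only.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java Language"],
    "Duration": ["30days", "50days", "60days"],
})

# ^ anchors the match to the beginning of each string.
anchored = df.replace(regex=r"^Py", value="Python with ")

# (\d+) captures the digits; \1 re-inserts them in the replacement.
spaced = df.replace({"Duration": r"(\d+)days"},
                    {"Duration": r"\1 days"}, regex=True)

print(anchored["Courses"].tolist())
print(spaced["Duration"].tolist())
```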
Use str.replace() Function
To use str.replace() on a DataFrame, first select the column that contains the strings by using square brackets ([]), then call the str.replace() method on it.
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print("After replacing the string values of a single column:\n", df)
In the above example, you select the Courses column of the DataFrame df and call the str.replace() method on it to replace Language with Lang. Because the result is assigned back to df['Courses'], the original DataFrame is updated.
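str.replace() also accepts regex and case arguments. In pandas 2.0 and later the pattern is treated as a literal string unless regex=True is passed, so it can help to state the intent explicitly. The following is a minimal sketch on a hypothetical Series, not the article's data.

```python
import pandas as pd

# Hypothetical Series for this illustration only.
s = pd.Series(["Java Language", "PHP language", "C++"])

# Literal, case-sensitive match: only the capitalized "Language" changes.
literal = s.str.replace("Language", "Lang", regex=False)

# Case-insensitive match; case=False requires regex=True.
insensitive = s.str.replace("language", "Lang", case=False, regex=True)

# regex=False is the safe choice for characters such as "+" that are
# special in regular expressions.
plus = s.str.replace("+", "p", regex=False)

print(literal.tolist())
print(insensitive.tolist())
print(plus.tolist())
```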
Using apply() Function Along with Lambda
Similarly, you can use DataFrame.apply() with a lambda expression to replace strings. The apply() method applies a function along one axis of the DataFrame. With the default axis=0, the function is called once per column and receives that column as a Series, so the lambda below calls Series.replace() on each column.
# Replace String using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
print("After replacing the string values of a single column:\n", df2)
Yields below output.
# Output:
# After replacing the string values of a single column:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3          Java Lang  24000   60days
4  Python with Spark  26000   35days
5           PHP Lang  27000   30days
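When only some columns hold text, one option is to apply the lambda to those columns alone and leave the rest untouched. The sketch below assumes a shortened, hypothetical copy of the sample data; the names text_cols, mapping, and out are chosen here for illustration.

```python
import pandas as pd

# Shortened, hypothetical copy of the article's data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java Language"],
    "Fee": [22000, 25000, 24000],
    "Duration": ["30days", "50days", "60days"],
})

mapping = {"Py": "Python with ", "Language": "Lang"}
text_cols = ["Courses", "Duration"]  # columns assumed to contain strings

# Work on a copy so the original DataFrame stays unchanged.
out = df.copy()
out[text_cols] = out[text_cols].apply(
    lambda col: col.replace(mapping, regex=True)
)

print(out)
```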
Complete Example
# Create a pandas DataFrame
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace part of a string using DataFrame.replace() method
df2 = df.replace('Py', 'Python with ', regex=True)
print(df2)
# Replace strings in multiple columns using regular expressions
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)
# Replace pattern of string using the regex parameter
df2 = df.replace(regex=['Language'], value='Lang')
print(df2)
# By using str.replace(); this updates df itself
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
print(df)
# Replace String using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
print(df2)
FAQs on Replace String
Replacing strings in a DataFrame is often necessary for data cleaning and preprocessing. It lets you standardize values, correct errors, or prepare the data for analysis.
To replace strings with a custom function, pass the function to the apply() method of the column.
To replace multiple strings in a column at once, pass the replace() method a dictionary that maps each old string to its replacement.
Alternatively, pass replace() two lists of equal length: the strings to replace and their corresponding replacement values.
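The answers above can be sketched together on a small, hypothetical column. The helper shorten() is an invented name used only to illustrate a custom function.

```python
import pandas as pd

# Hypothetical column for this illustration only.
courses = pd.Series(["Spark", "PySpark", "Java Language"], name="Courses")

# Dictionary: each key (whole cell value) is replaced by its value.
by_dict = courses.replace({"Spark": "Apache Spark",
                           "PySpark": "Python with Spark"})

# Two parallel lists: values to replace and their replacements.
by_list = courses.replace(["Spark", "PySpark"],
                          ["Apache Spark", "Python with Spark"])

# Custom function applied to every element of the column.
def shorten(name: str) -> str:
    return name.replace("Language", "Lang")

custom = courses.apply(shorten)

print(by_dict.tolist())
print(by_list.tolist())
print(custom.tolist())
```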
Conclusion
In this article, you have learned how to replace strings in a Pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), and DataFrame.apply() with a lambda function.