In pandas, you can replace a string in a DataFrame column with either the DataFrame.replace() function or the Series.str.replace() method, optionally combined with apply() and a lambda function. In this article, I will explain how to replace strings in a pandas DataFrame.
Key Points –
- Use the str.replace() method in pandas to replace strings in DataFrame columns efficiently.
- Specify the string to be replaced and its replacement within the method parameters.
- Employ regular expressions for handling complex string replacement patterns effectively.
- Assign the result back to the original DataFrame or to a new variable to retain the changes.
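The last point is easy to miss. As a minimal sketch (the small DataFrame and the names original and updated are made up for this illustration), replace() returns a new object and leaves the original untouched unless you assign the result back:

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
original = pd.DataFrame({"Courses": ["Spark", "PySpark"]})

# replace() returns a new DataFrame; `original` is not modified
updated = original.replace("PySpark", "Python with Spark")

print(original["Courses"].tolist())  # ['Spark', 'PySpark']
print(updated["Courses"].tolist())   # ['Spark', 'Python with Spark']
```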
Quick Examples to Replace String
If you are in a hurry, below are some examples of how to replace a string in a pandas DataFrame.
# Quick examples to replace string
# Example 1: Replace string
# Using DataFrame.replace() method
df2 = df.replace('Py','Python with ', regex=True)
# Example 2: Replace pattern of string
# Using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
# Example 3: Replace pattern of string
# Using regular expression
df2=df.replace(regex=['Language'],value='Lang')
# Example 4: By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
# Example 5: Replace String
# Using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
To run these examples, let's create a DataFrame with a few rows and columns, execute the examples, and validate the results.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
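Yields below output.
# Output:
# Create DataFrame:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Java Language 24000 60days
4 PySpark 26000 35days
5 PHP Language 27000 30days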
Pandas Replace String Example
You can replace strings within a pandas DataFrame using the DataFrame.replace() function. This function replaces the specified value with another value and returns a new DataFrame. By default it only replaces cells whose entire value equals the given string; pass regex=True to replace part of a string. To update the existing DataFrame instead, use inplace=True.
# Replace string using DataFrame.replace() method.
df2 = df.replace('PySpark','Python with Spark')
print("After replacing the string values of a single column:\n", df2)
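In the above example, the DataFrame df has the columns Courses, Fee, and Duration. The DataFrame.replace() method replaces every cell equal to PySpark with Python with Spark; here only the Courses column contains matching values. This example yields the below output.
# Output:
# After replacing the string values of a single column:
Courses Fee Duration
0 Spark 22000 30days
1 Python with Spark 25000 50days
2 Spark 23000 30days
3 Java Language 24000 60days
4 Python with Spark 26000 35days
5 PHP Language 27000 30days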
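To make the difference between whole-value and partial matching concrete, here is a small sketch. The DataFrame demo and the variable names are assumptions made for this illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
demo = pd.DataFrame({"Courses": ["Spark", "PySpark", "Py"]})

# Without regex=True, only cells whose whole value equals 'Py' change
whole = demo.replace("Py", "Python")

# With regex=True, every occurrence of the pattern 'Py' inside a cell changes
partial = demo.replace("Py", "Python with ", regex=True)

print(whole["Courses"].tolist())    # ['Spark', 'PySpark', 'Python']
print(partial["Courses"].tolist())  # ['Spark', 'Python with Spark', 'Python with ']
```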
Replace Multiple Strings
Now let's see how to replace strings in multiple columns. In this example, I will also show how to replace part of a string by using the regex=True param. To update multiple string columns, pass two dicts keyed by column name: the first maps each column to the pattern to find, and the second maps each column to its replacement. The below example replaces Py with Python with (followed by a space) in the Courses column and days with Days (preceded by a space) in the Duration column.
# Replace pattern of string using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print("After replacing the string values of multiple columns:\n", df2)
Yields below output.
# Output:
# After replacing the string values of multiple columns
Courses Fee Duration
0 Spark 22000 30 Days
1 Python with Spark 25000 50 Days
2 Spark 23000 30 Days
3 Java Language 24000 60 Days
4 Python with Spark 26000 35 Days
5 PHP Language 27000 30 Days
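If a single column needs more than one replacement, a nested dict is one possible alternative to the two-dict form. The sketch below assumes a made-up DataFrame named demo; the outer keys are column names and each inner dict maps a pattern to its replacement.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
demo = pd.DataFrame({
    "Courses": ["PySpark", "Java Language"],
    "Duration": ["50days", "60days"],
})

# Outer keys are column names; inner dicts map pattern -> replacement.
# With a nested dict, the separate `value` argument is not passed.
result = demo.replace(
    {
        "Courses": {"Py": "Python with ", "Language": "Lang"},
        "Duration": {"days": " Days"},
    },
    regex=True,
)
print(result)
```

Each pattern is applied only to the column named by its outer key, so days is never searched for in Courses.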
Using Regular Expression
Alternatively, you can use regular expressions to replace matching strings with other strings within a pandas DataFrame. The below example finds the string Language and replaces it with Lang.
# Replace pattern of string using regular expression
df2=df.replace(regex=['Language'],value='Lang')
print("After replacing the string values of a single column:\n", df2)
Yields below output.
# Output:
# After replacing the string values of a single column:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Java Lang 24000 60days
4 PySpark 26000 35days
5 PHP Lang 27000 30days
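A plain word such as Language does not really need a regular expression. The following sketch, which uses a made-up DataFrame named demo, shows two cases where the regex form earns its keep: anchoring a match to the end of the value, and reusing part of the match through a capture group.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
demo = pd.DataFrame({
    "Courses": ["Java Language", "Language Basics"],
    "Duration": ["60days", "30days"],
})

# '$' anchors the match to the end of the cell, so only a trailing
# 'Language' is shortened
anchored = demo.replace(regex=r"Language$", value="Lang")

# \1 re-inserts the digits captured by (\d+), adding a space before 'days'
spaced = demo.replace(regex=r"(\d+)days", value=r"\1 days")

print(anchored["Courses"].tolist())  # ['Java Lang', 'Language Basics']
print(spaced["Duration"].tolist())   # ['60 days', '30 days']
```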
Using str.replace() on DataFrame
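You can use the str.replace() method directly on a DataFrame column to replace strings. str.replace() replaces every occurrence of the given substring within each value. In pandas 2.0 and later the pattern is treated as a literal string unless you pass regex=True; older versions treated it as a regular expression by default.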
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print("After replacing the string values of a single column:\n", df)
In the above example, you use the str.replace() method directly on the Courses column of the DataFrame df to replace Language with Lang, and assign the result back to the column.
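Because the default has changed between pandas versions, it can be safer to pass regex explicitly. This sketch uses a made-up Series named courses to show a literal replacement of text containing regex metacharacters, and a case-insensitive regex replacement.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
courses = pd.Series(["C++ Language", "PHP language"])

# regex=False: '+' is taken literally, so 'C++' matches as written
literal = courses.str.replace("C++", "Cpp", regex=False)

# regex=True with case=False: match 'language' regardless of letter case
any_case = courses.str.replace("language", "Lang", case=False, regex=True)

print(literal.tolist())   # ['Cpp Language', 'PHP language']
print(any_case.tolist())  # ['C++ Lang', 'PHP Lang']
```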
Using apply() Function Along with lambda
Similarly, you can use DataFrame.apply() with a lambda expression to replace strings. The apply() method applies a function along one of the axes of the DataFrame. With the default axis=0, the function is called once per column and receives that column as a Series.
# Replace String using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
print("After replacing the string values of a single column:\n", df2)
Yields below output.
# Output:
# After replacing the string values of a single column:
Courses Fee Duration
0 Spark 22000 30days
1 Python withSpark 25000 50days
2 Spark 23000 30days
3 Java Lang 24000 60days
4 Python withSpark 26000 35days
5 PHP Lang 27000 30days
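Since apply() hands every column to the lambda, numeric columns such as Fee are processed as well. One possible variation, sketched here with a made-up DataFrame demo and a hand-picked list text_cols, is to apply str.replace() only to the columns that hold text.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
demo = pd.DataFrame({
    "Courses": ["PySpark", "Java Language"],
    "Fee": [25000, 24000],
    "Duration": ["50days", "60days"],
})

# Columns chosen by hand for this sketch; Fee is numeric and is skipped
text_cols = ["Courses", "Duration"]

result = demo.copy()
# The lambda receives one column (a Series) at a time
result[text_cols] = demo[text_cols].apply(
    lambda col: col.str.replace("days", " Days", regex=False)
)
print(result)
```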
Complete Example
# Create a pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Java Language","PySpark","PHP Language"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace string using DataFrame.replace() method
df2 = df.replace('Py','Python with ', regex=True)
print(df2)
# Replace pattern of string using regular expression
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
{'Courses': 'Python with', 'Duration': ' Days'}, regex=True)
print(df2)
# Replace pattern of string using regular expression.
df2=df.replace(regex=['Language'],value='Lang')
print(df2)
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language','Lang')
print(df)
# Replace String using apply() function with lambda
df2 = df.apply(lambda x: x.replace({'Py':'Python with', 'Language':'Lang'}, regex=True))
print(df2)
Frequently Asked Questions on Replace String in DataFrame
To replace a specific string in a DataFrame column with another string, you can use the str.replace() method.
You can replace multiple strings in a DataFrame column simultaneously using the replace() method with a dictionary.
To replace strings based on a pattern or using regular expressions in a Pandas DataFrame, you can use the str.replace() method with the regex=True parameter.
You can apply a custom function to replace strings in a DataFrame column using the apply() method.
Conclusion
In this article, you have learned how to replace strings in a pandas DataFrame column by using DataFrame.replace(), str.replace(), and apply() with a lambda function, with some examples.