How to replace special characters in Python using regex? When working with strings, you may need to replace certain special characters in them. With Python regex, you can search a string for special characters and replace them.
In this tutorial, we will explore the re module, which can be used to search for and replace special characters in a string.
Below are the topics we will cover in this article.
- Identifying Special Characters
- Using re.sub() for Replacing Special Characters
- Replacing Special Characters with an Empty String
- Handling Escaped Special Characters
- Removing Punctuation Marks
- Conclusion
Identifying Special Characters
Before we dive into how to replace special characters in strings using Python regex (the re module), let us understand what these special characters are. In this tutorial, special characters are non-alphanumeric characters such as !, @, #, $, and %, among many others. Several of them also have a special meaning in text processing or in regular expression syntax. The following sections show how to work with these symbols.
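To see which special characters a string contains before replacing anything, one option is re.findall(). The following is a minimal sketch; the sample text is hypothetical, and the pattern assumes that "special" means anything other than an ASCII letter, an ASCII digit, or whitespace.

```python
import re

# Hypothetical sample text, chosen only for illustration
sample = "Email me @ home! Cost: $5 (approx.)"

# [^A-Za-z0-9\s] matches any single character that is NOT an ASCII
# letter, an ASCII digit, or a whitespace character
special_chars = re.findall(r"[^A-Za-z0-9\s]", sample)

print(special_chars)
# ['@', '!', ':', '$', '(', '.', ')']
```

The characters are returned in the order they appear, so duplicates show up more than once; wrapping the result in set() would give the distinct characters.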
Using Regex re.sub() to Replace Special Characters
To replace special characters, we use the re.sub() function. Here is an example that demonstrates how re.sub() replaces special characters:
import re
# the original text
text = "Do you love programming? Learn programming @ SparkByExamples.com at just $40"
# replace the special characters '@', '!', '#', '$', '%', '^', '&', '*', '?', '(' and ')' with the word 'at'
new_text = re.sub(r"[@!#$%^&*?()]", "at", text)
# print the modified text
print(new_text)
In this code, we use re.sub() to replace the special characters @, !, #, $, %, ^, &, *, ?, (, and ) with the word "at".
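The regular expression pattern [@!#$%^&*?()] matches any one of the characters listed within the square brackets. Inside a character class these characters are matched literally, so they need no escaping here (^ is only special when it is the first character of the class). We provide the replacement string "at" as the second argument to re.sub(), which replaces every occurrence of a matched character with "at" in the text.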
Output:
#Output
Do you love programmingat Learn programming at SparkByExamples.com at just at40
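If each special character should get its own replacement rather than one shared string, re.sub() also accepts a function as the replacement argument. The sketch below is one way to do that; the replacements dictionary and the replace_match helper are hypothetical names introduced only for illustration.

```python
import re

text = "Do you love programming? Learn programming @ SparkByExamples.com at just $40"

# Hypothetical mapping from each special character to its own replacement
replacements = {"@": "at", "$": "USD ", "?": "."}

def replace_match(match):
    # match.group(0) is the single character matched by the pattern
    return replacements[match.group(0)]

# The character class lists exactly the keys of the mapping above
new_text = re.sub(r"[@$?]", replace_match, text)

print(new_text)
# Do you love programming. Learn programming at SparkByExamples.com at just USD 40
```

re.sub() calls the function once per match and inserts whatever string it returns, so the pattern and the dictionary keys need to stay in sync.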
Replacing Special Characters with an Empty String
There might be scenarios where you want to remove the special characters entirely. To do that, pass an empty string as the replacement text to re.sub(). Here is an example:
import re
# the original text
text = "Do you love programming? Learn programming @ SparkByExamples.com at just $40"
# remove the special characters '@', '!', '#', '$', '%', '^', '&', '*', '?', '(' and ')' by replacing them with an empty string
new_text = re.sub(r"[@!#$%^&*?()]", "", text)
# print the modified text
print(new_text)
In this code, we use re.sub() to replace the special characters @, !, #, $, %, ^, &, *, ?, (, and ) with an empty string, effectively removing them from the text.
The regular expression pattern [@!#$%^&*?()] matches any of the specified special characters within the square brackets. We provide an empty string "" as the second argument to re.sub(), indicating that we want to replace the matches with nothing, effectively removing them from the text.
Output:
#Output
Do you love programming Learn programming  SparkByExamples.com at just 40
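Removing a character that sits between two spaces, such as the @ in this text, leaves both spaces behind. If that matters, a second re.sub() call can collapse runs of whitespace. This is a small sketch of one possible cleanup step; the variable names removed and tidy are made up for this example.

```python
import re

text = "Do you love programming? Learn programming @ SparkByExamples.com at just $40"

# Step 1: remove the special characters, as in the example above
removed = re.sub(r"[@!#$%^&*?()]", "", text)

# Step 2: collapse each run of whitespace into a single space,
# then trim any leading or trailing whitespace
tidy = re.sub(r"\s+", " ", removed).strip()

print(tidy)
# Do you love programming Learn programming SparkByExamples.com at just 40
```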
Handling Escaped Special Characters
Some special characters, such as the backslash, have a special meaning both in Python string literals and in regex patterns, so you need to be extra careful when handling them. Let us look at this example:
import re
# the original text
text = "Learn programming at SparkBy\\. It's awesome!"
# define the replacement string
replacement = "Examples"
# replace the special characters '\', '$', and '?' with the replacement string
new_text = re.sub(r"\\|[$?]", replacement, text)
# print the modified text
print(new_text)
In the code snippet, we first define the original text to be modified. Notice that the backslash \ is escaped with another backslash (written as \\ in the string literal) to preserve its literal meaning.
We then define the replacement string to be used when replacing the special characters.
The re.sub() function is then called to perform the replacement operation. The first argument is the regular expression pattern r"\\|[$?]", which matches either a backslash \, a dollar sign $, or a question mark ?. The double backslash \\ is used to escape the backslash character in the regular expression pattern. The second argument is the replacement string, replacement. The third argument is the original text, text, in which to search for matches.
Finally, we assign the modified text to the variable new_text.
Output:
#Output
Learn programming at SparkByExamples. It's awesome!
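Escaping by hand becomes error-prone once the set of characters grows or comes from a variable. One alternative is to let re.escape() do the escaping and build the character class programmatically. The sketch below assumes a hypothetical list named chars_to_replace and a made-up sample string.

```python
import re

# Hypothetical list of characters to replace; several of them
# (\, ], ^, -) are special inside a character class
chars_to_replace = ["\\", "$", "?", "]", "^", "-"]

# re.escape() adds a backslash wherever one is needed, so every
# character in the class is matched literally
pattern = "[" + "".join(re.escape(c) for c in chars_to_replace) + "]"

# Made-up sample text containing each of the characters once
text = "a\\b$c?d]e^f-g"

new_text = re.sub(pattern, "_", text)

print(new_text)
# a_b_c_d_e_f_g
```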
Removing Punctuation Marks
The re.sub() function can also come in handy when you want to remove punctuation marks from a string. Below are some common punctuation marks:
Punctuation mark | Symbol |
Question mark | ? |
Exclamation mark | ! |
Colon | : |
Semicolon | ; |
Quotation marks | “” |
Apostrophe | ' |
Parentheses | () |
Hyphen | - |
Comma | , |
Here is an example:
import re
# the original text
text = "Do you love programming? Yes! And i'm learning at sparkbyexamples, thank you."
# remove punctuation marks using regex
new_text = re.sub(r"[^\w\s]", "", text)
# print the modified text
print(new_text)
In this code, we use the re.sub() function to remove punctuation marks from the text. The regular expression pattern [^\w\s] matches any character that is not a word character (\w) or a whitespace character (\s). By replacing these non-word and non-space characters with an empty string, we effectively remove the punctuation marks from the text. Note that \w matches letters, digits, and the underscore, so underscores are left in place by this pattern.
Output:
#Output
Do you love programming Yes And im learning at sparkbyexamples thank you
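Because \w includes the underscore, [^\w\s] leaves underscores untouched. If underscores should be removed as well, one option is to build the character class from string.punctuation in the standard library, which lists the ASCII punctuation characters. The following sketch compares the two approaches on a made-up string; the variable names are chosen only for this example.

```python
import re
import string

# Made-up sample text that contains an underscore
sample = "snake_case, really?"

# Build a character class from all ASCII punctuation characters;
# re.escape() makes sure characters such as ] and \ are taken literally
punct_pattern = "[" + re.escape(string.punctuation) + "]"

with_punctuation_list = re.sub(punct_pattern, "", sample)
with_negated_class = re.sub(r"[^\w\s]", "", sample)

print(with_punctuation_list)
# snakecase really
print(with_negated_class)
# snake_case really
```

string.punctuation covers ASCII punctuation only, so typographic characters such as curly quotes would still need the negated class or an extended pattern.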
Conclusion
That concludes this tutorial. To recap, we have explored how the re.sub() function can replace special characters in a string. We hope that you find this knowledge useful and that you will apply it in your future Python regex projects.