With Python regex, you can do everything from matching a string to replacing it. Regex also comes in handy when you want to find a substring in a string, that is, to extract a small portion of a much longer string. Two popular methods from Python's re module cover this task: re.search() and re.findall().
In this tutorial, we will explore the Python regex capabilities that help you find a substring in a very long string. Here is what you will learn:
- What is a Substring?
- Using re.search() for Substring Matching
- Using re.findall() for Multiple Substring Matches
- Using Capture Groups to Extract Substrings
- Conclusion
What is a Substring?
Before diving into finding substrings, let us first understand what a substring is. A substring is a contiguous sequence of characters within a larger string. In other words, it is a smaller portion of the original string with nothing skipped in between. For example, in the string "I love SparkByExamples", the substrings include "love", "SparkBy", "ByExamples", and "Examples".
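To make the word "contiguous" concrete, here is a minimal sketch that checks the candidates above with Python's built-in in operator (no regex needed yet). The name candidates and the counterexample "loveExamples" are introduced only for illustration.

```python
text = "I love SparkByExamples"

# each of these appears in text as an unbroken run of characters
candidates = ["love", "SparkBy", "ByExamples", "Examples"]
for candidate in candidates:
    print(candidate, candidate in text)

# "loveExamples" skips over " SparkBy", so it is not contiguous in text
print("loveExamples" in text)

# a slice of the original string is always a substring of it
print(text[2:6])
```

Every candidate prints True, while the non-contiguous "loveExamples" prints False.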
Using re.search() for Substring Matching
When searching for a substring, re.search() is a good method to reach for. It scans the original string for the first location where the pattern matches and returns a match object if one is found, or None otherwise. Here is an example that demonstrates substring matching:
import re

# the original text
text = "Hello! This is SparkByExamples.com"
# the substring to search for
substring = "Examples.com"

# use re.search() to find the first occurrence of the substring within the text
match = re.search(substring, text)

if match:
    # if a match is found, print a message indicating the substring was found
    print(f"'{substring}' is a substring of '{text}'")
else:
    # if no match is found, print a message indicating the substring was not found
    print("Substring not found!")
In this code snippet, we define the original text as "Hello! This is SparkByExamples.com".
Then we specify the substring to search for as "Examples.com". We use re.search() to find the first occurrence of the substring within the text and assign the result to the match variable.
Finally, we check whether a match was found by evaluating match in an if statement. A match object is truthy, while None is falsy.
Output:
'Examples.com' is a substring of 'Hello! This is SparkByExamples.com'
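One caveat about the example above: re.search() treats its first argument as a regex pattern, so the "." in "Examples.com" matches any character, not just a literal dot. The following sketch, which assumes you want a purely literal substring search, uses re.escape() to neutralize metacharacters and also shows how to read the matched text and its position from the match object. The variable names literal_pattern, loose, and strict are illustrative only.

```python
import re

text = "Hello! This is SparkByExamples.com"
substring = "Examples.com"

# re.escape() backslash-escapes regex metacharacters such as "."
literal_pattern = re.escape(substring)

match = re.search(literal_pattern, text)
if match:
    # group() is the matched text; start() and end() give its position in text
    print(match.group(), match.start(), match.end())

# unescaped, "." matches any character, so this unintended text also matches
loose = re.search(substring, "SparkByExamplesXcom")
# escaped, only a literal dot is accepted, so there is no match
strict = re.search(literal_pattern, "SparkByExamplesXcom")
print(loose is not None, strict is None)
```

For patterns you write by hand as regular expressions, escaping is not needed; it matters mainly when the search term comes from user input or other data.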
Using re.findall() for Multiple Substring Matches
Another Python regex method you can use for finding substrings is re.findall(). Use it when you want every non-overlapping match in the text rather than just the first one. Here is an example:
import re

# the original text
text = "Hello! This is spark by example. It contains a lot of coding examples."
# the substring pattern to search for
substring_pattern = r'\bex\w+'

# find all occurrences of substrings matching the pattern in the text
matches = re.findall(substring_pattern, text)

if matches:
    # if matches are found, print all the matched substrings
    print("Matched substrings:")
    for match in matches:
        print(match)
else:
    # if no matches are found, print a message indicating no matches were found
    print("No matches found.")
Here, we define the original text, a short passage that contains the words "example" and "examples". Then we specify the substring pattern to search for using the regular expression r'\bex\w+'. In this pattern, \b anchors the match at a word boundary, ex matches those two letters literally, and \w+ matches one or more word characters, so the pattern matches whole words that start with "ex".
Using re.findall(), we find all non-overlapping occurrences of substrings that match the pattern in the text. The method returns a list of all matches (an empty list if there are none), which we assign to the matches variable.
Finally, we check whether any matches were found by evaluating matches in an if statement. If the list is non-empty, we iterate over it and print each matched substring. Otherwise, we print a message indicating that no matches were found.
Output:
Matched substrings:
example
examples
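To see why the \b word boundary matters, the following sketch compares the pattern with and without it. The sample sentence and the variable names are made up for illustration and are not part of the original example.

```python
import re

# hypothetical sample text: "next" and "text" contain "ex" in the middle of a word
text = "The next example extends the text."

# with \b, a match may only begin at the start of a word
with_boundary = re.findall(r"\bex\w+", text)
# without \b, "ex" can match anywhere, including inside "next" and "text"
without_boundary = re.findall(r"ex\w+", text)

print(with_boundary)     # ['example', 'extends']
print(without_boundary)  # ['ext', 'example', 'extends', 'ext']

# when nothing matches, findall() returns an empty list, which is falsy
no_hits = re.findall(r"\bex\w+", "nothing relevant here")
print(no_hits)           # []
```

Because an empty list is falsy, the if matches: check in the example above is enough to detect the no-match case.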
Using Capture Groups to Extract Substrings
Sometimes you want to pull specific pieces out of a text rather than simply confirm that a pattern matches. Capture groups, written with parentheses in the pattern, let you do this. Let us look at an example that demonstrates this:
import re

# the original text
text = "My favorite programming languages are python, java, and kotlin."
# the pattern with a capture group to extract languages
pattern = r"(python|java|kotlin)"

# find all occurrences of languages in the text
matches = re.findall(pattern, text)

if matches:
    # if matches are found, print all the captured substrings
    print("Extracted languages:")
    for match in matches:
        print(match)
else:
    # if no matches are found, print a message indicating no matches were found
    print("No languages found.")
In this example, we define the original text as "My favorite programming languages are python, java, and kotlin." Then we specify a pattern with a capture group to extract programming languages. The pattern is (python|java|kotlin), where the parentheses form a single capture group and | separates the alternatives "python", "java", and "kotlin".
We use re.findall() to find all occurrences of substrings that match the pattern in the text. When the pattern contains exactly one capture group, the function returns a list of the text captured by that group, which we assign to the matches variable.
Finally, we check whether any matches were found by evaluating matches in an if statement. If the list is non-empty, we iterate over it and print each captured substring, which represents a programming language. Otherwise, we print a message indicating that no matches were found.
Output:
Extracted languages:
python
java
kotlin
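Capture groups become more useful when the pattern matches more than you want to keep, or when you want several pieces at once. The sketch below uses a made-up sentence and a hypothetical two-group pattern (a name followed by a version number) to show how re.findall() returns tuples when there are multiple groups, and how a match object from re.search() exposes each group by number.

```python
import re

# hypothetical input, not from the example above
text = "Installed: Python 3.12 and Kotlin 1.9"

# group 1 captures the name, group 2 captures the version number
pattern = r"(\w+) (\d+\.\d+)"

# with two capture groups, findall() returns a list of (group1, group2) tuples
pairs = re.findall(pattern, text)
for name, version in pairs:
    print(name, version)

# with search(), group(0) is the whole match and group(n) is the nth capture
first = re.search(pattern, text)
if first:
    print(first.group(0))  # Python 3.12
    print(first.group(1))  # Python
    print(first.group(2))  # 3.12
```

If you need the parentheses only for grouping alternatives and not for extraction, a non-capturing group such as (?:python|java|kotlin) keeps re.findall() returning the full match.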
Conclusion
This concludes the tutorial. We explored how to find substrings using two popular Python regex methods, re.search() and re.findall(), and how capture groups let you extract specific parts of a match.