Python regex match word in string

Regular expressions (regex) in Python provide a powerful way to perform pattern matching and string searching. One frequent use case is matching a particular word within a larger string, for example to check whether the word appears at all or to extract every occurrence of it.

Basic Syntax for Matching Words

Matching specific words within a larger string is a common task when working with regular expressions (regex) in Python. In this section, we explore the fundamental syntax for matching words using Python's built-in re module. The table below summarizes the syntax elements involved:

Syntax element	Description
Word boundary	Denoted by the metacharacter `\b`. It matches a position rather than a character: the point between a word character (a letter, digit, or underscore) and a non-word character, or the start or end of the string when the first or last character is a word character.
Word character	Denoted by the metacharacter `\w`. It matches a single word character: a letter, digit, or underscore. For str patterns in Python 3 this includes Unicode letters and digits; only with the re.ASCII flag is it equivalent to the character class `[a-zA-Z0-9_]`.
Quantifiers	Specify how many times the preceding element must occur: `*` (zero or more), `+` (one or more), `?` (zero or one), `{n}` (exactly n), `{n,}` (at least n), `{n,m}` (between n and m).

Syntax elements for matching words
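To see these elements working together, here is a minimal sketch using a sample string of our own (not from the original examples). It combines `\b` and `\w` with the `+` quantifier to extract whole words, shows how the re.ASCII flag narrows `\w` to `[a-zA-Z0-9_]`, and uses the `{n,}` quantifier to keep only longer words. The accented letters are written as escape sequences so the example does not depend on how the file is encoded.

```python
import re

# Sample text: "naïve café, v2_beta!" written with escapes for the accented letters
text = "na\u00efve caf\u00e9, v2_beta!"

# \b\w+\b: one or more word characters between two word boundaries.
# With a str pattern, \w is Unicode-aware, so 'ï' and 'é' count as word characters.
unicode_words = re.findall(r"\b\w+\b", text)
print(unicode_words)  # ['naïve', 'café', 'v2_beta']

# With re.ASCII, \w only covers [a-zA-Z0-9_], so 'ï' and 'é' split the words.
ascii_words = re.findall(r"\b\w+\b", text, re.ASCII)
print(ascii_words)  # ['na', 've', 'caf', 'v2_beta']

# The {n,} quantifier: whole words made of at least 5 word characters.
long_words = re.findall(r"\b\w{5,}\b", text)
print(long_words)  # ['naïve', 'v2_beta']
```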

Understanding Word Boundaries

Having looked at the syntax elements for matching words, let us now take a closer look at word boundaries. A word boundary is a zero-width position where a word character (\w) is adjacent to a non-word character (\W), or where a word character sits at the very start or end of the string. In other words, it marks the edges of a word without consuming any characters. For a better understanding, let us look at an example:


import re

text = "If there is an online platform for sharpening your coding skills then SparkByExamples.com is the one."

# match the word 'coding' using word boundaries
pattern = r"\bcoding\b"

matches = re.findall(pattern, text)
print(matches)

Here is a breakdown of the code snippet. A string variable text is defined, then a pattern r"\bcoding\b" is created. The \b represents a word boundary and ensures that the word “coding” is matched only as a complete word. It prevents matching partial occurrences within larger words.

The re.findall() method is called with the pattern and the text as arguments. This method searches for all non-overlapping occurrences of the pattern in the given text and returns them as a list.

The resulting list of matches is stored in the variable matches.

Output:


#Output
['coding']
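The boundary rules have a few edge cases worth checking. The sketch below, using sample strings of our own, tests the same `\bcoding\b` pattern with re.search(), which returns a match object for the first match or None if there is none. Note that a hyphen is a non-word character and creates a boundary, while an underscore is a word character and does not.

```python
import re

pattern = r"\bcoding\b"

samples = [
    "coding",            # \b matches at the start and end of the string
    "coding-challenge",  # '-' is a non-word character, so there is a boundary after 'coding'
    "coding_skills",     # '_' is a word character, so there is no boundary after 'coding'
    "decoding",          # 'coding' is preceded by the word characters 'de'
]

# re.search() returns a Match object for the first match, or None if there is none
results = {s: re.search(pattern, s) is not None for s in samples}
for s, found in results.items():
    print(f"{s!r}: {found}")
```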

Matching Whole Words

When working with the re module, you may want to match a word only when it appears on its own, not as part of a longer word. For this task, place the \b metacharacter at the start and end of your pattern. Here is an example that demonstrates this:


import re

# the string to search the match in
text = "Spark, Sparkly and Sparky are the same words"

# match the word 'Spark' as a whole word
pattern = r"\bSpark\b"

# performing the match
matches = re.findall(pattern, text)
print(matches)

In this example, a string variable text is defined, containing the sentence “Spark, Sparkly and Sparky are the same words.” This is the text in which we want to search for matches.

The pattern r"\bSpark\b" is defined. It uses \b to specify word boundaries, ensuring that the word "Spark" is matched only as a complete word. It will not match occurrences of "Spark" within other words like "Sparkly" or "Sparky".

As in the previous example, re.findall() returns a list of all non-overlapping matches of the pattern in the text, which is stored in the variable matches.

Output:


#Output
['Spark']
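For comparison, here is a sketch of what happens without the word boundaries, plus a common follow-up case: building a whole-word pattern from a variable. The second sample string is our own. re.escape() makes characters such as `.` match literally instead of acting as metacharacters.

```python
import re

text = "Spark, Sparkly and Sparky are the same words"

# Without \b, 'Spark' is also found inside 'Sparkly' and 'Sparky'
substring_matches = re.findall(r"Spark", text)
print(substring_matches)  # ['Spark', 'Spark', 'Spark']

# When the word comes from a variable, escape it before inserting it into the pattern
word = "SparkByExamples.com"
site_text = "Visit SparkByExamples.com, not SparkByExamplesXcom"
escaped_matches = re.findall(rf"\b{re.escape(word)}\b", site_text)
print(escaped_matches)  # ['SparkByExamples.com']

# Unescaped, the '.' matches any character, so 'SparkByExamplesXcom' matches too
unescaped_matches = re.findall(rf"\b{word}\b", site_text)
print(unescaped_matches)  # ['SparkByExamples.com', 'SparkByExamplesXcom']
```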

Matching Case-Insensitive Words

When you want to match a word regardless of its case, pass the re.IGNORECASE flag (or its short alias, re.I) to the matching function. With this flag, the pattern matches the word whether it is written in uppercase, lowercase, or mixed case. Here is an example:


import re

# the string to search the match in
text = "At SparkByExamples.com you learn to CODE by coding"

# match the word 'CODE' case-insensitively
pattern = r"CODE"

matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)

In the code, we define the string to search for matches as "At SparkByExamples.com you learn to CODE by coding".

Then we define the pattern to match as "CODE" and call re.findall() to find all occurrences of the pattern in the text. The function returns a list of all matches.

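The re.IGNORECASE flag is passed as the optional flags argument to make the matching case-insensitive, so the pattern would match "CODE", "code", "Code", and so on. In this particular text, however, the only occurrence is "CODE": the word "coding" does not contain the sequence "code", so the result contains a single match.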

Output:


#Output
['CODE']
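Case-insensitive matching is often combined with word boundaries. The following sketch, with a sample string of our own, shows three equivalent ways to apply the flag (as an argument, on a compiled pattern, and as the inline `(?i)` flag) and what happens when a shorter pattern is used without `\b`.

```python
import re

text = "At SparkByExamples.com you learn to CODE by coding. Code daily!"

# Combine \b with re.IGNORECASE (re.I is a short alias for the same flag)
flag_matches = re.findall(r"\bcode\b", text, re.I)
print(flag_matches)  # ['CODE', 'Code']

# A compiled pattern stores the flag, which is convenient when the pattern is reused
code_re = re.compile(r"\bcode\b", re.IGNORECASE)
compiled_matches = code_re.findall(text)
print(compiled_matches)  # ['CODE', 'Code']

# The inline (?i) flag at the start of the pattern has the same effect
inline_matches = re.findall(r"(?i)\bcode\b", text)
print(inline_matches)  # ['CODE', 'Code']

# Without \b, the case-insensitive pattern 'cod' also matches inside 'coding'
prefix_matches = re.findall(r"(?i)cod", text)
print(prefix_matches)  # ['COD', 'cod', 'Cod']
```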

Matching Multiple Occurrences

If you want to match every occurrence of a word, re.findall() is the method to use. It searches for all non-overlapping occurrences of a pattern in a given text and returns them as a list. An example would look like this:


import re

# the string to search the match in
text = "If there is an online platform for sharpening your coding skills then SparkByExamples.com is the one."

# match all the occurrences of the word 'is'
pattern = r"is"

matches = re.findall(pattern, text)
print(matches)

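In the code, the pattern r"is" is defined, which represents the literal string "is" that we want to match in the given text. The pattern is case-sensitive, so it only matches the lowercase "is". Because it has no \b anchors, it would also match "is" inside longer words such as "this"; here both matches happen to be the standalone word "is". To match only the whole word, use r"\bis\b" instead.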

Output:


#Output
['is', 'is']
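The difference between a plain substring pattern and a whole-word pattern shows up as soon as the text contains a word like "This". The sketch below uses a sample string of our own and also shows re.finditer(), which yields match objects that carry the start and end position of each occurrence.

```python
import re

text = "This is what it is."

# Without word boundaries, 'is' is also found inside 'This'
substring_matches = re.findall(r"is", text)
print(substring_matches)  # ['is', 'is', 'is']

# With \b, only the standalone word 'is' matches
word_matches = re.findall(r"\bis\b", text)
print(word_matches)  # ['is', 'is']

# re.finditer() yields Match objects, so each occurrence's position is available
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\bis\b", text)]
print(positions)  # [('is', 5, 7), ('is', 16, 18)]

# Counting occurrences is just the length of the findall() result
print(len(word_matches))  # 2
```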

Conclusion

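That concludes our tutorial. You have learned how to match words with Python regular expressions: using \b for whole-word matches, the re.IGNORECASE flag for case-insensitive matches, and re.findall() to collect every occurrence. We hope this knowledge will be useful in your future Python regex projects.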