When you work with Python's re module for text manipulation and analysis, you also rely on Python's built-in data types, such as strings and numbers. One of these is the list, which stores a collection of items. When working with lists, you may want to search for specific patterns within the elements, filter elements based on a pattern, or replace the parts of elements that match certain criteria.
In this tutorial, you will explore how Python regex can be used to search, filter, and replace patterns in the elements of Python lists.
Here is what we will cover in this tutorial:
- Searching for Patterns in a List
  - Using re.search()
  - Using re.findall()
- Filtering a List using Regex
  - Using re.match() method
- Replacing Elements in a List using Regex
  - Using re.sub() method
  - Modifying Elements in Place
- Conclusion
Searching for Patterns in a List
When working with lists, you might want to search for certain patterns within the elements. Two functions from the re module are well suited to this: re.search() and re.findall().
Using re.search()
Let us first look at how you can search for patterns within the elements of a list using re.search(). Here is an example:
import re
# the list of languages to search in
languages = ["JavaScript", "Java", "Python", "C", "C++", "C#"]
# the pattern to search for
pattern = r"va"
# search for the pattern in each language
for language in languages:
    if re.search(pattern, language):
        # prints the language
        print(f"Pattern found in: {language}")
The code searches for a pattern within each element of the given list of programming languages. The pattern is defined as va.
The code iterates through each language in the list and uses the re.search() function to check whether the pattern occurs anywhere in the language string. If a match is found, it prints a message indicating that the pattern was found in that language.
Output:
Pattern found in: JavaScript
Pattern found in: Java
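If you check the same pattern against many elements, it can help to compile it once and keep the details of each match. The following is a minimal sketch rather than part of the original example: the helper name find_matching_items is hypothetical, and it assumes you want each matching element together with the position of its first match.

```python
import re

def find_matching_items(items, pattern):
    """Return (item, start, end) for each item that contains the pattern."""
    compiled = re.compile(pattern)  # compile once, reuse for every element
    results = []
    for item in items:
        match = compiled.search(item)  # first match anywhere in the string, or None
        if match:
            results.append((item, match.start(), match.end()))
    return results

languages = ["JavaScript", "Java", "Python", "C", "C++", "C#"]
found = find_matching_items(languages, r"va")
print(found)  # [('JavaScript', 2, 4), ('Java', 2, 4)]
```

Because re.search() returns a match object rather than a plain boolean, the start and end positions come for free whenever a match is found.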
Using re.findall()
Another way to search for patterns within the elements of a list is the re.findall() function. Here is an example:
import re
# the list of languages to search in
languages = ["JavaScript", "Java", "Python", "C", "C++", "C#"]
# the pattern to search for
pattern = r"va"
# assigning the method return results to matches variable
matches = re.findall(pattern, " ".join(languages))
# prints the list of the matches
print(matches)
Here, the code searches for the pattern in a single string built from the list of programming languages. The pattern is va, just like in the previous example.
The code joins the individual language strings with a space separator, creating a single string. It then uses re.findall() to find all non-overlapping occurrences of the pattern in that combined string. The matches are stored in the matches variable.
Output:
['va', 'va']
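Joining the list into one string does not tell you which element each match came from, and a pattern that can match a space could span two neighbouring elements. As a hedged alternative, the sketch below calls findall() on each element separately; the name matches_by_language is an illustrative assumption, not something defined earlier in this tutorial.

```python
import re

languages = ["JavaScript", "Java", "Python", "C", "C++", "C#"]
pattern = re.compile(r"va")

# map each language to its own list of matches, skipping languages with none
matches_by_language = {
    language: pattern.findall(language)
    for language in languages
    if pattern.search(language)
}
print(matches_by_language)  # {'JavaScript': ['va'], 'Java': ['va']}
```

This keeps the link between each element and its matches, at the cost of one findall() call per element.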
Filtering a List using Regex
Using re.match() method
With the help of regex and list comprehension, you can filter a list. Here is an example:
import re
# list of words to filter
words = ["python", "scala", "java", "go", "javascript", "fortran", "philimon", "pyran"]
# pattern to filter for words starting with 'p' and ending with 'n'
pattern = r'^p.*n$'
# filter the list using list comprehension and regular expressions
filtered_words = [word for word in words if re.match(pattern, word)]
# print the filtered list
print(filtered_words)
In this example, we have a list of words stored in the words variable. We want to filter the list and keep only the words that start with the letter 'p' and end with the letter 'n'. The regular expression pattern '^p.*n$' matches such words.
Using a list comprehension, we iterate over each word in the words list and check whether it matches the pattern using re.match(). If a word matches, it is included in the filtered_words list.
Output:
['python', 'philimon', 'pyran']
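Keep in mind that re.match() only anchors the start of the string, which is why the pattern above needs $ to constrain the end. The sketch below compares re.match(), re.fullmatch(), and re.search() using the same pattern without anchors. The sample list is a hypothetical one chosen to make the differences visible; it is not the list from the example above.

```python
import re

# hypothetical sample words, chosen so the three functions disagree
sample_words = ["python", "pandas", "spin", "pyran"]
pattern = r"p.*n"  # no ^ or $ anchors

# re.match(): the pattern must match at the start, but may stop before the end
match_result = [w for w in sample_words if re.match(pattern, w)]

# re.fullmatch(): the pattern must cover the entire string
fullmatch_result = [w for w in sample_words if re.fullmatch(pattern, w)]

# re.search(): the pattern may match anywhere in the string
search_result = [w for w in sample_words if re.search(pattern, w)]

print(match_result)      # ['python', 'pandas', 'pyran']
print(fullmatch_result)  # ['python', 'pyran']
print(search_result)     # ['python', 'pandas', 'spin', 'pyran']
```

If the goal is "starts with p and ends with n", re.fullmatch() with 'p.*n' gives the same result as re.match() with '^p.*n$' for single-line words like these.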
Replacing Elements in a List using Regex
Using re.sub() method
The re.sub() function is useful when you want to replace text in the elements of a list. Let us look at an example:
import re
# list of strings to modify
strings = ["Python123", "PySpark456", "Scala789", "MongoDB123"]
# pattern to match and replace
pattern = r'\d+' # Matches one or more digits
# replacement string
replacement = "NUM"
# iterate over the list and replace matching elements using re.sub()
modified_strings = [re.sub(pattern, replacement, string) for string in strings]
# print the modified list
print(modified_strings)
In the code snippet, we have a list of strings stored in the strings variable that we want to modify. Each string may contain digits that we want to replace.
The regular expression pattern '\d+' matches one or more consecutive digits and is used to identify the digits in each string.
The replacement string is defined as "NUM" and is substituted for the matched digits.
Using a list comprehension, we iterate over each string in the strings list. For each string, re.sub() is called with the pattern, the replacement, and the string itself. It returns a new string in which every occurrence of the pattern (each run of digits) is replaced with the replacement string.
Finally, the modified strings are stored in the modified_strings list; the original strings list is left unchanged.
Output:
['PythonNUM', 'PySparkNUM', 'ScalaNUM', 'MongoDBNUM']
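The list comprehension above builds a new list. If you would rather modify the existing list in place, so that every other reference to it sees the change, one possible approach is slice assignment. This is a sketch under that assumption, and the alias variable is hypothetical, added only to show that the same list object is updated.

```python
import re

strings = ["Python123", "PySpark456", "Scala789", "MongoDB123"]
alias = strings  # a second reference to the same list object
pattern = re.compile(r"\d+")  # one or more digits

# slice assignment replaces the contents of the existing list object
strings[:] = [pattern.sub("NUM", s) for s in strings]

print(alias)             # ['PythonNUM', 'PySparkNUM', 'ScalaNUM', 'MongoDBNUM']
print(alias is strings)  # True
```

Assigning to strings[:] rather than to strings matters here: a plain assignment would rebind the name to a new list and leave alias pointing at the old, unmodified one.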
Conclusion
That concludes this tutorial on using Python regex with lists. Regex makes it straightforward to search the elements of a list: whether you need to find the first occurrence or collect all matches, re.search() and re.findall() are valuable tools. You also saw how re.match() can filter a list and how re.sub() can replace matched text in each element. Experiment with different patterns to see what else regex can do for your list-processing tasks.