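In Python, the re.findall() function searches a string with a regular expression pattern and returns all non-overlapping matches as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead: strings for a single group, tuples for several groups.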
Python ships with a built-in regular expression module called re. This module lets you find matches in text and manipulate that text, and it provides a range of functions for different tasks. One of the most widely used is re.findall(), which extracts patterns such as digits, email addresses, and URLs from text.
In this tutorial, I will show you how to use re.findall() effectively to extract patterns from strings, with the help of clear examples.
1. Syntax of the re.findall() function
Following is the syntax of the re.findall() function. Knowing what each argument does will help you use the function more effectively.
# Syntax of findall()
re.findall(pattern, string, flags=0)
The table below explains each argument:
Argument | Description |
pattern | The regular expression to search for |
string | The input string to search within |
flags (optional) | Modifiers that control how the pattern is matched, such as re.IGNORECASE; defaults to 0 (no flags) |
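As a small illustration of the optional flags argument, the sketch below runs the same pattern with and without re.IGNORECASE. The sample text and variable names are made up for this example.

```python
import re

# Hypothetical sample text (not from the tutorial)
text = "Spark spark SPARK"

# Without flags, the pattern is case-sensitive
default_matches = re.findall(r'spark', text)
print(default_matches)  # ['spark']

# With re.IGNORECASE, every casing of the word is matched
ignore_case_matches = re.findall(r'spark', text, flags=re.IGNORECASE)
print(ignore_case_matches)  # ['Spark', 'spark', 'SPARK']

# When nothing matches, findall() returns an empty list rather than None
no_matches = re.findall(r'\d', text)
print(no_matches)  # []
```

Because an empty list is returned when nothing matches, the result can be looped over or length-checked without testing for None first.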
2. Usage of the re.findall() function
Let us look at some real-world examples in which re.findall() can be used.
2.1 Finding digits in a string
Suppose you are working with text data and want to pull the digits out of it. The re.findall() function comes in handy here. Our first example demonstrates how to find digits in a given string:
# Import re module
import re
# The string to search within
text = "SparkByExamples was founded in 2018 and launched in 2020"
# Find all digits in the text using the regular expression pattern '\d'
# '\d' matches any single digit
digits = re.findall(r'\d', text)
# Print the list of digits found
print(digits)
Here, the string variable text holds the text to search within: SparkByExamples was founded in 2018 and launched in 2020.
The re.findall() function finds all occurrences of a pattern in the text string. The pattern \d matches any single digit, so each digit is returned as its own match rather than as part of a whole number.
The result of re.findall() is stored in the digits variable, a list containing every digit found in the text string.
Output:
# Output
['2', '0', '1', '8', '2', '0', '2', '0']
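If whole numbers are wanted instead of individual digits, one possible variation is to add the + quantifier so that consecutive digits are grouped into a single match. The sketch below reuses the same text; the variable names numbers and years are chosen for this example.

```python
import re

# Same text as in the example above
text = "SparkByExamples was founded in 2018 and launched in 2020"

# '\d+' matches one or more consecutive digits, so each year is one match
numbers = re.findall(r'\d+', text)
print(numbers)  # ['2018', '2020']

# findall() always returns strings; convert them if integers are needed
years = [int(n) for n in numbers]
print(years)  # [2018, 2020]
```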
2.2 Extracting email addresses from a string
There might be situations where you want to extract only the email addresses from a given string. The code for that looks as follows:
# Import re module
import re
# The input string to search within
text = "Contact SparkByExamples at [email protected] or [email protected] for further information."
# Find all email addresses in the text using the regular expression pattern
# the pattern matches valid email addresses
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
# Print the list of email addresses found
print(emails)
As before, the string variable text holds the text to search within, and re.findall() finds all occurrences of the pattern in it. The pattern r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' matches most common email address formats, although it is not a full validator.
It starts at a word boundary (\b), followed by one or more letters, digits, dots, underscores, percent signs, plus signs, or hyphens ([A-Za-z0-9._%+-]+), followed by the @ symbol, then one or more letters, digits, dots, or hyphens ([A-Za-z0-9.-]+), and finally a period followed by at least two letters (\.[A-Za-z]{2,}). The closing \b ensures that the match ends at a word boundary.
Output:
# Output
['sparkbyexa[email protected]', '[email protected]']
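For experimenting with this pattern, here is a self-contained sketch that uses made-up example.com and example.org addresses (these are placeholders, not the addresses from the tutorial). It also shows what happens when capturing groups are added to the pattern: findall() then returns tuples of the groups rather than the full match.

```python
import re

# Hypothetical text with placeholder addresses (assumed for illustration)
text = "Contact alice@example.com or bob.smith@example.org for further information."

# Same pattern as above: each match is the whole address
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails = re.findall(pattern, text)
print(emails)  # ['alice@example.com', 'bob.smith@example.org']

# With two capturing groups, each match becomes a (local part, domain) tuple
grouped_pattern = r'\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b'
parts = re.findall(grouped_pattern, text)
print(parts)  # [('alice', 'example.com'), ('bob.smith', 'example.org')]
```

If grouping is needed only for structure and the full match should still be returned, a non-capturing group, written (?:...), can be used instead.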
2.3 Extracting URLs
Another scenario where re.findall() comes in handy is extracting URLs from a given string. Here is an example:
# Import re module
import re
# Input string to search within
text = "Visit our website at https://www.sparkbyexamples.com for more coding tutorials."
# Find all URLs in the text using the regular expression pattern
# the pattern matches valid URLs starting with http:// or https://
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
# Print the list of URLs found
print(urls)
In the code snippet, re.findall() finds all occurrences of the pattern in the text string. The pattern r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+' matches URLs that start with either "http://" or "https://" (the [s]? makes the "s" optional), followed by a run of allowed characters: letters, digits, common URL symbols, and percent-encoded escapes such as %20. The trailing + means the preceding non-capturing group (?:...) must occur one or more times consecutively, so the match continues until a character outside the allowed set, such as a space, is reached.
Finally, the result of re.findall() is stored in the urls variable, a list containing all the URLs found in the text string.
Output:
# Output
['https://www.sparkbyexamples.com']
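When strict character checking is not needed, a looser pattern can be easier to read. The sketch below is an alternative to the pattern above, not a replacement for it: it treats everything up to the next whitespace as part of the URL. The text and the example.com and example.org URLs are placeholders assumed for illustration.

```python
import re

# Hypothetical text with placeholder URLs (assumed for illustration)
text = "Docs live at http://example.com/docs and https://example.org/search?q=regex today"

# 'https?' matches "http" optionally followed by "s";
# '\S+' then consumes every non-whitespace character up to the next space
urls = re.findall(r'https?://\S+', text)
print(urls)  # ['http://example.com/docs', 'https://example.org/search?q=regex']
```

The trade-off is that punctuation directly after a URL, such as a sentence-ending period or a closing parenthesis, would be included in the match, so this simpler form suits text where URLs are separated by spaces.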
3. Conclusion
That concludes this tutorial. It has walked you through the syntax of Python's re.findall() function and its practical applications in some real-world examples. We hope that the knowledge gained here will be useful in your future Python projects. Thanks for reading.