Python regex groups

Python regex groups let you extract specific parts of a matched string and treat part of a pattern as a single unit. By enclosing part of a regular expression in parentheses, you can apply a quantifier or an alternation to that whole sub-pattern and control the precedence of pattern matching. Groups also capture the text they match, so you can access it from the match object for further processing or extraction.

Grouping Syntax

Before you work with regex groups in your Python code, you need to understand the syntax. To create a regex group, enclose the desired part of the pattern within parentheses ( ). For example, the pattern (\d{3}) (\d{4}) consists of two groups separated by a single space. The first group (\d{3}) captures three consecutive digits, while the second group (\d{4}) captures four consecutive digits.
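As a minimal sketch of the syntax described above, the following snippet applies the two-group pattern to a made-up sample string (the text "Call 555 1234 today" is an assumption for illustration only). It also shows that a quantifier placed after a group applies to the whole group, not just to the last character.

```python
import re

# Two capturing groups separated by a single space
pattern = re.compile(r"(\d{3}) (\d{4})")

# Hypothetical sample text, used only for illustration
match = pattern.search("Call 555 1234 today")
if match:
    print("Whole match:", match.group(0))
    print("Group 1:", match.group(1))
    print("Group 2:", match.group(2))

# With parentheses, + repeats the whole unit "ab"
repeated = re.fullmatch(r"(ab)+", "ababab")

# Without parentheses, + repeats only the "b"
not_repeated = re.fullmatch(r"ab+", "ababab")

print("(ab)+ matches 'ababab':", repeated is not None)
print("ab+ matches 'ababab':", not_repeated is not None)
```

Here re.fullmatch() is used for the second part so that the result depends on whether the entire string fits the pattern.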

Accessing Captured Groups

To access captured groups on a match object returned by re.search() or re.match(), use the match object's group() method. The group() method accepts an optional argument specifying the group number or group name. Called with no argument, or with 0, it returns the entire match.

To access captured groups by their numerical index, you can pass the group number to the group() method. The first captured group is accessed using group(1), the second group using group(2), and so on. Let us look at the following example:


import re

# the text to search within
text = "Hello, this is SparkByExamples."

# the pattern to match
# (the final period is escaped so it matches a literal ".")
pattern = r"(\w+), (.*?) is (\w+)\."

# search for a match using the pattern
match = re.search(pattern, text)

# check if a match is found
if match:
    # print the first captured group
    print("Group 1:", match.group(1))
    
    # print the second captured group
    print("Group 2:", match.group(2))
    
    # print the third captured group
    print("Group 3:", match.group(3))

In this code, the text variable holds the string to search within, and the pattern variable defines the pattern to match. We use re.search() to find the first occurrence of the pattern in the text.

If a match is found, we access the captured groups using the match.group() method. Group 1 corresponds to the first set of parentheses, Group 2 corresponds to the second set of parentheses, and Group 3 corresponds to the third set of parentheses in the pattern.

Output:


#Output
Group 1: Hello
Group 2: this
Group 3: SparkByExamples
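Besides reading one group at a time, a match object can return several groups in one call. The sketch below reuses the text from the example above, with the trailing period escaped as \. so that it matches a literal period. The variable names whole, all_groups, and first_and_third are chosen here for illustration.

```python
import re

text = "Hello, this is SparkByExamples."
match = re.search(r"(\w+), (.*?) is (\w+)\.", text)

if match:
    # group() with no argument is the same as group(0): the entire match
    whole = match.group()

    # groups() returns every captured group as a tuple, in order
    all_groups = match.groups()

    # group() with several numbers returns a tuple of just those groups
    first_and_third = match.group(1, 3)

    print("Whole match:", whole)
    print("All groups:", all_groups)
    print("Groups 1 and 3:", first_and_third)
```

Using groups() is convenient when you want to unpack all captures at once, for example greeting, middle, name = match.groups().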

Named Groups

Named groups in Python regex allow us to assign names to specific capturing groups within a pattern. Instead of accessing groups by their numerical index, we can refer to them by their assigned names. This provides more clarity and makes the code more readable.

To define a named group, we use the syntax (?P<name>pattern), where name is the name assigned to the group and pattern is the regular expression pattern. Here’s an example:


import re

# the input text containing names
text = "Dennis Ritchie, Brian Yu, Bill Gates, Stevie Jobs"

# the pattern to match the first and last names
pattern = r"(?P<first_name>\w+) (?P<last_name>\w+)"

# find all matches in the text and store them in 'matches'
matches = re.findall(pattern, text)

# iterate over each match
for match in matches:
    # unpack the first name and last name from the match
    first_name, last_name = match

    # print the first name and last name
    print("First name:", first_name)
    print("Last name:", last_name)

The input text contains a list of names separated by commas. The pattern r"(?P<first_name>\w+) (?P<last_name>\w+)" is used to match each name and capture the first name and last name as named groups.

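The re.findall() function is used to find all matches of the pattern in the text. The matches are returned as a list of tuples, with each tuple containing the captured first name and last name. Note that re.findall() returns plain tuples, so the values appear in the order in which the groups are defined and the group names themselves are not used to read them.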

The code then iterates over each match and unpacks the first name and last name from each tuple.

Output:


#Output
First name: Dennis
Last name: Ritchie
First name: Brian
Last name: Yu
First name: Bill
Last name: Gates
First name: Stevie
Last name: Jobs
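To read the groups by name rather than by position, you need match objects instead of the tuples that re.findall() returns. The following is a minimal sketch using re.finditer() on the same text and pattern. The list name people is an assumption made for this example.

```python
import re

text = "Dennis Ritchie, Brian Yu, Bill Gates, Stevie Jobs"
pattern = re.compile(r"(?P<first_name>\w+) (?P<last_name>\w+)")

people = []
for match in pattern.finditer(text):
    # access a named group by passing its name to group()
    first_name = match.group("first_name")

    # subscript access with the group name also works (Python 3.6+)
    last_name = match["last_name"]

    print("First name:", first_name)
    print("Last name:", last_name)

    # groupdict() returns all named groups as a dictionary
    people.append(match.groupdict())

print(people[0])
```

Named groups are still numbered, so match.group(1) and match.group("first_name") return the same text.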

Nested Groups

Nested groups in Python regular expressions allow you to define groups within groups, creating a hierarchical structure for pattern matching and capturing. This can be useful when dealing with complex patterns and needing to extract specific sub-patterns within larger patterns.

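To create nested groups, you simply enclose the inner group within parentheses inside the outer group. This allows you to capture the nested groups as separate entities. Groups are numbered by the position of their opening parenthesis, counting from left to right, so an outer group always has a lower number than the groups nested inside it.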

Here’s an example code snippet that demonstrates nested groups:


import re

# defining the string 
text = "Hello, my name is Guido Van Rossum. I work as a software engineer at SparkByExamples."

# defining the pattern
pattern = r"(Hello, (my name is (.*)))"

# search for the pattern in the text
match = re.search(pattern, text)

if match:
    # print the complete match
    print("Complete match:", match.group(0))
    
    # print the outer group
    print("Outer group:", match.group(1))
    
    # print the first inner group
    print("Inner group 1:", match.group(2))
    
    # print the second inner group
    print("Inner group 2:", match.group(3))

In this code, we define the text variable containing the input text. The pattern variable stores the regex pattern with nested groups.

Using re.search(), we search for the first occurrence of the pattern in the text. If a match is found, we access and print the captured groups using match.group().

match.group(0) retrieves the complete match.
match.group(1) retrieves the outer group.
match.group(2) retrieves the first inner group.
match.group(3) retrieves the second inner group.

Output:


#Output
Complete match: Hello, my name is Guido Van Rossum. I work as a software engineer at SparkByExamples.
Outer group: Hello, my name is Guido Van Rossum. I work as a software engineer at SparkByExamples.
Inner group 1: my name is Guido Van Rossum. I work as a software engineer at SparkByExamples.
Inner group 2: Guido Van Rossum. I work as a software engineer at SparkByExamples.
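In the output above, the innermost group captures the rest of the text because .* is greedy and matches as much as it can. If you only want the name, one option is to restrict what the inner group may match. The sketch below is a variation on the example, not a replacement for it: it swaps .* for [^.]*, which stops at the first period.

```python
import re

text = "Hello, my name is Guido Van Rossum. I work as a software engineer at SparkByExamples."

# [^.]* matches any run of characters that are not a period,
# so the innermost group ends before the first "."
pattern = r"(Hello, (my name is ([^.]*)))"

match = re.search(pattern, text)

if match:
    # groups are numbered by the order of their opening parentheses
    print("Outer group:", match.group(1))
    print("Inner group 1:", match.group(2))
    print("Inner group 2:", match.group(3))
```

With this pattern, the innermost group holds only the name, and each enclosing group holds the name plus the literal text that precedes it.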

Conclusion

That concludes this tutorial. We hope you have learned a lot and that you will put it into practice.