Regular expressions (regex) in Python provide a powerful way to manipulate and transform text. One useful feature is the ability to use capture groups in replacement patterns while performing string replacements. This tutorial will guide you through the process of using capture groups in regex replacements in Python.

Here is what we will cover:

  • Understanding Capture Groups
  • The syntax for Capture Groups
  • Using Capture Groups in re.sub()
  • Accessing Capture Groups in the Replacement Pattern
  • Replacing Date Formats using Capture Groups
  • Conclusion

Understanding Capture Groups

A capture group is a portion of a regular expression pattern enclosed in parentheses (). It lets you isolate and extract a specific part of the matched text. Each captured group can then be referenced during string replacement.

The syntax for Capture Groups

To create a capture group, wrap the part of the pattern you want to capture in parentheses (), as shown below:


(pattern)
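
As a minimal sketch of what a capture group isolates, the snippet below uses re.search() and reads the captured text back with match.group(). The sample text and variable names are assumptions chosen only for illustration:

```python
import re

# hypothetical sample text, chosen only for illustration
text = "Order 66 shipped"

# one capture group around the digits
match = re.search(r"Order (\d+)", text)

if match:
    whole = match.group(0)   # the entire match: "Order 66"
    number = match.group(1)  # the first capture group: "66"
    print(whole)
    print(number)
```

Group 0 always refers to the whole match, and numbered groups start at 1, counted by the position of their opening parenthesis.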

Using Capture Groups in re.sub()

Capture groups are central to pattern replacement with re.sub(). They let us capture specific portions of the matched text and refer to them within the replacement pattern. Here’s an example that demonstrates using capture groups in re.sub():


import re

# the original text
text = "Hello, John Doe!"

# define the regex pattern with two capture groups: first name and last name
pattern = r"Hello, (\w+) (\w+)!"

# specify the replacement pattern with backreferences to both captured names
replacement = r"Welcome, \2, \1!"

# perform the replacement using re.sub()
new_text = re.sub(pattern, replacement, text)

# print the modified text
print(new_text)

In the example, the pattern contains two (\w+) capture groups: the first captures the first name and the second captures the last name. The replacement pattern Welcome, \2, \1! refers to the second and then the first captured group, so the last name appears before the first name.

Output:


#Output
Welcome, Doe, John!
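
One detail worth knowing about numeric backreferences: when a digit must follow the group immediately, \1 followed by 0 is read as a reference to group 10. Python's re module offers the \g<1> form to avoid that ambiguity. The sketch below uses a made-up input string to show it:

```python
import re

# hypothetical input, chosen only for illustration
text = "item7"

# one capture group holding a single digit
pattern = r"item(\d)"

# r"item\10" would be treated as a reference to group 10, which does not exist here.
# \g<1>0 means "group 1, followed by a literal 0".
result = re.sub(pattern, r"item\g<1>0", text)

print(result)  # item70
```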

Accessing Capture Groups in the Replacement Pattern

To reuse captured text in the replacement string, use backreferences: \1 refers to the first capture group, \2 to the second, and so on.

Here’s an example that demonstrates how to access capture groups in the replacement pattern:


import re

# the original text
text = "On 2012-12-12, SparkByExamples was launched"

# define the regex pattern with capture groups for the year, month, and day
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# specify the replacement pattern with access to the capture groups
replacement = r"Year: \1, Month: \2, Day: \3"

# perform the replacement using re.sub()
new_text = re.sub(pattern, replacement, text)

# print the modified text
print(new_text)

Here, the pattern (\d{4})-(\d{2})-(\d{2}) matches a date in the format YYYY-MM-DD. It consists of three capture groups representing the year, month, and day respectively.

The replacement pattern Year: \1, Month: \2, Day: \3 uses backreferences (\1, \2, \3) to access the captured groups. This allows us to retrieve the matched date components and insert them into the replacement string, separated by commas and labeled with their respective names.

The re.sub() function finds every non-overlapping match of the pattern in the original text and replaces each one with the expanded replacement pattern. It returns a new string, which is stored in the new_text variable; the original text is left unchanged.

Output:


#Output
On Year: 2012, Month: 12, Day: 12, SparkByExamples was launched
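
Numbered backreferences can become hard to read as patterns grow. As an alternative sketch, the same replacement can be written with named groups, using the (?P<name>...) syntax in the pattern and \g<name> in the replacement. The group names year, month, and day are our own choice for illustration:

```python
import re

# the original text
text = "On 2012-12-12, SparkByExamples was launched"

# same date pattern, but each group is given a name
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# refer to the groups by name instead of by number
replacement = r"Year: \g<year>, Month: \g<month>, Day: \g<day>"

new_text = re.sub(pattern, replacement, text)

print(new_text)
```

This produces the same result as the numbered version, while the replacement string documents which component goes where.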

Replacing Date Formats using Capture Groups

A common task is converting every date in a string from one format to another. Capture groups make this straightforward: the pattern captures each date component, and the replacement pattern rearranges them. Here’s an example that demonstrates how to replace date formats using capture groups:


import re

# the original text
text = "Dates: 2022-01-15, 2023-05-20, 2024-09-10"

# define the regex pattern for the date format YYYY-MM-DD
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# specify the replacement pattern for the desired format DD/MM/YYYY
replacement = r"\3/\2/\1"

# perform the replacement using re.sub()
new_text = re.sub(pattern, replacement, text)

# print the modified text
print(new_text)

In this code snippet, we want to convert dates from the format YYYY-MM-DD to the format DD/MM/YYYY. To achieve this, we define a regular expression pattern with three capture groups: (\d{4}) for the year, (\d{2}) for the month, and (\d{2}) for the day.

The replacement pattern \3/\2/\1 references the captured groups in the desired order to construct the new date format. The \3 corresponds to the third captured group (day), \2 corresponds to the second captured group (month), and \1 corresponds to the first captured group (year).

Given the pattern, the replacement, and the original text, re.sub() rewrites every matched date in the new format and leaves the rest of the string untouched.

Output:


#Output
Dates: 15/01/2022, 20/05/2023, 10/09/2024
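
When the replacement needs more than reordering, re.sub() also accepts a function in place of the replacement string. The function receives each match object and returns the replacement text. The sketch below is one possible variation, not part of the example above: the helper name to_day_first is hypothetical, and it drops leading zeros from the day and month:

```python
import re

# the original text
text = "Dates: 2022-01-15, 2023-05-20, 2024-09-10"

# same pattern as before: year, month, day
pattern = r"(\d{4})-(\d{2})-(\d{2})"

def to_day_first(match):
    # hypothetical helper: unpack the three captured groups as strings
    year, month, day = match.groups()
    # int() removes any leading zero from the day and month
    return f"{int(day)}/{int(month)}/{year}"

# pass the function itself as the replacement argument
new_text = re.sub(pattern, to_day_first, text)

print(new_text)  # Dates: 15/1/2022, 20/5/2023, 10/9/2024
```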

Conclusion

That concludes our tutorial. We covered what capture groups are, how to write them, and how to reference them in re.sub() replacement patterns. Experiment with each example to solidify your understanding.