Recursive Regex in Python

A recursive regular expression is a feature that allows the entire regex pattern or part of it to be re-executed on the remaining unmatched text. The syntax (?R) or (?0) defines recursive regex in Python. Don’t worry if that definition seems intimidating. It will get clear when we work on examples.

Whenever the search engine encounters (?R), it starts the whole pattern-matching process again at the current position of the string. If you want only to repeat a specific part of the pattern, you can use the grouping index, such as (?1), (?2), etc.

Let’s now work on some examples to cement the understanding of recursive regex in Python. We will use the regex library, which can be installed using pip with the command.

pip install regex

Example 1: Simple pattern matching using recursive regex

Let’s break down the first example. “a(?R)?z” means look for one or more “a” characters followed by zero or more occurrences of (?R) which is a recursive regex, followed by one or more “z” characters.

That means “a(?R)?z” matches the first “a” in the string. Then, it will reach the (?R) and attempt the whole pattern again at the same position, matching the second “a”. This process will repeat until the search reaches the final “z” in the string.

Example 2: Using recursive regex to find palindromes in Python

A palindrome is a string that reads the same forward and backward, for example, racecar, mom, rotator, and level. Here is an example code of using recursive regex to check whether a string is a palindrome.

Output:

racecar is a palindrome? True
python is a palindrome? False
peep is a palindrome? True
rotator is a palindrome? True
debian is a palindrome? False
wallet is a palindrome? False

Let’s break down the pattern r”(\w)((?R)|(\w?))\1″ with the input string “racecar”.

  • (\w) – matches a single alphabetical character, for example, r.
  • (\w)\1 – matches two identical alphabetical characters, e.g. “rr” or “cc”,
  • (\w)(\w?)\1 – matches two or three alphabetical characters where the first and the last characters are the same. For example, “rar” or “rr”. The idea of checking palindrome using the pattern in the code above is beginning to show, right?
  • (\w)(((\w)\4)|(\w?))\1 – this matches 3 or 4 characters with the first and last characters match. If we have 4 characters, the two characters in between must be the same. That is to say, a palindrome of 3 or 4 characters, e.g., “raar” or “rar”.
  • We can then repeat the same process using (?R) recursive regex with the pattern r”(\w)((?R)|(\w?))\1″.

Example 3: Matching opening and closing brackets in a string

This example checks all opening brackets in a string are closed. Let’s see how to do that in code.

Output:

['(string)', '(brackets)']

Here are more examples of the same if you want to check strings with punctuation only.

Conclusion

This guide explains using recursive regex in Python to re-execute pattern search. This article covered how recursive regular expression works. You can practice more about regex on these sites: https://regexr.com/ and https://regex101.com/.