A recursive regular expression is a feature that allows the entire regex pattern or part of it to be re-executed on the remaining unmatched text. The syntax (?R) or (?0) defines recursive regex in Python. Don’t worry if that definition seems intimidating. It will get clear when we work on examples.
Whenever the search engine encounters (?R), it starts the whole pattern-matching process again at the current position of the string. If you want only to repeat a specific part of the pattern, you can use the grouping index, such as (?1), (?2), etc.
Let’s now work on some examples to cement the understanding of recursive regex in Python. We will use the regex library, which can be installed using pip with the command.
pip install regex
Example 1: Simple pattern matching using recursive regex
1 2 3 4 5 6 7 8 9 10 11 12 |
import regex result1 = regex.findall("a(?R)?z", "aaazzzzaabazzzz") # ['aaazzz', 'az'] print(result1) result2 = regex.findall("ah(?R)?", "aahahahhh") # ['ahahah'] print(result2) result3 = regex.findall("a(?R)?z", "zzaa") # [] print(result3) result4 = regex.findall("ah(?R)?", "ahaahahahhh") # ['ah', 'ahahah'] print(result4) result5 = regex.findall("x(?R)?yz(?R)?", "xxyzxyzyz") # ['xxyzxyzyz'] print(result5) |
Let’s break down the first example. “a(?R)?z” means look for one or more “a” characters followed by zero or more occurrences of (?R) which is a recursive regex, followed by one or more “z” characters.
That means “a(?R)?z” matches the first “a” in the string. Then, it will reach the (?R) and attempt the whole pattern again at the same position, matching the second “a”. This process will repeat until the search reaches the final “z” in the string.
Example 2: Using recursive regex to find palindromes in Python
A palindrome is a string that reads the same forward and backward, for example, racecar, mom, rotator, and level. Here is an example code of using recursive regex to check whether a string is a palindrome.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
import regex def isPalindrome(string): pattern = r"(\w)((?R)|(\w?))\1" result = regex.search(pattern, string) # is part or whole of the string is a palindrome, the pattern will return that # part, else None. E.g, "apapz" results in "pap" if result is not None: # Next, we need to be sure we are matching the whole text. # "racecar" will match a whole word, but "apapz" will march part, "pap". if result.group() == string: return True return False words = ["racecar", "python", "peep", "rotator", "debian", "wallet"] for word in words: result = isPalindrome(word) print(word, "is a palindrome? ", result) |
Output:
racecar is a palindrome? True python is a palindrome? False peep is a palindrome? True rotator is a palindrome? True debian is a palindrome? False wallet is a palindrome? False
Let’s break down the pattern r”(\w)((?R)|(\w?))\1″ with the input string “racecar”.
- (\w) – matches a single alphabetical character, for example, r.
- (\w)\1 – matches two identical alphabetical characters, e.g. “rr” or “cc”,
- (\w)(\w?)\1 – matches two or three alphabetical characters where the first and the last characters are the same. For example, “rar” or “rr”. The idea of checking palindrome using the pattern in the code above is beginning to show, right?
- (\w)(((\w)\4)|(\w?))\1 – this matches 3 or 4 characters with the first and last characters match. If we have 4 characters, the two characters in between must be the same. That is to say, a palindrome of 3 or 4 characters, e.g., “raar” or “rar”.
- We can then repeat the same process using (?R) recursive regex with the pattern r”(\w)((?R)|(\w?))\1″.
Example 3: Matching opening and closing brackets in a string
This example checks all opening brackets in a string are closed. Let’s see how to do that in code.
1 2 3 4 5 6 7 8 9 10 |
import regex def check_brackets(string): pattern = r"\((?:[^()]|(?R))*\)" result = regex.findall(pattern, string) return result string = "This is a (string) with) (brackets)" result = check_brackets(string) print(result) |
Output:
['(string)', '(brackets)']
Here are more examples of the same if you want to check strings with punctuation only.
1 2 3 4 5 6 7 8 9 10 |
import regex a1 = regex.search(r"^(\[(?1)*\])(?1)*$", "[]") is not None print(a1) # True a2 = regex.search(r"^(\((?1)*\))(?1)*$", "()(()") is not None print(a2) # False a3 = regex.search(r"^(\{(?1)*\})(?1)*$", "{{{}}}") is not None print(a3) # True a4 = regex.search(r"^(\((?1)*\))(?1)*$", "(((())())") is not None print(a4) # False |
Conclusion
This guide explains using recursive regex in Python to re-execute pattern search. This article covered how recursive regular expression works. You can practice more about regex on these sites: https://regexr.com/ and https://regex101.com/.