Greedy and Non-greedy Regex in Python

Quantifiers in Python regular expressions can match in two ways: greedy and non-greedy (also called lazy).

The Difference Between Greedy and Non-greedy Matching

Greedy matching means the regex engine matches as much as possible while still allowing the rest of the pattern to match. Non-greedy (lazy) matching, on the other hand, matches as little as possible. In Python, quantifiers are greedy by default.
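The sketch below shows a greedy quantifier giving characters back so that the rest of the pattern can still match. The sample strings are illustrative choices, not taken from the article:

```python
import re

# "\d+" is greedy, but it gives back one digit so that "(\d)" can still match.
digits = re.match(r"(\d+)(\d)", "12345").groups()
print(digits)  # ('1234', '5')

# ".*" first runs to the end of the string, then backtracks to the last "-".
parts = re.match(r"(.*)-(.*)", "a-b-c").groups()
print(parts)  # ('a-b', 'c')
```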

In Python, you make a quantifier non-greedy by placing a “?” character immediately after it. A quantifier (covered in more detail below) specifies how many times the preceding character or group should be matched.
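Here is a short sketch of the “?” suffix in action, using illustrative inputs. Note that “?” has two roles: on its own it is a quantifier (zero or one occurrence), and after another quantifier it makes that quantifier lazy:

```python
import re

# Appending "?" to "+" makes it lazy: "\d+?" takes a single digit.
lazy_digits = re.match(r"(\d+?)(\d)", "12345").groups()
print(lazy_digits)  # ('1', '2')

# "?" alone is a quantifier (zero or one); "??" is its lazy form.
optional = re.match(r"a?", "aaa").group()
lazy_optional = re.match(r"a??", "aaa").group()
print(repr(optional))       # 'a'
print(repr(lazy_optional))  # ''
```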

An Example of Greedy and Non-greedy Regex Matching

Suppose we have the string “fabcdaxyzapq” and want to match a substring that starts and ends with “a”. The greedy pattern “a.*a” does this, where “.*” matches zero or more occurrences of any character except a newline. Here is the Python code.
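A minimal sketch using `re.search`, which returns the leftmost match in the string. The variable names `text` and `greedy` are illustrative assumptions:

```python
import re

text = "fabcdaxyzapq"

# Greedy: ".*" grabs as many characters as it can, then backtracks
# just far enough for the final "a" in the pattern to match.
greedy = re.search(r"a.*a", text)
print(greedy.group())
```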

Output:

abcdaxyza

With non-greedy matching (the pattern “a.*?a”), we instead get the shortest match that starts at the first “a” and ends at the next “a”, as shown below.
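A minimal sketch of the non-greedy version. It uses the same illustrative `text` variable, with `re.search` again returning the leftmost match:

```python
import re

text = "fabcdaxyzapq"

# Non-greedy: ".*?" consumes as few characters as possible, so the
# match ends at the first "a" that follows the opening one.
lazy = re.search(r"a.*?a", text)
print(lazy.group())
```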

Output:

abcda

Common Regex Quantifiers in Python

The following table lists some common regex quantifiers. As noted earlier, these quantifiers are greedy by default; adding “?” after a quantifier makes it non-greedy.

Quantifier    Description
a*            Matches zero or more occurrences of “a”.
a+            Matches one or more occurrences of “a”.
a?            Matches zero or one occurrence of “a”.
a{m}          Matches exactly m occurrences of “a”.
a{m,n}        Matches m to n (inclusive) occurrences of “a”.
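The sketch below applies each quantifier from the table, and its “?”-suffixed lazy form, to the string “aaaa” with `re.match`. The sample string and the values m = 2, n = 3 are illustrative choices:

```python
import re

text = "aaaa"

# Each greedy quantifier from the table; appending "?" gives the lazy form.
patterns = ["a*", "a+", "a?", "a{2}", "a{2,3}"]

results = {}
for pattern in patterns:
    lazy_pattern = pattern + "?"
    greedy_match = re.match(pattern, text).group()
    lazy_match = re.match(lazy_pattern, text).group()
    results[pattern] = (greedy_match, lazy_match)
    print(f"{pattern:<7} {greedy_match!r:<7} {lazy_pattern:<8} {lazy_match!r}")
```

Note that “a{2}?” behaves exactly like “a{2}”: a fixed count leaves nothing for the lazy modifier to shorten.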

More Examples of Greedy and Non-greedy Matching

Example 1
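A minimal sketch using `re.findall`, which returns all non-overlapping matches as a list. The string “aaaabbccd” comes from the explanation that follows, and the variable names are illustrative:

```python
import re

text = "aaaabbccd"

# Greedy: "b+" takes both adjacent "b" characters in one match.
greedy_matches = re.findall(r"b+", text)
print(greedy_matches)

# Non-greedy: "b+?" stops after a single "b", so each "b" is its own match.
lazy_matches = re.findall(r"b+?", text)
print(lazy_matches)
```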

Output:

['bb']
['b', 'b']

The pattern “b+” matches one or more occurrences of “b”. In the example above, it matches the two consecutive “b” letters in “aaaabbccd” as a single match, “bb”.

The expression “b+?”, on the other hand, is the non-greedy version of “b+”. It still matches one or more occurrences of “b”, but it stops as soon as it has the smallest possible sequence, which is a single “b”. Therefore, each “b” is returned as a separate match.
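To see where each match sits, the sketch below uses `re.finditer` and records each match's `(start, end)` span. The variable names are illustrative:

```python
import re

text = "aaaabbccd"

# Each span is (start, end), with end exclusive.
greedy_spans = [m.span() for m in re.finditer(r"b+", text)]
lazy_spans = [m.span() for m in re.finditer(r"b+?", text)]

print(greedy_spans)  # [(4, 6)]: one match covering both "b" characters
print(lazy_spans)    # [(4, 5), (5, 6)]: one match per "b"
```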

Example 2
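A minimal sketch that reproduces the output below. It assumes the input string is “This is a test string”, which is inferred from the printed results:

```python
import re

text = "This is a test string"

# Greedy: "\w+" keeps consuming word characters until it reaches a space.
words = re.findall(r"\w+", text)
print(words)

# Non-greedy: "\w+?" is satisfied after a single word character.
chars = re.findall(r"\w+?", text)
print(chars)
```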

Output:

['This', 'is', 'a', 'test', 'string']
['T', 'h', 'i', 's', 'i', 's', 'a', 't', 'e', 's', 't', 's', 't', 'r', 'i', 'n', 'g']

In the example above, “\w+” matches one or more word characters (letters, digits, or the underscore).
Greedy matching consumes as many of these characters as possible, so each match is a complete word; matching stops at a space, which is not a word character.

Non-greedy matching, on the other hand, consumes the fewest characters the pattern allows. For that reason, “\w+?” matches individual word characters.
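A lazy quantifier only keeps its match short when the rest of the pattern allows it. In this sketch, a word boundary `\b` (an illustrative addition, not part of the original example) follows “\w+?”, which forces it to expand to whole words:

```python
import re

text = "This is a test string"

# "\w+?" alone stops after one character, but the "\b" (word boundary)
# that follows forces it to keep expanding until the end of a word.
bounded_words = re.findall(r"\w+?\b", text)
print(bounded_words)
```

Here the lazy pattern produces the same words as the greedy “\w+”, because the shortest match that satisfies the whole pattern is a complete word.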

Conclusion

This article discussed two forms of regex matching in Python: greedy and non-greedy. Quantifiers are greedy by default, and you can make one non-greedy by adding a “?” character after it. After working through the examples in this guide, you should be able to apply both forms of matching.