Exclude a Specific String From Regex Using Python

The re module is Python's standard-library module for regular expressions. In most cases, we are interested in matching patterns. In other cases, however, we want to exclude some strings based on a given pattern.

This article covers different methods of excluding substrings, with examples of each.

Using re Special Characters

Two special characters come in handy in this case: [ ] and ^. Square brackets, [ ], specify a set of characters, any one of which we wish to match.

For example, “[abc]” matches a single character that is “a”, “b”, or “c”, and “[0-5][0-9]” matches two-digit strings from 00 to 59.

If the first character in the set, [ ], is “^”, the set is negated: it matches any single character that is not listed. For example, “[^xyz]” matches any character except “x”, “y”, and “z”. Here are more examples:

Regular expression    Meaning
“[^a-z]”              Matches any character except a lowercase ASCII letter
“[^.]”                Matches any character except a period
“[^\w]”               Matches any character except a word character (same as “\W”)
“[a^z]”               Matches “a”, “^”, or “z”
Table 1: Some regular expressions showing how the “[ ]” and “^” characters can be used to exclude characters.

Note: “[^^]” will match any character except “^”. ^ has no special meaning if it’s not the first character in the set, [ ] (see the last row of Table 1).
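
The rows of Table 1 can be checked with re.findall(). The snippet below is a minimal sketch; the sample string and the variable names are made up for illustration and are not from the original article.

```python
import re

# Hypothetical sample string, chosen to contain one character of each kind.
sample = "a^z.B7_"

# Character sets from Table 1, plus the "[^^]" set from the note.
patterns = [r"[^a-z]", r"[^.]", r"[^\w]", r"[a^z]", r"[^^]"]

# Map each pattern to the list of single characters it matches in `sample`.
results = {p: re.findall(p, sample) for p in patterns}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

For instance, “[^\w]” keeps only “^” and “.” here, because letters, digits, and the underscore all count as word characters.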

Let’s now look at some examples in code.

Example 1: Excluding all integers

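The pattern “\d+” matches one or more consecutive digits in the string passed. If we want to exclude numbers, we negate the set: “[^\d]” (equivalent to “\D”) matches any character that is not a digit. Note that inside [ ] the “+” is a literal plus sign, not a quantifier, so “[^\d+]” excludes digits and any “+” characters. To match whole runs of non-digit characters, place the quantifier outside the brackets, as in “[^\d]+”.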
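
The original listing for this example is not available, so the following is a minimal sketch rather than the author's exact code. The input string is hypothetical, chosen so that the result agrees with the output listed for this example, and the pattern places the quantifier outside the brackets.

```python
import re

# Hypothetical input: a sentence with digits scattered through it.
text = "Look like 5readable E3nglish. Many de12sktop"

# [^\d]+ matches each run of one or more non-digit characters.
parts = re.findall(r"[^\d]+", text)
print(parts)

# Joining the runs gives the text with every digit removed.
cleaned = "".join(parts)
print(cleaned)
```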

Output:

['Look like ', 'readable E', 'nglish. Many de', 'sktop']
Look like readable English. Many desktop

All the numbers have now been excluded.

Example 2: Excluding non-integer numbers (numbers with decimals)

What if we have real numbers with decimals in our string? The code in Example 1 will remove the digits but leave the decimal point behind, because “\d” matches only characters in the set [0-9]. To exclude non-integer numbers as well, we can modify the pattern as follows.
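
Since the original listing is not available, the sketch below is one possible reconstruction, not the author's code. The input string is hypothetical, and adding “.” to the negated set is an assumption that happens to reproduce the output listed for this example.

```python
import re

# Hypothetical input containing integers and decimals.
text = "maki2.9ng it over 2000 years old. Richa6.1rd McCl8intock"

# Option 1: exclude digits and periods with a negated set.
# Side effect: the sentence-ending period is dropped as well.
parts = re.findall(r"[^\d.]+", text)
print(parts)
print("".join(parts))

# Option 2 (not printed): remove only well-formed numbers, i.e. digits
# optionally followed by a decimal part, leaving other punctuation alone.
without_numbers = re.sub(r"\d+(?:\.\d+)?", "", text)
```

The second option is worth considering when the text contains periods that are not part of a number, since it leaves them in place.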

Output:

['maki', 'ng it over ', ' years old', ' Richa', 'rd McCl', 'intock']
making it over  years old Richard McClintock

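The code snippet above successfully excluded all numbers in the string: 2.9, 2000, 6.1, and 8. Note that the output list is also split between ' years old' and ' Richa', which suggests that a non-numeric character (most likely a period) was excluded as well. A negated set that contains “.” removes every period, not only those inside numbers.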

Exclude by Substitution Using re.sub()

The re.sub(pattern, repl, string) function finds every non-overlapping match of pattern in string and replaces it with repl. Passing an empty string as repl removes the matches, which excludes them from the result.
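
The original listing is not available, so here is a minimal sketch of how re.sub() can be used this way. The input string and both patterns are assumptions, inferred from the output listed for this section.

```python
import re

# Hypothetical input, inferred from the output listed for this section.
text = ("All the Lorem Ipsum as necessary. It uses a dictionary of "
        "over 200 words, generate Lorem Ipsum.")

# Replacing every match with an empty string removes (excludes) it.
removed = re.sub(r"Lorem Ipsum", "", text)
print(removed)

# Any replacement string works; here each run of digits becomes "999".
replaced = re.sub(r"\d+", "999", text)
print(replaced)
```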

Output:

All the  as necessary. It uses a dictionary of over 200 words, generate .
All the Lorem Ipsum as necessary. It uses a dictionary of over 999 words, generate Lorem Ipsum.

Exclude by Searching and Skipping

In this section, we want to search for a substring based on a pattern and do one thing if it is found and another if it is not.

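In the example below, we use re.search() to find and skip (exclude) strings that contain the word “test”. re.search() returns a match object when the pattern is found and None otherwise.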
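
The search-and-skip logic can be sketched as follows. The original listing is not available; the six sentences are hypothetical, written so that the match positions agree with the output listed for this section, and the kept list is an illustrative addition.

```python
import re

# Hypothetical inputs; three of them contain the word "test".
sentences = [
    "Lorem Ipsum is dummy text used for test pages.",
    "It has survived five centuries.",
    "Many desktop publishing packages use it.",
    "Editors seldom run the test twice.",
    "Various versions have evolved to test readers.",
    "The generated text is free of repetition.",
]

kept = []  # sentences that do not contain "test"
for sentence in sentences:
    match = re.search(r"test", sentence)
    if match:
        # Pattern found: report the match and skip this sentence.
        print("test found,", match)
    else:
        # re.search() returns None when nothing matches.
        print("test not found. returned", match)
        kept.append(sentence)
```

After the loop, kept holds only the sentences without “test”, which is the excluded-by-skipping result.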

Output:

test found, <re.Match object; span=(35, 39), match='test'>
test not found. returned None
test not found. returned None
test found, <re.Match object; span=(23, 27), match='test'>
test found, <re.Match object; span=(33, 37), match='test'>
test not found. returned None

Conclusion

This article discussed methods of excluding some substrings based on patterns given to re. You can study more about re in the documentation and practice writing regular expressions at https://regexr.com/ or https://regex101.com/.

By doing so, you will better understand how to write patterns that fit your use case.