Find a Word in a String Using Regex and Python

The re module is part of Python's standard library and provides regular expression support. This article covers ways of searching for a word in a string using this module.

In particular, we will discuss how to search for a whole word in a string, a word with a given substring, words starting with or ending with x, etc. However, before we do that, let’s get some basics out of the way.

Some Special Characters in re

Some characters in re are special: they change how a pattern is interpreted. Here are the ones we will use most often.

Character | Meaning | Example
* | Matches 0 or more repetitions of the preceding character or expression. | “ab*” matches “a”, “ab”, “abb”, and so on: an “a” followed by any number of “b”s.
+ | Matches 1 or more repetitions of the preceding character or expression. | “ab+” matches “ab”, “abb”, and so on: an “a” followed by at least one “b”. It does not match “a” alone.
\w (the opposite is \W) | Matches any alphanumeric character or the underscore. |
\b | Matches a word boundary: the zero-width position between a \w and a \W character (or vice versa), or between a \w character and the start or end of the string. It does not consume whitespace or any other character. |
\d (the opposite is \D) | Matches a decimal digit, 0-9. |
Note: Finding a word with regex relies on the fact that words are delimited by non-word characters, such as whitespace and punctuation, or by the start and end of the string. Therefore, we will mainly use “\b” to mark the beginning and the end of a word.
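The behaviour of these special characters can be checked with a short sketch. The sample strings here are made up for illustration and are not taken from the examples that follow.

```python
import re

# "ab*" needs only the "a"; "ab+" needs at least one "b" after it.
star_matches = re.findall(r"ab*", "a ab abb")
plus_matches = re.findall(r"ab+", "a ab abb")

# \b is zero-width: it matches a position, not a space or any other character.
# Here the boundaries sit at the edges of "hi" and "you".
boundary_positions = [m.start() for m in re.finditer(r"\b", "hi, you")]

print(star_matches)        # ['a', 'ab', 'abb']
print(plus_matches)        # ['ab', 'abb']
print(boundary_positions)  # [0, 2, 4, 7]
```

Every match of \b is an empty string, which is why only the positions are printed.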

It’s time to see some examples now.

Example 1: Finding a Whole Word in a Python String

We will use the r"\b<word>\b" pattern to find a whole word in a string. As mentioned earlier, \b marks the boundary at each end of the word, so the pattern does not match when the word is only part of a longer word.

The “r” preceding the pattern makes it a raw string, so Python does not process backslash escapes itself (in a normal string, “\b” would become a backspace character). The backslash sequence is passed unchanged to re, which then interprets “\b” as a word boundary.

Output:

<re.Match object; span=(17, 22), match='group'>
Starts at:  17
Ends at:  22
Spans:  (17, 22)
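A minimal sketch of a whole-word search is given below. The sample text is hypothetical, so the offsets differ from the output shown above; the second search illustrates what happens when the r prefix is left out.

```python
import re

# Hypothetical sample text, chosen so that "groupie" tests the word boundary.
text = "A group chat is not a groupie."

match = re.search(r"\bgroup\b", text)
print(match)
print("Starts at: ", match.start())
print("Ends at: ", match.end())
print("Spans: ", match.span())   # (2, 7); "groupie" is never matched

# Without the r prefix, Python turns "\b" into a backspace character before
# re sees it, so the pattern no longer means "word boundary".
no_raw = re.search("\bgroup\b", text)
print(no_raw)                    # None
```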

Notice that re.search() matches only the first occurrence of the word. If you want to match all occurrences, use re.finditer(). It returns an iterator that yields a match object for every non-overlapping match of the pattern; for example,

Output:

<re.Match object; span=(17, 22), match='group'>
<re.Match object; span=(45, 50), match='group'>
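As a sketch of the same idea on a made-up sentence (so the spans are not those printed above), re.finditer() can be looped over directly or collected into a list:

```python
import re

# Hypothetical sample text containing the word twice.
text = "One group left early; the other group stayed."

matches = list(re.finditer(r"\bgroup\b", text))
for m in matches:
    print(m)

spans = [m.span() for m in matches]
print(spans)  # [(4, 9), (32, 37)]
```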

Example 2: Search for Multiple Words

We can search for multiple words in a string using the OR operator (“|”) in re. For example,

Output:

[<re.Match object; span=(10, 13), match='big'>, <re.Match object; span=(17, 22), match='group'>, <re.Match object; span=(39, 44), match='small'>, <re.Match object; span=(45, 50), match='group'>]
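One way to write such an alternation is sketched below, using a hypothetical sentence. The non-capturing group (?:...) is a choice made for this sketch: “|” has the lowest precedence in a pattern, so without the group the \b anchors would apply only to the first and last alternatives.

```python
import re

# Hypothetical sample text.
text = "A big group met a small group."

# The group keeps both \b anchors attached to every alternative.
pattern = r"\b(?:big|small|group)\b"
words = [m.group() for m in re.finditer(pattern, text)]
print(words)  # ['big', 'group', 'small', 'group']
```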

We can also use the re.findall() function to find all matches as a list of strings.

Output:

['is', 'group', 'is', 'small', 'group']
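A comparable re.findall() call might look like the following sketch. The sentence is an assumption made for illustration, not the article's original input.

```python
import re

# Hypothetical sample text. "This" contains "is", but there is no word
# boundary before that "is", so it is not matched.
text = "This is a big group and that is a small group."

found = re.findall(r"\b(?:is|small|group)\b", text)
print(found)  # ['is', 'group', 'is', 'small', 'group']
```

Unlike re.finditer(), which yields match objects, re.findall() returns plain strings, so position information is not available.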

Example 3: Finding Words That Start or End With a Given Substring

Here are some examples. The code contains comments to help you understand.

Output:

['always', 'accor56ding', 'a', 'as']
['always', 'accor56ding', 'as']
['laws']
['always', 'as']

The first three patterns in the snippet above use \w, which accepts letters, digits, and the underscore, but the last one does not. It therefore picks only words that contain no digits.

Note also the difference between \w* and \w+. The former matches zero or more word characters, whereas the latter requires at least one, which is why a single-letter word can match the first but not the second.
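The four cases can be sketched as follows. The sentence and the exact patterns are assumptions made for illustration; they are not necessarily the ones that produced the output above.

```python
import re

# Hypothetical sample text.
text = "alice ate an apple a day in area51"

# Starts with "a", followed by zero or more word characters.
starts_a_star = re.findall(r"\ba\w*", text)
# Starts with "a", followed by at least one word character (drops "a").
starts_a_plus = re.findall(r"\ba\w+", text)
# Ends with "e".
ends_e = re.findall(r"\b\w*e\b", text)
# Starts with "a" and contains letters only (drops "area51").
letters_only = re.findall(r"\ba[a-zA-Z]*\b", text)

print(starts_a_star)  # ['alice', 'ate', 'an', 'apple', 'a', 'area51']
print(starts_a_plus)  # ['alice', 'ate', 'an', 'apple', 'area51']
print(ends_e)         # ['alice', 'ate', 'apple']
print(letters_only)   # ['alice', 'ate', 'an', 'apple', 'a']
```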

We can do even more, such as finding words that start with one character and end with another, or words that end with any of several characters. An example is given below.

Output:

['bothers', 'me', 'the', 'we']
['understand']
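A sketch of both ideas on a made-up sentence follows; the patterns are illustrative assumptions rather than the ones behind the output above.

```python
import re

# Hypothetical sample text.
text = "dad did the deed and made a deal"

# Ends with either "e" or "l": a character class covers the alternatives.
ends_e_or_l = re.findall(r"\b\w*[el]\b", text)
# Starts with "d" AND ends with "d".
d_to_d = re.findall(r"\bd\w*d\b", text)

print(ends_e_or_l)  # ['the', 'made', 'deal']
print(d_to_d)       # ['dad', 'did', 'deed']
```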

Example 4: Finding a Word That Has a Given Substring or Character

Output:

['always', 'that', 'acco56rding', 'a', 'laws', 'as', 'understand', 'today']
['understand']
['acco56rding']
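One possible way to express "contains" is to allow any number of word characters on both sides of the target, as in this sketch. The sentence and patterns are assumptions for illustration only.

```python
import re

# Hypothetical sample text.
text = "we understand that laws change in 2024 and route66 stays"

# Words containing the character "a" anywhere.
has_a = re.findall(r"\b\w*a\w*\b", text)
# Words containing the substring "and".
has_and = re.findall(r"\b\w*and\w*\b", text)
# Words containing at least one digit.
has_digit = re.findall(r"\b\w*\d\w*\b", text)

print(has_a)      # ['understand', 'that', 'laws', 'change', 'and', 'stays']
print(has_and)    # ['understand', 'and']
print(has_digit)  # ['2024', 'route66']
```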

Example 5: Dealing with Capitalization in Python re

The re module is case-sensitive by default. You can change that by passing the re.IGNORECASE flag. Here is an example.

Output:

['group']
['Group', 'group']
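A minimal sketch of the flag in use, with a hypothetical sentence:

```python
import re

# Hypothetical sample text with the word in two different cases.
text = "Group work suits a small group."

case_sensitive = re.findall(r"\bgroup\b", text)
case_insensitive = re.findall(r"\bgroup\b", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['group']
print(case_insensitive)  # ['Group', 'group']
```

The shorter alias re.I is equivalent to re.IGNORECASE.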

Conclusion

The special character \b is mainly used to define the boundaries of words. Therefore, you may find it helpful in most cases when you are searching for a word in a Python string using regex. In this article, we have discussed five examples of looking for words using re.

You can practice writing regular expressions on these sites: https://regexr.com/ or https://regex101.com/.