Find a Word in a String Using Regex and Python

The re module is the standard-library module for regular expressions in Python. This article covers ways of searching for a word in a string using this module.

In particular, we will discuss how to search for a whole word in a string, a word that contains a given substring, and words that start or end with given characters. Before we do that, let’s get some basics out of the way.

Some Special Characters in re

Some characters in re are special. These characters affect how patterns are interpreted. Here are the ones we will use most in this article.

Character	Meaning	Example
*	Matches 0 or more repetitions of the preceding character or group	“ab*” matches “a”, “ab”, “abb”, and so on: an “a” followed by any number of “b”s.
+	Matches 1 or more repetitions of the preceding character or group	“ab+” matches “ab”, “abb”, and so on: an “a” followed by at least one “b”. It DOES NOT MATCH “a” alone.
\w (the opposite is \W)	Matches any alphanumeric character or the underscore	“\w+” matches “group_1” in “group_1!”.
\b	Matches the empty string at a word boundary, that is, the position between a \w and a \W character (or vice versa), or the start or end of the string next to a \w character. It consumes no characters.	“\bis\b” matches the word “is” in “this is it” but not the “is” inside “this”.
\d (\D does the opposite)	Matches a decimal digit, 0-9	“\d+” matches “56” in “accor56ding”.
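
To make the difference between the two quantifiers concrete, here is a small sketch. It uses re.fullmatch(), which succeeds only when the entire string fits the pattern; the sample strings and variable names are made up for illustration.

```python
import re

candidates = ["a", "ab", "abbb"]  # illustrative sample strings

# fullmatch() succeeds only if the whole string fits the pattern
star_matches = [s for s in candidates if re.fullmatch(r"ab*", s)]
plus_matches = [s for s in candidates if re.fullmatch(r"ab+", s)]

print(star_matches)  # ['a', 'ab', 'abbb']
print(plus_matches)  # ['ab', 'abbb']

# \w matches letters, digits and the underscore; \d matches digits only
word_chars = re.findall(r"\w", "a_1-!")
digits = re.findall(r"\d", "a_1-!")

print(word_chars)  # ['a', '_', '1']
print(digits)      # ['1']
```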

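Note: \b does not match whitespace or any other character. It marks the position where a word character meets a non-word character, so it also works when a word is followed by punctuation or sits at the very start or end of the string. We will therefore use “\b” to mark the beginning and the end of a word.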
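
The following sketch illustrates how \b behaves around punctuation and inside longer words. The sample text is invented for this illustration.

```python
import re

text = "group, (group) groups subgroup group."  # illustrative sample text

# \b matches at the edges of "group" even when the neighbor is punctuation,
# but not inside "groups" or "subgroup"
whole_words = [m.span() for m in re.finditer(r"\bgroup\b", text)]
print(whole_words)  # [(0, 5), (8, 13), (31, 36)]

# \b itself is zero-width: each match is an empty string at a boundary position
boundaries = [m.start() for m in re.finditer(r"\b", "hi there")]
print(boundaries)  # [0, 2, 3, 8]
```

Note that the space in “hi there” is never part of a match; only the positions on either side of it are reported.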

It’s time to see some examples now.

Example 1: Finding a Whole Word in a Python String

We will use the r"\b<word>\b" pattern to find a whole word in a string. The \b on each side requires a word boundary there, so the pattern does not match the word when it is part of a longer one.

The “r” preceding the pattern makes it a raw string, so Python passes backslashes to re unchanged. Without the prefix, Python would turn “\b” into the backspace character before re ever sees it, and you would have to write “\\b” instead. With a raw string, re receives “\b” as written and interprets it as a word boundary.
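
A quick way to see why the raw-string prefix matters is to compare the two forms side by side. This is a minimal sketch with an illustrative sample string.

```python
import re

text = "This is a bigger group."  # illustrative sample text

# Without the r prefix, Python turns "\b" into a single backspace character
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

# Looks for backspace + "group" + backspace, which is not in the text
non_raw = re.search("\bgroup\b", text)
# Looks for "group" as a whole word
raw = re.search(r"\bgroup\b", text)

print(non_raw)     # None
print(raw.span())  # (17, 22)

# Doubling the backslash is equivalent to using a raw string
print("\\bgroup\\b" == r"\bgroup\b")  # True
```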

import re

# The string we want to search and the pattern to look for
str1 = "This is a bigger group. There is not a small group"
pattern = r"\bgroup\b"

# Use re.search() to find the pattern "\bgroup\b",
# which matches the whole word "group"
result = re.search(pattern, str1)

# Print the result
print(result)

# Start and end indices
print("Starts at: ", result.start())
print("Ends at: ", result.end())

# Span of the word - a tuple of start and end.
print("Spans: ", result.span())

Output:

<re.Match object; span=(17, 22), match='group'>
Starts at:  17
Ends at:  22
Spans:  (17, 22)

Notice that re.search() matches only the first instance of the word. If you want to match all the occurrences of the word, use re.finditer(). It returns an iterator that yields a match object for every non-overlapping match of the pattern; for example,

import re

# The string we want to search
str1 = "This is a bigger group. There is not a small group"
pattern = r"\bgroup\b"

# re.finditer() yields one match object per occurrence
result_iter = re.finditer(pattern, str1)
for result in result_iter:
    print(result)

Output:

<re.Match object; span=(17, 22), match='group'>
<re.Match object; span=(45, 50), match='group'>
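
One caveat with re.search() is that it returns None when nothing matches, so calling .start() or .span() on the result raises an AttributeError. A minimal sketch of a guard follows; find_word is a hypothetical helper written for this illustration, not part of re.

```python
import re

str1 = "This is a bigger group. There is not a small group"

def find_word(word, text):
    """Hypothetical helper: span of the first whole-word match, or None."""
    # re.escape() keeps any special characters in `word` literal
    match = re.search(rf"\b{re.escape(word)}\b", text)
    return match.span() if match else None

found = find_word("group", str1)
missing = find_word("team", str1)

print(found)    # (17, 22)
print(missing)  # None
```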

Example 2: Search for Multiple Words

We can search for multiple words in a string using the OR operator (“|”) in re. For example,

import re

# Match more than one word - matching small, is, and group
results = re.finditer(r"\bsmall\b|\bis\b|\bgroup\b", "This is a bigger group. There is not a small group")

# Collect all match objects from the iterator into a list
results = list(results)
print(results)

Output:

[<re.Match object; span=(5, 7), match='is'>, <re.Match object; span=(17, 22), match='group'>, <re.Match object; span=(30, 32), match='is'>, <re.Match object; span=(39, 44), match='small'>, <re.Match object; span=(45, 50), match='group'>]

We can also use the re.findall() function to find all matches as a list of strings.

import re

results = re.findall(r"\bsmall\b|\bis\b|\bgroup\b", "This is a bigger group. There is not a small group")
print(results)

Output:

['is', 'group', 'is', 'small', 'group']
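
When the words come from a list, the pattern can be assembled rather than typed by hand. The sketch below is one possible way to do it; the word list is hypothetical, and a non-capturing group (?:...) is used so that a single pair of \b applies to every alternative.

```python
import re

words = ["small", "is", "group"]  # hypothetical word list
text = "This is a bigger group. There is not a small group"

# re.escape() keeps special characters in the words literal;
# (?:...) groups the alternatives without creating a capture group
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
print(pattern)  # \b(?:small|is|group)\b

matches = re.findall(pattern, text)
print(matches)  # ['is', 'group', 'is', 'small', 'group']
```

The group is non-capturing on purpose: with a capturing group, re.findall() returns the group contents instead of the whole match.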

Example 3: Finding Words That Start or End With Given Characters

Here are some examples. The code contains comments to help you understand.

import re

str1 = "It always bothers me that, accor56ding to the a laws as we understand them today"

# Find words that start with "a"
results1 = re.findall(r"\ba\w*\b", str1)
print(results1)

# Matches words starting with "a" that are followed by at least
# one alphanumeric character or underscore. The single letter "a" is not a match.
results2 = re.findall(r"\ba\w+\b", str1)
print(results2)

# Matches any word that ends with "ws"
results3 = re.findall(r"\w*ws\b", str1)
print(results3)

# Words starting with "a", ignoring words with digits, e.g. "accor56ding".
# [^\d\W] matches any character that is neither a digit (\d) nor a
# non-word character (\W), that is, letters and the underscore.
results4 = re.findall(r"\ba[^\d\W]+\b", str1)
print(results4)

Output:

['always', 'accor56ding', 'a', 'as']
['always', 'accor56ding', 'as']
['laws']
['always', 'as']

The first three patterns in the above snippet accept any word character (\w), including digits, but the last one does not. It therefore picks only words that contain no digits.

Note also the difference between \w* and \w+. The former matches zero or more word characters, while the latter requires at least one.

We can go further and find words that start with one substring and/or end with another. An example is given below.

import re

str1 = "It always bothers me that, accor56ding to the a laws as we understand them today"

# Starts with "bo" or ends with "e". Note the use of the or ("|") operator.
results3 = re.findall(r"\bbo\w*|\w*e\b", str1)
print(results3)

# Starts with "un" and ends with "d".
results5 = re.findall(r"\bun\w*d\b", str1)
print(results5)

Output:

['bothers', 'me', 'the', 'we']
['understand']
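
The start and end patterns above can be wrapped in small reusable functions. This is only a sketch: words_starting_with and words_ending_with are hypothetical helper names, and re.escape() is added on the assumption that the prefix or suffix should be treated literally.

```python
import re

str1 = "It always bothers me that, accor56ding to the a laws as we understand them today"

def words_starting_with(prefix, text):
    """Hypothetical helper: all words in `text` that begin with `prefix`."""
    return re.findall(rf"\b{re.escape(prefix)}\w*", text)

def words_ending_with(suffix, text):
    """Hypothetical helper: all words in `text` that end with `suffix`."""
    return re.findall(rf"\w*{re.escape(suffix)}\b", text)

starts_th = words_starting_with("th", str1)
ends_s = words_ending_with("s", str1)

print(starts_th)  # ['that', 'the', 'them']
print(ends_s)     # ['always', 'bothers', 'laws', 'as']
```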

Example 4: Find a Word That Has a Given Substring or Character

import re

str1 = "It always bothers me that, acco56rding to the a laws as we understand them today"

# \w* captures 0 or more word characters before and after "a".
# \b\w*a\w*\b captures any word with the letter "a", including "a" itself.
results = re.findall(r"\b\w*a\w*\b", str1)
print(results)

# \w+ means at least one word character. The following pattern therefore
# matches a word that contains "s" followed by at least one more word
# character, so words that merely end in "s" (e.g. "laws") do not match.
results = re.findall(r"\b\w*s\w+\b", str1)
print(results)

# \d captures a digit. This pattern therefore captures a word with a digit
# in it, where a digit is followed by at least one word character (\w+)
# before the word boundary (\b).
results = re.findall(r"\b\w*\d\w+\b", str1)
print(results)

Output:

['always', 'that', 'acco56rding', 'a', 'laws', 'as', 'understand', 'today']
['understand']
['acco56rding']
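
For simple substring checks, an alternative is to let re split the text into words and do the filtering in plain Python. The sketch below is not the approach used above, just a comparison; it assumes a word is any run of \w characters.

```python
import re

str1 = "It always bothers me that, acco56rding to the a laws as we understand them today"

# Extract every run of word characters, then filter with the `in` operator
all_words = re.findall(r"\w+", str1)
words_with_a = [w for w in all_words if "a" in w]
print(words_with_a)

# The single-pattern approach gives the same list for this string
regex_only = re.findall(r"\b\w*a\w*\b", str1)
print(words_with_a == regex_only)  # True
```

Filtering in Python can be easier to read when the condition grows beyond what is comfortable to express in one pattern.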

Example 5: Dealing with Capitalization in Python re

The re module is case-sensitive by default. You can change that by passing the re.IGNORECASE flag. Here is an example.

import re

str1 = "Group members are committed to the group"

# Default, case-sensitive search:
# finds "group" only and not "Group"
result1 = re.findall(r"\bgroup\b", str1)
print(result1)

# Case-insensitive search with re.IGNORECASE:
# finds both "Group" and "group"
result2 = re.findall(r"\bgroup\b", str1, re.IGNORECASE)
print(result2)

Output:

['group']
['Group', 'group']
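
The same flag can be supplied in a couple of other ways. The sketch below shows a pattern compiled once with re.compile() and the inline (?i) form; the variable names are illustrative.

```python
import re

str1 = "Group members are committed to the group"

# Compile once with the flag, then reuse the compiled pattern
group_re = re.compile(r"\bgroup\b", re.IGNORECASE)
compiled_result = group_re.findall(str1)
print(compiled_result)  # ['Group', 'group']

# (?i) at the start of the pattern sets the flag inside the pattern itself;
# re.I is a short alias for re.IGNORECASE
inline_result = re.findall(r"(?i)\bgroup\b", str1)
print(inline_result)  # ['Group', 'group']
```

Compiling is mostly a convenience when the same pattern is applied to many strings.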

Conclusion

The special character \b is mainly used to define the boundaries of words. Therefore, you may find it helpful in most cases when you are searching for a word in a Python string using regex. In this article, we have discussed five examples of looking for words using re.

You can practice writing regular expressions on these sites: https://regexr.com/ or https://regex101.com/.
