Find Common Words in Two Strings Using Python

This article will cover two methods of finding common words between two Python strings:

  • Method 1: Using a for-loop and list comprehension
  • Method 2: Using set intersection

We will also discuss how to deal with capitalization and punctuation marks when finding these words.

Let’s first define the strings we will use in our examples.
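As a minimal sketch, the examples can be followed with four hypothetical strings, assumed here for illustration only. Both methods start by turning each string into a list of words with str.split(), which splits on runs of whitespace when called without arguments:

```python
# Hypothetical example strings, assumed for illustration only.
str1 = "This is a string"
str2 = "This is another string"
str3 = "Have you heard of spoken articles"
str4 = "I like spoken articles"

# With no arguments, str.split() splits on any run of whitespace.
words1 = str1.split()
words2 = str2.split()

print(words1)  # ['This', 'is', 'a', 'string']
print(words2)  # ['This', 'is', 'another', 'string']
```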

Method 1: Using a for-loop and list comprehension

This method works in two steps: split each string into a list of words, then loop through one of the lists and check whether each word also appears in the other list.

Here is an example.
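A minimal sketch of the loop follows; the function name common_words_loop and the example strings are hypothetical. The function walks the words of the first string and keeps each one that also appears in the second:

```python
def common_words_loop(text1, text2):
    """Return the words of text1 that also appear in text2, using a for-loop."""
    # Step 1: split both strings into lists of words.
    words1 = text1.split()
    words2 = text2.split()

    # Step 2: keep every word of the first list found in the second list.
    common = []
    for word in words1:
        if word in words2:
            common.append(word)
    return common


# Hypothetical example strings, assumed for illustration only.
str1 = "This is a string"
str2 = "This is another string"
str3 = "Have you heard of spoken articles"
str4 = "I like spoken articles"

print(common_words_loop(str1, str2))  # ['This', 'is', 'string']
print(common_words_loop(str3, str4))  # ['spoken', 'articles']
```

Note that a word repeated in the first string is appended once per occurrence, and that each `in` check scans the whole second list; converting words2 to a set makes each lookup constant time on average.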

Output:

['This', 'is', 'string']
['spoken', 'articles']

The for-loop in the code above can be condensed into a one-line list comprehension as follows:
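A minimal sketch of the comprehension, using a hypothetical function name and hypothetical example strings. The second string is split once, outside the comprehension, so it is not re-split for every word of the first string:

```python
def common_words_comprehension(text1, text2):
    """Return the words of text1 that also appear in text2, via a list comprehension."""
    words2 = text2.split()
    # One-line equivalent of the for-loop: filter text1's words by membership.
    return [word for word in text1.split() if word in words2]


# Hypothetical example strings, assumed for illustration only.
str1 = "This is a string"
str2 = "This is another string"
str3 = "Have you heard of spoken articles"
str4 = "I like spoken articles"

print(common_words_comprehension(str1, str2))  # ['This', 'is', 'string']
print(common_words_comprehension(str3, str4))  # ['spoken', 'articles']
```

Because this comprehension walks the first list in order, it returns words in the order they appear in the first string; a version that passes either list through set() first would not guarantee any particular order.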

Output:

['string', 'This', 'is']
['spoken', 'articles']

Method 2: Using set intersection

The intersection of two sets, A and B, denoted by A ∩ B, is the set of all elements that belong to both A and B: A ∩ B = {x | x ∈ A and x ∈ B}.

Python supports set intersection natively, through both the & operator and the set.intersection() method. For example,
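A minimal sketch with two small, hypothetical integer sets; for two sets, the & operator and the intersection() method produce the same result:

```python
a = {1, 2, 3}
b = {2, 3, 4}

# The & operator and the intersection() method are equivalent for two sets.
common = a & b
assert common == a.intersection(b)

print(common)  # {2, 3}
```

One practical difference: & requires both operands to be sets, while intersection() accepts any iterable, so a.intersection([2, 3, 4]) also works.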

Output:

{2, 3}

Similarly, we can use the concept of intersection to find common words between two strings.

To do so, split each string into a list of words, convert the lists into sets, and then find their intersection.
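A minimal sketch of these three steps, with a hypothetical function name and hypothetical example strings:

```python
def common_words_set(text1, text2):
    """Return the distinct words shared by text1 and text2, via set intersection."""
    # Split into words, convert each list to a set, then intersect the sets.
    return list(set(text1.split()) & set(text2.split()))


# Hypothetical example strings, assumed for illustration only.
str1 = "This is a string"
str2 = "This is another string"
str3 = "Have you heard of spoken articles"
str4 = "I like spoken articles"

# Sets are unordered, so the order of the printed words may vary.
print(common_words_set(str1, str2))
print(common_words_set(str3, str4))
```

Unlike the loop, this returns each shared word only once, but in no guaranteed order. Since string hashing is randomized per process by default, the order can even change between runs; wrap the result in sorted() if a stable order is needed.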

Output:

['is', 'string', 'This']
['spoken', 'articles']

The set-intersection approach can also be implemented with NumPy's intersect1d() function, which returns the sorted, unique values found in both input arrays. Here is an example.
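A minimal sketch using numpy.intersect1d(), assuming NumPy is installed; the function name and example strings are hypothetical:

```python
import numpy as np


def common_words_numpy(text1, text2):
    """Return the sorted unique words shared by text1 and text2 as a NumPy array."""
    # np.intersect1d accepts array-likes (here, lists of words) and returns
    # the sorted, unique values present in both.
    return np.intersect1d(text1.split(), text2.split())


# Hypothetical example strings, assumed for illustration only.
str1 = "This is a string"
str2 = "This is another string"
str3 = "Have you heard of spoken articles"
str4 = "I like spoken articles"

print(common_words_numpy(str1, str2))  # ['This' 'is' 'string']
print(common_words_numpy(str3, str4))  # ['articles' 'spoken']
```

The result is sorted by code point, which is why the capitalized "This" comes before "is". Call .tolist() on the result if a regular Python list is needed.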

Output:

['This' 'is' 'string']
['articles' 'spoken']

Dealing with Capitalization and Punctuation Marks

So far, we have not discussed how to handle capitalization and punctuation marks in our strings. In the functions above, for example, “Spoken”, “SpoKen”, and “spoken” are treated as different words, as are “business;” and “business,”.

For example, the following code returns no common words because of differences in capitalization and punctuation, even though we would ideally expect “Spoken” and “articles” to be captured as common words:
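A minimal sketch of the problem, assuming two hypothetical strings in which “Spoken” and “spoken” differ only in capitalization, and “articles,” and “articles.” differ only in punctuation; the function name common_words is also hypothetical:

```python
def common_words(text1, text2):
    """Return the sorted distinct words shared by text1 and text2, with no normalization."""
    return sorted(set(text1.split()) & set(text2.split()))


# Hypothetical example strings, assumed for illustration only.
text1 = "Spoken articles, podcasts, and radio."
text2 = "I enjoy spoken articles."

# "Spoken" != "spoken" and "articles," != "articles.", so nothing matches.
print(common_words(text1, text2))  # []
```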

We will add two more arguments to the FindCommonWords() function to control whether the search ignores capitalization and punctuation.
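A minimal sketch of such a function, assuming two hypothetical keyword arguments named ignore_case and ignore_punctuation (the exact signature is an assumption). Case is normalized with str.lower(), and ASCII punctuation is deleted with str.translate() and a table built from string.punctuation:

```python
import string


def FindCommonWords(text1, text2, ignore_case=False, ignore_punctuation=False):
    """Return the sorted distinct words shared by text1 and text2.

    ignore_case and ignore_punctuation are hypothetical parameter names.
    """
    if ignore_case:
        # Compare in lowercase so that "Spoken" matches "spoken".
        text1, text2 = text1.lower(), text2.lower()
    if ignore_punctuation:
        # Build a table that deletes every character in string.punctuation.
        table = str.maketrans("", "", string.punctuation)
        text1, text2 = text1.translate(table), text2.translate(table)
    return sorted(set(text1.split()) & set(text2.split()))


# Hypothetical example strings, assumed for illustration only.
text1 = "Spoken articles, podcasts, and radio."
text2 = "I enjoy spoken articles."

print(FindCommonWords(text1, text2))                    # []
print(FindCommonWords(text1, text2, ignore_case=True))  # ['spoken']
print(FindCommonWords(text1, text2, ignore_case=True, ignore_punctuation=True))
# ['articles', 'spoken']
```

string.punctuation covers ASCII punctuation only, so characters such as curly quotes are not stripped; for non-English text, str.casefold() is a more aggressive alternative to lower().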

Output:

[]
['spoken']
['articles', 'spoken']

Conclusion

This article discussed two methods of finding common words in two Python strings: a for-loop (or its list-comprehension equivalent) and set intersection, including a NumPy variant. We also discussed how to handle capitalization and punctuation marks in the strings.