Remove Characters From String Using Regex and Python

This article discusses removing special characters from a Python string using the re module. It covers the common cases: removing punctuation, whitespace, numbers, and the first letter of every word in the string, among others.

For all examples, we will use the re.sub(pattern, repl, string) function, which replaces every match of pattern in string with repl. In all cases, we will set repl="" (the empty string) so that each replacement effectively removes the match.

Let’s work on some examples.

Example 1: Removing the Last Character in the String

Output:

Chicag

The pattern: ".$"

Explanation: The "$" anchors the match at the end of the string, str1, and "." matches any single character except a newline. Together, ".$" matches the last character of the string.
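A minimal sketch of this pattern in use. The input "Chicago" is an assumption inferred from the output shown above; only the variable name str1 comes from the text.

```python
import re

# Assumed input: the article only shows the result, "Chicag".
str1 = "Chicago"

# "." matches one character and "$" anchors it at the end of the string.
last_char_removed = re.sub(r".$", "", str1)
print(last_char_removed)
```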

Example 2: Remove the First or the Last Word in the String

Output:

This is a test

The pattern: r"\b\w+$"

Explanation: As in Example 1, "$" matches the end of the string. \w+ matches one or more word characters (alphanumeric characters plus the underscore), and \b matches a word boundary. That means r"\b\w+$" matches a whole word of any length at the end of the string.
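A short sketch of the last-word pattern, assuming the input is "This is a test string" (the input is not shown in the text). Note that the space before the removed word stays in the result, so the sketch also strips it with rstrip().

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

# Remove the final word; the whitespace before it is left in place.
without_last_word = re.sub(r"\b\w+$", "", sentence)

# Optionally drop the trailing space that remains.
trimmed = without_last_word.rstrip()
print(repr(without_last_word), repr(trimmed))
```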

Note: If the string ends with a non-word character such as a period, the pattern above will not match, because the last word is no longer at the very end of the string. For such cases, use r"\s+\S+$".

Output:

This is a test

The pattern r"\s+\S+$" matches one or more whitespace characters (\s+) followed by one or more non-whitespace characters (\S+) up to the end of the string ($), so the trailing period is removed together with the last word.
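The following sketch applies this whitespace-based pattern to a sentence that ends in a period. The input string is an assumption chosen to match the output shown above.

```python
import re

# Assumed input: the same sentence with a terminating period.
sentence_with_period = "This is a test string."

# Remove the last run of non-whitespace characters and the whitespace before it.
last_token_removed = re.sub(r"\s+\S+$", "", sentence_with_period)
print(last_token_removed)
```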

You can remove the first word in a string using the r"^\w+\s*" pattern, as shown below.

Output:
is a test string

The "^" matches the beginning of the string, \w+ matches one or more word characters, and \s* matches zero or more whitespace characters. That means r"^\w+\s*" matches the first word and any whitespace that follows it. To remove only the word and keep the whitespace, drop "\s*" from the pattern.
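As a sketch of both variants described above, assuming the input "This is a test string":

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

# Remove the first word and the whitespace that follows it.
first_word_removed = re.sub(r"^\w+\s*", "", sentence)

# Remove the first word only; the following space is kept.
first_word_only = re.sub(r"^\w+", "", sentence)

print(repr(first_word_removed), repr(first_word_only))
```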

Example 3: Remove All or Specific Punctuation Marks

Output:

This is a test string

The pattern: r"[^\w\s]"

Explanation: Square brackets [ ] denote a set of characters, e.g., [abc] matches any one of the characters "a", "b", or "c". When "^" is the first character in the set, the set is complemented, e.g., [^abc] matches any character except "a", "b", and "c".

That means r"[^\w\s]" matches any character that is neither a word character nor a whitespace character. Because \w includes the underscore, "_" is not removed by this pattern.
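A minimal sketch of the complemented set in use. The punctuated input is an assumption, chosen so that the result matches the output shown above.

```python
import re

# Assumed input containing a few punctuation marks.
punctuated = "This, is a test; string!"

# Remove every character that is neither a word character nor whitespace.
no_punctuation = re.sub(r"[^\w\s]", "", punctuated)
print(no_punctuation)
```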

If you want to remove only specific punctuation marks, list them inside the set. For example,

Output:

Th#is is a test strin&g*
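One way to reproduce the output above is sketched here. Both the input string and the choice of marks in the set (period, comma, exclamation mark) are assumptions; the original listing is not shown.

```python
import re

# Assumed input mixing several kinds of special characters.
noisy = "Th#is, is a test! strin&g*."

# Only the characters listed in the set are removed; "#", "&" and "*" stay.
selected_removed = re.sub(r"[.,!]", "", noisy)
print(selected_removed)
```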

Example 4: Remove a Character or Series of Characters

Output:

Ths s a test strng
Th  a test string
Ths s a es srng
Th  a test strg
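The four output lines above can be reproduced with the patterns sketched below. The input "This is a test string" and the exact patterns are assumptions inferred from the outputs, not taken from the original listing.

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

results = [
    re.sub(r"i", "", sentence),      # remove every "i"
    re.sub(r"is", "", sentence),     # remove every occurrence of the sequence "is"
    re.sub(r"[it]", "", sentence),   # remove every lowercase "i" or "t"
    re.sub(r"is|in", "", sentence),  # remove every "is" or "in"
]
for line in results:
    print(line)
```

A set such as [it] removes single characters, while alternation with "|" removes whole sequences.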

Example 5: Remove Numbers

This example discusses how to remove signed and unsigned numbers plus decimal numbers.

Output:

This is a  test string
This is a  t+est strin-g

The "\d+" matches one or more consecutive digits in the string. As the output shows, this pattern only handles unsigned numbers, that is, numbers without a leading + or - sign. For signed numbers such as -95 and +4, it removes the digits but leaves the sign behind. Let's fix that.
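A sketch of the digits-only pattern and its limitation. The two input strings are assumptions chosen to reproduce the outputs above.

```python
import re

# Assumed inputs: one with an unsigned number, one with signed numbers.
unsigned_text = "This is a 95 test string"
signed_text = "This is a 95 t+4est strin-8g"

# \d+ removes runs of digits but knows nothing about signs.
unsigned_result = re.sub(r"\d+", "", unsigned_text)
signed_result = re.sub(r"\d+", "", signed_text)  # "+" and "-" are left behind

print(unsigned_result)
print(signed_result)
```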

Output:

This is a  test string
This i.s a  test strin.g

As the example above shows, the pattern r"\+?-?\d+" now works for signed and unsigned numbers but does not handle decimals: the digits on either side of the decimal point are removed, while the point itself remains. Let us fix that as well.
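The behavior described here can be sketched as follows. The input strings are assumptions chosen to reproduce the outputs above.

```python
import re

# Assumed inputs: signed integers, then signed decimals.
integer_text = "This is a -95 test string"
decimal_text = "This i+4.5s a -95 test strin-3.2g"

sign_pattern = r"\+?-?\d+"  # optional "+", optional "-", then digits

integer_result = re.sub(sign_pattern, "", integer_text)
decimal_result = re.sub(sign_pattern, "", decimal_text)  # decimal points remain

print(integer_result)
print(decimal_result)
```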

Output:

This is a  test string

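The conclusion for this example: the pattern r"\+?-?\d+(\.\d+)?" captures signed, unsigned, and decimal numbers. A slightly stricter alternative for the optional sign is [+-]?, which accepts at most one sign character, whereas \+?-? also accepts the pair "+-".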
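A sketch of the full number pattern, applied to an assumed input that mixes signed integers and decimals:

```python
import re

# Assumed input for illustration.
mixed_text = "This i+4.5s a -95 test strin-3.2g"

# Optional sign, digits, then an optional fractional part.
number_pattern = r"\+?-?\d+(\.\d+)?"

numbers_removed = re.sub(number_pattern, "", mixed_text)
print(numbers_removed)
```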

Example 6: Remove the First or the Last x Characters

Output:

s a test string
This is a tes
This is a test s
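The outputs above are consistent with removing the first 6, the last 8, and the last 5 characters of "This is a test string". The sketch below assumes that input and uses the {n} quantifier, which repeats the preceding element exactly n times; the original patterns are not shown, so these are an inference.

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

first_six_removed = re.sub(r"^.{6}", "", sentence)   # first 6 characters
last_eight_removed = re.sub(r".{8}$", "", sentence)  # last 8 characters
last_five_removed = re.sub(r".{5}$", "", sentence)   # last 5 characters

print(first_six_removed)
print(last_eight_removed)
print(last_five_removed)
```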

Example 7: Remove All Whitespaces or Extra Spaces

Output:

Thisisateststring
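A minimal sketch that produces the output above by removing every run of whitespace, assuming the input "This is a test string":

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

# \s+ matches spaces, tabs, and newlines alike.
all_whitespace_removed = re.sub(r"\s+", "", sentence)
print(all_whitespace_removed)
```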

Output:

This is a test string
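To collapse extra spaces while still using an empty replacement, one option is a lookahead, sketched below. The input and the pattern are assumptions; the original listing is not shown. The pattern removes whitespace only when more whitespace follows, so the last character of each run survives.

```python
import re

# Assumed input with irregular spacing between words.
spaced = "This   is  a    test string"

# (?=\s) is a lookahead: it checks for following whitespace without consuming it.
extra_spaces_removed = re.sub(r"\s+(?=\s)", "", spaced)
print(extra_spaces_removed)
```

A common alternative is re.sub(r"\s+", " ", spaced), which replaces each run with a single space instead of using an empty replacement.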

Example 8: Remove Capital Letters

Output:

 is  est trin
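A sketch that reproduces the output above. The mixed-case input is an assumption inferred from the output, and [A-Z] covers ASCII capital letters only.

```python
import re

# Assumed input; the capitalization is inferred from the output shown.
mixed_case = "THIS is A Test StrinG"

# Remove every ASCII uppercase letter.
capitals_removed = re.sub(r"[A-Z]", "", mixed_case)
print(repr(capitals_removed))
```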

Example 9: Remove the First or Last Character in Every Word

Output:

his s  est tring
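The output above can be reproduced with a word boundary followed by a single word character, as in this sketch. The input "This is a test string" is assumed.

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

# \b\w matches the first character of each word.
first_letters_removed = re.sub(r"\b\w", "", sentence)
print(first_letters_removed)
```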

Output:

Thi i  tes strin
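Reversing the order of the two elements targets the last character of each word instead. A sketch, again assuming the input "This is a test string":

```python
import re

# Assumed input for illustration.
sentence = "This is a test string"

# \w\b matches a word character that is immediately followed by a word boundary.
last_letters_removed = re.sub(r"\w\b", "", sentence)
print(last_letters_removed)
```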

Example 10: Remove Punctuation Marks and Numbers

Output:

This is a  test striNG

The pattern: r"[^a-zA-Z\s]+"

Explanation: The set [^a-zA-Z\s] matches any character other than an ASCII letter (lowercase or uppercase) or a whitespace character, so digits and punctuation marks are both removed.
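A minimal sketch of this pattern, using an assumed input that contains punctuation, a number, and capital letters:

```python
import re

# Assumed input for illustration.
messy = "This, is a 95 test striNG!"

# Keep only ASCII letters and whitespace.
letters_only = re.sub(r"[^a-zA-Z\s]+", "", messy)
print(letters_only)
```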

Conclusion

This article discussed removing characters from a string using regular expressions in Python. The ten examples covered here present different cases of the problem. You can read more about Python regex in the re module documentation or practice writing regular expressions at https://regexr.com/ and https://regex101.com/.