Remove Characters From String Using Regex and Python

This article discusses how to remove characters from a Python string using the re module. We cover common cases, including removing punctuation, whitespace, numbers, and the first letter of every word in the string.

For all examples, we use the re.sub(pattern, repl, string) function, which replaces every match of pattern in string with repl. In most cases, we set repl="" (an empty string), so the replacements effectively remove the matches.

Let’s work on some examples.

Example 1: Removing the Last Character in the String

import re

string = "Chicago"

# Remove the last character from a string

result = re.sub(".$", "", string)

print(result)

Output:

Chicag

The pattern: ".$"

Explanation: The "$" anchor matches the end of the string (a position, not a character), and "." matches any character except a newline. Together, ".$" matches the last character of the string.
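One subtlety worth knowing: "$" also matches just before a newline at the very end of the string, and "." never matches a newline by default. So if the string ends with "\n", ".$" removes the character before the newline instead. The sketch below contrasts that with r".\Z" plus the re.DOTALL flag, which always removes the final character; the input string is made up for illustration.

```python
import re

text = "Chicago\n"  # hypothetical input that ends with a newline

# "$" matches before the trailing newline, so the "o" is removed instead
print(repr(re.sub(r".$", "", text)))  # 'Chicag\n'

# "\Z" matches only at the very end; DOTALL lets "." match the newline itself
print(repr(re.sub(r".\Z", "", text, flags=re.DOTALL)))  # 'Chicago'
```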

Example 2: Remove the First or the Last Word in the String

import re

string = "This is a test string"

result = re.sub(r"\b\w+$", "", string)

print(result)

Output:

This is a test

The pattern: r"\b\w+$"

Explanation: As noted in Example 1, "$" matches the end of the string. \w+ matches one or more word characters (letters, digits, and the underscore _), and \b matches a word boundary. Together, r"\b\w+$" matches a whole word at the end of the string. The space before that word is not part of the match, so the result keeps a trailing space ("This is a test ").

Note: If the string ends with a non-word character, such as a period, the pattern above finds no match and the string is returned unchanged. In that case, use r"\s+\S+$".

import re

string = "This is a test string."

new_string = re.sub(r"\s+\S+$", "", string)

print(new_string)

Output:

This is a test

The pattern r"\s+\S+$" matches one or more whitespace characters (\s+) followed by one or more non-whitespace characters (\S+) at the end of the string ($).

You can remove the first word in a string using the r"^\w+\s*" pattern, as shown below.

# Remove the first word

import re

string = "This is a test string"

result = re.sub(r"^\w+\s*", "", string)

print(result)

Output:

is a test string

The "^" anchor matches the beginning of the string, \w+ matches one or more word characters, and \s* matches zero or more whitespace characters. Together, r"^\w+\s*" matches the first word and any whitespace that follows it. To remove only the word and keep the whitespace after it, drop \s* from the pattern.
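Both r"^\w+\s*" and r"\b\w+$" rely on \w, so a leading bracket or a trailing period can stop them from matching. A minimal sketch, assuming a "word" is any run of non-whitespace characters (which suits plain prose but not every text), that also trims the surrounding whitespace; remove_first_word and remove_last_word are hypothetical helper names:

```python
import re

def remove_first_word(text: str) -> str:
    # Leading whitespace, the first run of non-whitespace, and the spaces after it
    return re.sub(r"^\s*\S+\s*", "", text)

def remove_last_word(text: str) -> str:
    # The spaces before the last run of non-whitespace, plus any trailing whitespace
    return re.sub(r"\s*\S+\s*$", "", text)

sample = "  (Hello) she said.  "
print(repr(remove_first_word(sample)))  # 'she said.  '
print(repr(remove_last_word(sample)))   # '  (Hello) she'
```

Unlike r"\s+\S+$", the remove_last_word pattern also empties a one-word string, because \s* allows zero leading spaces.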

Example 3: Remove All or Specific Punctuation Marks

# Remove all punctuation marks

import re

string = "T@h#is $is a t%est %st^rin&g*"

result = re.sub(r"[^\w\s]", "", string)

print(result)

Output:

This is a test string

The pattern: r"[^\w\s]"

Explanation: Square brackets [ ] define a character set; for example, [abc] matches any one of the characters "a", "b", or "c". When "^" is the first character inside the set, it negates the set; for example, [^abc] matches any character except "a", "b", and "c".

That means r"[^\w\s]" matches any character that is neither a word character nor whitespace. Because \w includes the underscore, underscores are not removed by this pattern.

To remove only specific punctuation marks, list them inside the character set. For example:

# Remove specific punctuation marks

import re

string = "T@h#is $is a t%est %st^rin&g*"

# Inside a set, "^" is literal unless it comes first, and "$" is always literal
result = re.sub(r"[@$^%]", "", string)

print(result)

Output:

Th#is is a test strin&g*
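Typing every mark by hand gets tedious. One option, sketched below, builds the set from Python's string.punctuation constant (the ASCII punctuation characters) and passes it through re.escape so characters such as "]", "^", "\" and "-" are treated literally inside the brackets. Unlike r"[^\w\s]", this also removes underscores, but it leaves non-ASCII symbols such as "€" in place.

```python
import re
import string

# string.punctuation holds the ASCII punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# re.escape makes every one of them literal inside the character set
punctuation_set = f"[{re.escape(string.punctuation)}]"

text = "T@h#is $is a t%est %st^rin&g*"
print(re.sub(punctuation_set, "", text))  # This is a test string
```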

Example 4: Remove a Character or Series of Characters

# Remove a character

import re

string = "This is a test string"

result = re.sub(r"i", "", string)

print(result)

# Remove a sequence of characters (every occurrence of "is")

string = "This is a test string"

result = re.sub(r"is", "", string)

print(result)

# Remove several different characters: "i" and "t", in this case.

string = "This is a test string"

result = re.sub(r"[it]", "", string)

print(result)

# Remove "is" or "in"

string = "This is a test string"

result = re.sub(r"is|in", "", string)

print(result)

Output:

Ths s a test strng
Th  a test string
Ths s a es srng
Th  a test strg
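Note how r"is" also removed the "is" inside "This". If you want to remove only the standalone word, one approach is to wrap it in \b word boundaries; the sketch below compares the two.

```python
import re

text = "This is a test string"

# Plain substring: also hits the "is" inside "This"
print(re.sub(r"is", "", text))      # Th  a test string

# Word boundaries on both sides: only the standalone word "is" is removed
print(re.sub(r"\bis\b", "", text))  # This  a test string
```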

Example 5: Remove Numbers

This example shows how to remove unsigned numbers, signed numbers, and decimal numbers.

# Removing unsigned numbers

import re

string = "Thi44s is a 56 tes5t stri99ng"

string2 = "Thi44s is a 56 t+8es5t stri99n-7g"

result = re.sub(r"\d+", "", string)

# leaves the signs of +8 and -7 behind

result2 = re.sub(r"\d+", "", string2)

print(result)

print(result2)

Output:

This is a  test string
This is a  t+est strin-g

The "\d+" pattern matches one or more consecutive digits. As the output shows, it removes only the digits and leaves any sign behind, so it works for unsigned numbers but not for signed ones such as +8 and -7. Let's fix that.

import re

string = "Thi+44s is a 56 tes-5t stri-99ng"

string2 = "Thi+44s i-3.4s a 56 tes-5t stri-99n+5.8g"

# matches signed and unsigned

result = re.sub(r"\+?-?\d+", "", string)

# splits -3.4 and +5.8 at the decimal point, leaving the "." behind

result2 = re.sub(r"\+?-?\d+", "", string2)

print(result)

print(result2)

Output:

This is a  test string
This i.s a  test strin.g

As the example above shows, the pattern r"\+?-?\d+" now handles signed and unsigned integers, but it splits decimals such as -3.4 and +5.8 at the decimal point. Let us fix that as well.

import re

string = "Thi+44s i-3.6s a 56 tes-5t s50.2tri-99ng"

result = re.sub(r"\+?-?\d+(\.\d+)?", "", string)

print(result)

Output:

This is a  test string

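The conclusion for this example: the pattern r"\+?-?\d+(\.\d+)?" removes signed and unsigned integers as well as decimals such as -3.6 and 50.2. It does not recognize every numeric format, though: in ".5" only the "5" is removed, and in "1e5" the "e" is left behind.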
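If you also need numbers written without a leading digit, such as .5, one possible extension is sketched below. It is an assumption about which formats matter to you, not a universal number matcher: [+-]? accepts at most one sign (the \+?-? above would also accept "+-"), and the non-capturing alternation accepts either digits with an optional fractional part or a fractional part alone. Scientific notation is still not covered, and a period directly before a digit (as in "word.5") is treated as part of the number.

```python
import re

# Optional single sign, then either "digits[.digits]" or ".digits"
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)")

text = "Thi+44s i-3.6s a tes.5t s50.2tri-99ng"
print(NUMBER.sub("", text))  # This is a test string
```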

Example 6: Remove the First or the Last x Characters

# Remove the first five characters

import re

string = "This is a test string"

result = re.sub(r"^.{5}", "", string)

print(result)

# Remove the last five characters

result = re.sub(r".{5}$", "", string)

print(result)

Output:

is a test string
This is a test s
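To make x a parameter, one approach is to build the pattern with an f-string. The sketch below uses a hypothetical pair of helpers, remove_first and remove_last (names invented for this example). It uses {0,n} so strings shorter than n characters are emptied rather than left unchanged, re.DOTALL so "." also counts newlines, and \Z instead of "$" so a trailing newline is counted as a character.

```python
import re

def remove_first(text: str, n: int) -> str:
    # rf"^.{{0,{n}}}" expands to e.g. "^.{0,5}": up to n characters from the start
    return re.sub(rf"^.{{0,{n}}}", "", text, flags=re.DOTALL)

def remove_last(text: str, n: int) -> str:
    # Up to n characters that end exactly at the end of the string
    return re.sub(rf".{{0,{n}}}\Z", "", text, flags=re.DOTALL)

print(remove_first("This is a test string", 5))  # is a test string
print(remove_last("This is a test string", 5))   # This is a test s
print(repr(remove_first("abc", 5)))              # ''
```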

Example 7: Remove All Whitespaces or Extra Spaces

# Remove whitespaces

import re

string = "This is a test string"

result = re.sub(r"\s+", "", string)

print(result)

Output:

Thisisateststring

import re

# Collapse each run of whitespace into a single space (repl is " " here)

string = "This   is  a    test string"

cleaned_string = re.sub(r"\s+", " ", string)

print(cleaned_string)

Output:

This is a test string
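Collapsing runs of whitespace still leaves a single space at the start or end when the original string had leading or trailing whitespace. A sketch that also trims the ends, using an alternation of the two anchored cases, could look like this (the sample string is made up):

```python
import re

text = "  This   is\ta\n test  "

# Collapse every run of whitespace (spaces, tabs, newlines) into one space
collapsed = re.sub(r"\s+", " ", text)

# Remove whitespace at the very start or the very end
cleaned = re.sub(r"^\s+|\s+$", "", collapsed)

print(repr(cleaned))  # 'This is a test'
```

Calling .strip() on the collapsed string is an equivalent, regex-free way to trim the ends.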

Example 8: Remove Capital Letters

# Remove caps

import re

string = "THIS is A Test StrinG"

no_caps = re.sub(r"[A-Z]", "", string)

print(no_caps)

Output:

 is  est trin
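Removing the capitals leaves gaps where whole words used to be (note the leading space and the double space in the output). If that matters, one option is to chain the whitespace cleanup from Example 7 onto the result. Keep in mind that [A-Z] covers only the 26 ASCII capitals, so accented capitals such as "É" are not removed.

```python
import re

text = "THIS is A Test StrinG"

# Step 1: delete ASCII capital letters (same pattern as above)
no_caps = re.sub(r"[A-Z]", "", text)  # ' is  est trin'

# Step 2: collapse leftover runs of spaces, then trim both ends
cleaned = re.sub(r"\s+", " ", no_caps).strip()

print(cleaned)  # is est trin
```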

Example 9: Remove the First or Last Character in Every Word

import re

string = "This is a test string"

# remove the first letter of every word

result = re.sub(r"\b\w", "", string)

print(result)

Output:

his s  est tring

import re

s = "This is a test string"

# remove the last character in every word

result = re.sub(r"\w(?=\s|$)", "", s)

print(result)

Output:

Thi i  tes strin
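The lookahead (?=\s|$) only fires when the character is followed by whitespace or the end of the string, so a word followed by punctuation keeps its last character. A word boundary (\b) is an alternative that also handles that case, as this comparison sketch shows:

```python
import re

s = "This is a test string."

# Lookahead version: "string." keeps its "g" because "." is neither \s nor the end
print(re.sub(r"\w(?=\s|$)", "", s))  # Thi i  tes string.

# Word-boundary version: every word character that sits right before a boundary
print(re.sub(r"\w\b", "", s))        # Thi i  tes strin.
```

Apostrophes also count as boundaries for \b, so a word such as "don't" would lose both its "n" and its "t".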

Example 10: Remove Punctuation Marks and Numbers

import re

string = "Thi+44s is %a 56 t%es-5t s*tri-99NG"

result = re.sub(r"[^a-zA-Z\s]+", "", string)

print(result)

Output:

This is a  test striNG

The pattern: r"[^a-zA-Z\s]+"

Explanation: The set [^a-zA-Z\s] matches any character that is not an ASCII letter (lowercase or uppercase) or whitespace, and the + lets each run of such characters be removed in a single match.
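Because a-zA-Z lists only ASCII letters, this pattern also deletes accented letters such as "é". If your text may contain them, one stdlib-only alternative keeps every Unicode letter: [^\w\s] removes anything that is neither a word character nor whitespace, and [\d_] then removes the digits and underscores that \w would otherwise keep. The sample string is made up for illustration.

```python
import re

text = "Café_42 naïve!"

# ASCII-only version from above: accented letters disappear too
print(re.sub(r"[^a-zA-Z\s]+", "", text))   # Caf nave

# Unicode-aware version: drop non-word, non-space characters, plus digits and "_"
print(re.sub(r"[^\w\s]|[\d_]", "", text))  # Café naïve
```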

Conclusion

This article discussed removing characters from a string using regular expressions in Python. The ten examples cover different variations of the problem. You can read more about Python regular expressions in the official re module documentation, or practice writing patterns at https://regexr.com/ and https://regex101.com/.

Codeigo

Just programming
