The wildcard gets its name from card games, where a wild card can stand in for any other card. The wildcard metacharacter works the same way: it is written as a dot (.) and matches any single character except the newline character (\n).
For example, if we have a RegEx:
/s.n/
It matches son and sun, but not soon or seen.
It also matches when the middle character is a space or a literal dot: s n, s.n.
The dot stands for exactly one character, which is why soon and seen, with two characters between s and n, do not match.
This is what the Python implementation looks like:
import re

# s, then any single character, then n
myregex = re.compile('s.n')
mylist = ['son', 'sun', 'soon', 'seen']

for element in mylist:
    # match() checks the pattern at the start of each string
    if myregex.match(element):
        print(element)
If you run the code, you will get this result:
son
sun
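The claims about spaces, dots, and newlines can be checked with a short sketch. The sample strings and the helper name matches_wildcard are illustrative assumptions, not part of the example above. re.DOTALL is the standard flag that lets the dot match a newline as well.

```python
import re

# Illustrative helper (hypothetical name): True if the whole text matches s.n
def matches_wildcard(text, flags=0):
    return re.fullmatch(r's.n', text, flags) is not None

samples = ['s n', 's.n', 's\nn', 'sn']
for text in samples:
    # Columns: the text, default behaviour, behaviour with re.DOTALL
    print(repr(text), matches_wildcard(text), matches_wildcard(text, re.DOTALL))
```

By default only the string containing a newline is rejected among the three-character samples. The last sample, sn, fails in both modes because the dot must consume exactly one character.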
Most common mistake
People often make the same mistake when using the wildcard character.
If you work with decimal fractions, you may want to match a number such as 5.40 and write the following RegEx:
/5.40/
It will match 5.40, but also 5 40, 5_40, 5-40, 5740, etc.
import re

# The unescaped dot matches any character, not just a decimal point
myregex = re.compile('5.40')
mylist = ['5.40', '5 40', '5_40', '5-40', '5740']

for element in mylist:
    if myregex.match(element):
        print(element)
Result:
5.40
5 40
5_40
5-40
5740
A good regular expression matches the kind of text you want to match, and nothing more.
To escape a metacharacter, put another metacharacter, the backslash (\), in front of it.
Escaping a metacharacter tells the RegEx engine to treat the character that follows the backslash as a literal character.
Now you can modify the expression:
/5\.40/
This time, the RegEx engine matches only 5.40.
import re

# Raw string (r'...') so that Python passes the backslash to the RegEx engine unchanged
myregex = re.compile(r'5\.40')
mylist = ['5.40', '5 40', '5_40', '5-40', '5740']

for element in mylist:
    if myregex.match(element):
        print(element)
Result:
5.40
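When the literal text comes from a variable, escaping by hand is easy to forget. As a sketch, Python's standard re.escape function adds the backslashes for you. The sketch also uses fullmatch, which rejects longer strings such as 5.400 that match would accept because it only checks the start of the string. The variable names and the extra sample 5.400 are assumptions for illustration.

```python
import re

literal = '5.40'                           # text to be matched literally
myregex = re.compile(re.escape(literal))   # equivalent to the pattern 5\.40

mylist = ['5.40', '5 40', '5740', '5.400']

# match() only checks the beginning of each string
prefix_hits = [s for s in mylist if myregex.match(s)]
# fullmatch() requires the whole string to match
exact_hits = [s for s in mylist if myregex.fullmatch(s)]

print(prefix_hits)
print(exact_hits)
```

Whether match or fullmatch is the right choice depends on whether trailing text after the number should be allowed.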