You can convert a Unicode string to an ASCII byte string using the encode function.
mytext = "Klüft électoral große"
myresult = mytext.encode('ascii', 'ignore')
print(myresult)
All characters that are not ASCII will be dropped.
b'Klft lectoral groe'
The second parameter of the encode function controls what happens to characters that can't be encoded. Here, ignore drops every character that has no ASCII representation.
There are other options as well, for example, replace. In this case, Python inserts a question mark in place of each character it can't encode instead of removing it, so the result has the same number of characters as the input string.
The new code looks like this:
mytext = "Klüft électoral große"
myresult = mytext.encode('ascii', 'replace')
print(myresult)
And this is the result.
b'Kl?ft ?lectoral gro?e'
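Besides ignore and replace, the second parameter of encode accepts a few other built-in error handlers. The short sketch below compares some of them on a single word. The variable names (word, results, strict_failed) are only for illustration.

```python
word = "Klüft"

# 'strict' is the default handler: it raises an error on the
# first character that cannot be encoded as ASCII.
try:
    word.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The other handlers return bytes instead of raising.
results = {
    handler: word.encode('ascii', handler)
    for handler in ('ignore', 'replace', 'backslashreplace', 'xmlcharrefreplace')
}

print(strict_failed)
for handler, encoded in results.items():
    print(handler, encoded)
```

If you want to keep the information about which character was there, backslashreplace and xmlcharrefreplace are probably a better choice than ignore or replace, because they write the code point into the output.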
You can also convert characters to their closest ASCII equivalents.
For this purpose, we are going to use the normalize function from the unicodedata module. It accepts several normalization forms (NFC, NFKC, NFD, and NFKD), but for this demonstration, I’m going to use only one: NFKD. It splits an accented letter into a base letter and a separate accent mark, and the encode function then drops the accent mark.
This is what the code looks like:
import unicodedata
mytext = "Klüft électoral große"
myresult = unicodedata.normalize('NFKD', mytext).encode('ascii', 'ignore')
print(myresult)
Here’s the result:
b'Kluft electoral groe'
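To see why this works, it can help to look at what NFKD does to a single character. The sketch below is only an illustration (the variable names are mine): it normalizes ü and lists the code points that come out.

```python
import unicodedata

char = "\u00fc"  # ü as a single precomposed code point
decomposed = unicodedata.normalize('NFKD', char)

# After NFKD the character is a plain "u" followed by a combining mark.
names = [unicodedata.name(c) for c in decomposed]
print(len(char), len(decomposed))
print(names)

# Only the base letter survives the ASCII encoding step,
# because the combining mark is not an ASCII character.
ascii_bytes = decomposed.encode('ascii', 'ignore')
print(ascii_bytes)
```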
Convert ß to ss
In this case, the sharp S (ß) was not converted to “ss” but dropped, because it has no decomposition into ASCII letters. We can quickly fix that by calling the replace function on the mytext string. The replacement has to happen before the normalize and encode steps.
mytext = "Klüft électoral große".replace('ß', 'ss')
Now, when you run the code, the sharp S is not lost.
b'Kluft electoral grosse'
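If you have more characters like ß, one way to handle them is to put the replacements in a table and wrap all the steps in a small function. This is only a sketch: the to_ascii function and the REPLACEMENTS table are hypothetical names, and the table contains just a few example characters that NFKD leaves untouched.

```python
import unicodedata

# Hypothetical replacement table; extend it with whatever your text needs.
REPLACEMENTS = str.maketrans({'ß': 'ss', 'æ': 'ae', 'ø': 'o'})

def to_ascii(text):
    """Return an ASCII-only string (illustrative helper, not a library function)."""
    text = text.translate(REPLACEMENTS)         # characters NFKD cannot split
    text = unicodedata.normalize('NFKD', text)  # split accented letters
    # Drop the leftover accent marks, then turn the bytes back into a str.
    return text.encode('ascii', 'ignore').decode('ascii')

print(to_ascii("Klüft électoral große"))
```

The function returns a regular string instead of bytes, which is usually easier to work with. Any character that is neither in the table nor decomposable is still silently dropped.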
ASCII and UTF-8
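Instead of ASCII, you can also use the UTF-8 encoding. UTF-8 can represent every Unicode character, so nothing is lost and you don’t need the second parameter.
mytext = "Klüft électoral große"
myresult = mytext.encode('utf-8')
print(myresult)
This is what the result looks like: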
b'Kl\xc3\xbcft \xc3\xa9lectoral gro\xc3\x9fe'
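Because UTF-8 keeps every character, you can decode the bytes back into the original string. The sketch below checks this round trip and compares the number of characters with the number of bytes. The variable names are only for illustration.

```python
mytext = "Klüft électoral große"
encoded = mytext.encode('utf-8')

# The non-ASCII characters take more than one byte each in UTF-8,
# so the byte string is longer than the text.
print(len(mytext), len(encoded))

# Decoding with the same encoding restores the original string.
decoded = encoded.decode('utf-8')
print(decoded == mytext)
```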