
Convert Full-width Unicode Characters into ASCII Characters

I have some Unicode string text containing some numbers, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I change them into ASCII digits?

Solution 1:

If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with NFKC, the compatibility normalization form, which replaces fullwidth characters with their ASCII equivalents:

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
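On Python 3, where every str is Unicode, the same normalization can be wrapped in a small helper. This is a minimal sketch; the name fullwidth_to_ascii is hypothetical. Because NFKC folds every compatibility character and not only fullwidth forms, the last line shows its effect on a ligature and a superscript.

```python
import unicodedata


def fullwidth_to_ascii(text: str) -> str:
    """Hypothetical helper: NFKC maps fullwidth forms to their ASCII equivalents."""
    return unicodedata.normalize("NFKC", text)


txt = "\uff13\uff16fsdfdsf\uff11\uff14"  # the question's '３６fsdfdsf１４'
ascii_txt = fullwidth_to_ascii(txt)
print(ascii_txt)           # 36fsdfdsf14
print(int(ascii_txt[:2]))  # 36

# Side effect: NFKC also rewrites the "fi" ligature (U+FB01)
# and SUPERSCRIPT TWO (U+00B2).
print(fullwidth_to_ascii("\ufb01le x\u00b2"))  # file x2
```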

If NFKC changes too much for you, since it rewrites every compatibility character and not only fullwidth forms, you can make a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10, 0xff1a), repl))

s = u'３６fsdfdsf１４'
print(s.translate(xlat))

Output:

36fsdfdsf14
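In Python 3 the same digit table can also be built with str.maketrans. The sketch below adds a wider variant, assuming you want the whole Fullwidth ASCII variants block (U+FF01 to U+FF5E) folded to ASCII: those code points sit exactly 0xFEE0 above their ASCII counterparts (U+0021 to U+007E). Mapping U+3000 IDEOGRAPHIC SPACE to an ordinary space is an extra assumption, not something the answer above does.

```python
# Digits only: U+FF10..U+FF19 -> "0".."9", built with str.maketrans.
FULLWIDTH_DIGITS = "".join(chr(cp) for cp in range(0xFF10, 0xFF1A))
DIGITS_TABLE = str.maketrans(FULLWIDTH_DIGITS, "0123456789")

# Whole Fullwidth ASCII variants block: U+FF01..U+FF5E -> U+0021..U+007E.
# str.translate accepts ordinal -> ordinal mappings, so a dict works directly.
FULLWIDTH_TABLE = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TABLE[0x3000] = 0x20  # assumption: IDEOGRAPHIC SPACE becomes " "

s = "\uff13\uff16fsdfdsf\uff11\uff14"  # the question's '３６fsdfdsf１４'
print(s.translate(DIGITS_TABLE))  # 36fsdfdsf14

# Fullwidth "ABC", an ideographic space, then fullwidth "123!"
t = "\uff21\uff22\uff23\u3000\uff11\uff12\uff13\uff01"
print(t.translate(FULLWIDTH_TABLE))  # ABC 123!
```

translate() leaves any code point missing from the table unchanged, so unlike NFKC these tables never touch ligatures, superscripts, or other characters outside the listed ranges.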

Solution 2:

On Python 3:

import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]
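This works in Python 3 because, for str patterns, \d matches any Unicode decimal digit (category Nd), and int() accepts such digits as well. A short sketch of that behaviour, also showing how the re.ASCII flag restricts \d to 0-9; the Arabic-Indic sample is only an illustration:

```python
import re

txt = "\uff13\uff16fsdfdsf\uff11\uff14"  # the question's '３６fsdfdsf１４'

# Default str pattern: \d matches any Unicode decimal digit.
print([int(x) for x in re.findall(r"\d+", txt)])  # [36, 14]

# With re.ASCII, \d only matches 0-9, so the fullwidth digits are not found.
print(re.findall(r"\d+", txt, re.ASCII))  # []

# int() also parses other Nd digits, e.g. Arabic-Indic four and two.
print(int("\u0664\u0662"))  # 42
```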

On Python 2:

import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
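The Python 3 counterpart is decoding bytes (for example, data read from a file opened in binary mode). Each fullwidth digit takes three bytes in UTF-8, so slicing undecoded data splits a character, which is the likely reason int(txt[:2]) failed on a Python 2 byte string. A minimal sketch, assuming the input is UTF-8:

```python
raw = "\uff13\uff16fsdfdsf\uff11\uff14".encode("utf-8")  # simulated UTF-8 input

print(len("\uff13".encode("utf-8")))  # 3 bytes per fullwidth digit

try:
    int(raw[:2])  # the first two *bytes*, i.e. part of a single character
except ValueError as exc:
    print("bytes slice failed:", exc)

txt = raw.decode("utf-8")  # Python 3 counterpart of txt.decode('utf8')
print(int(txt[:2]))  # 36
```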
