Convert Full-width Unicode Characters Into ASCII Characters
I have some string text in Unicode containing full-width digits, as below: txt = '３６fsdfdsf１４'. However, int(txt[:2]) does not recognize the characters as a number. How can I change them to ASCII characters?
Solution 1:
If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with a compatibility replacement (NFKC):
>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC', s)
u'36fsdfdsf14'
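NFKC rewrites more than full-width digits, which matters if the text contains other compatibility characters. A small sketch for Python 3, with sample characters that are my own picks rather than anything from the question:

```python
import unicodedata as ud

# Illustrative samples (not from the question), written as escapes
# so the code points are unambiguous.
samples = {
    'fullwidth digits': '\uff13\uff16',   # full-width 3 and 6
    'fullwidth letter': '\uff21',         # full-width A
    'superscript two': '\u00b2',          # superscript 2
    'fi ligature': '\ufb01',              # single-character "fi" ligature
    'ideographic space': '\u3000',        # wide CJK space
}

# Apply NFKC to every sample and show the before/after code points.
normalized = {name: ud.normalize('NFKC', text) for name, text in samples.items()}

for name, text in samples.items():
    print(name, ascii(text), '->', ascii(normalized[name]))
```

The superscript and the ligature are folded to plain ASCII too, so NFKC is only a good fit when that kind of loss is acceptable.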
If compatibility normalization changes too much for you, you can make a translation table of just the replacements you want:
# coding: utf8
repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10, 0xff1a), repl))
s = u'３６fsdfdsf１４'
print(s.translate(xlat))
Output:
36fsdfdsf14
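The same table idea can be widened to the whole full-width ASCII block if you want letters and punctuation as well. This is a sketch for Python 3, relying on the full-width forms U+FF01 to U+FF5E sitting at a fixed offset from ASCII U+0021 to U+007E; the names `fullwidth_to_ascii` and `to_halfwidth` are hypothetical helpers, not part of any library:

```python
# Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
OFFSET = 0xFF01 - 0x21  # 0xFEE0

# Map each full-width code point to its ASCII code point.
fullwidth_to_ascii = {cp: cp - OFFSET for cp in range(0xFF01, 0xFF5F)}
# The ideographic space U+3000 is outside that block, so add it by hand.
fullwidth_to_ascii[0x3000] = 0x20


def to_halfwidth(text):
    """Replace full-width ASCII variants; leave every other character alone."""
    return text.translate(fullwidth_to_ascii)


print(to_halfwidth('\uff13\uff16fsdfdsf\uff11\uff14'))  # 36fsdfdsf14
```

Unlike NFKC, this leaves characters such as superscripts and ligatures untouched.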
Solution 2:
On Python 3:
import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]
On Python 2:
import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]
In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
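For comparison, here is roughly what the decode-then-parse step looks like on Python 3, where the undecoded data is a bytes object. It assumes the input is UTF-8 encoded; `raw`, `text` and `numbers` are just illustrative names:

```python
import re

# UTF-8 bytes for the full-width string from the question
# (U+FF13 U+FF16 "fsdfdsf" U+FF11 U+FF14).
raw = b'\xef\xbc\x93\xef\xbc\x96fsdfdsf\xef\xbc\x91\xef\xbc\x94'

# Decode to str first; slicing or matching the raw bytes would cut
# through the three-byte sequences of the full-width digits.
text = raw.decode('utf-8')

# On str patterns, \d matches Unicode decimal digits, and int() accepts them.
numbers = [int(m) for m in re.findall(r'\d+', text)]
print(numbers)  # [36, 14]
```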