Convert Full-width Unicode Characters Into Ascii Characters

March 31, 2024

I have some string text in Unicode, containing some numbers, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How to change

Solution 1:

If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with compatibility normalization (NFKC), which replaces each fullwidth character with its ASCII equivalent:

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC', s)
u'36fsdfdsf14'
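The session above is Python 2. As a rough sketch of the same idea on Python 3, where every str is already Unicode and the u prefix is optional, the normalized text can be sliced and passed to int(). The helper name normalize_fullwidth is hypothetical and used only for illustration:

```python
import unicodedata

def normalize_fullwidth(text):
    """Fold compatibility characters (including fullwidth forms) with NFKC."""
    return unicodedata.normalize('NFKC', text)

txt = '３６fsdfdsf１４'
clean = normalize_fullwidth(txt)
print(clean)           # 36fsdfdsf14
print(int(clean[:2]))  # 36
```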

If NFKC normalization changes too much for you, you can make a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10, 0xff1a), repl))

s = u'３６fsdfdsf１４'
print(s.translate(xlat))

Output:

36fsdfdsf14
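To illustrate what "changes too much" can mean, here is a small Python 3 comparison of the two approaches. The sample string is made up for this sketch: besides fullwidth digits it contains the ligature U+FB01, the circled digit U+2460, and two fullwidth letters. NFKC rewrites all of them, while the translation table touches only the fullwidth digits:

```python
import unicodedata

# Lookup table from fullwidth digit ordinals (U+FF10..U+FF19) to ASCII digits.
xlat = dict(zip(range(0xff10, 0xff1a), '0123456789'))

# Hypothetical sample: fullwidth digits, "fi" ligature, circled 1, fullwidth A and B.
sample = '３６ \ufb01le \u2460 ＡＢ'

nfkc = unicodedata.normalize('NFKC', sample)
table = sample.translate(xlat)

print(nfkc)   # 36 file 1 AB
print(table)  # only the leading digits become ASCII; the rest is unchanged
```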

Solution 2:

On Python 3

import re

[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]

On Python 2

import re

[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
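The decoding step can be sketched on Python 3 as well, where the byte-string case corresponds to a bytes object. This assumes the data is UTF-8 encoded. The variable raw is a stand-in for bytes read from a file or socket. Slicing the bytes cuts through the first character's three-byte sequence, whereas the decoded text slices by character, and int() accepts Unicode decimal digits directly:

```python
# Stand-in for UTF-8 bytes read from a file or socket (assumption for this sketch).
raw = '３６fsdfdsf１４'.encode('utf-8')

# Slicing bytes yields the first two bytes, not the first two characters.
print(raw[:2])       # b'\xef\xbc'

# After decoding, slicing works per character and int() parses the digits.
txt = raw.decode('utf-8')
print(int(txt[:2]))  # 36
```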
