Regex With Unicode And Str
I have a list of regex and a replace function. regex function replacement_patterns = [(ur'\\u20ac', ur' euros'),(ur'\xe2\x82\xac', r' euros'),(ur'\b[eE]?[uU]?[rR]\b', r' euros'
Solution 1:
Maybe you need something like this
importreregex= re.compile("^http://.+", re.UNICODE)
And if you need more than one, you can do like this
regex = re.compile("^http://.+", re.UNICODE | re.IGNORECASE)
Get the example
>>>r = re.compile("^http://.+", re.UNICODE | re.IGNORECASE)>>>r.match('HTTP://ыыы')
<_sre.SRE_Match object at 0x7f572455d648>
Does it correct result?
>>>classRegexpReplacer(object):...def__init__(self, patterns=replacement_patterns):... self.patterns = [(re.compile(regex, re.UNICODE | re.IGNORECASE), repl) for (regex, repl) in patterns]...defreplace(self, text):... s = text...for (pattern, repl) in self.patterns:... (s, count) = re.subn(pattern, repl, s)...return s...>>>string='730\u20ac.\r\n\n ropa surf ... 5,10 muy buen estado..... 170 \u20ac\r\n\nPack 850\u20ac, reparaci\u00f3n. \r\n\n'>>>replacer = RegexpReplacer()>>>texto= replacer.replace(string)>>>texto
u'730 euros.\r\n\n ropa surf ... 5,10 muy buen estado..... 170 euros\r\n\nPack 850 euros, reparaci\\u00f3n. \r\n\n'
Solution 2:
If you want Unicode replacement patterns, you need also be operating on Unicode strings. JSON should be returning Unicode as well.
Change the following by removing \\
and removing UTF-8 (won't see in a Unicode string). Also you compile with IGNORE_CASE so no need for [eE]
, etc.:
replacement_patterns = [(ur'\u20ac', ur' euros'),(ur'\be?u?r\b', r' euros'), (ur'\b([0-9]+)eu?r?o?s?\b',ur' \1 euros')]
Make the following a Unicode string (add u
):
string = u'730\u20ac.\r\n\n ropa surf ... 5,10 muy buen estado..... 170 \u20ac\r\n\nPack 850\u20ac, reparaci\u00f3n. \r\n\n'
Then it should operator on Unicode JSON as well.
Post a Comment for "Regex With Unicode And Str"