Skip to content Skip to sidebar Skip to footer

Regex With Unicode And Str

I have a list of regex and a replace function. regex function replacement_patterns = [(ur'\\u20ac', ur' euros'),(ur'\xe2\x82\xac', r' euros'),(ur'\b[eE]?[uU]?[rR]\b', r' euros'

Solution 1:

Maybe you need something like this

importreregex= re.compile("^http://.+", re.UNICODE)

And if you need more than one, you can do like this

regex = re.compile("^http://.+", re.UNICODE | re.IGNORECASE)

Get the example

>>>r = re.compile("^http://.+", re.UNICODE | re.IGNORECASE)>>>r.match('HTTP://ыыы')
<_sre.SRE_Match object at 0x7f572455d648>

Does it correct result?

>>>classRegexpReplacer(object):...def__init__(self, patterns=replacement_patterns):...        self.patterns = [(re.compile(regex, re.UNICODE | re.IGNORECASE), repl) for   (regex, repl) in patterns]...defreplace(self, text):...        s = text...for (pattern, repl) in self.patterns:...            (s, count) = re.subn(pattern, repl, s)...return s...>>>string='730\u20ac.\r\n\n ropa surf  ... 5,10 muy buen estado..... 170 \u20ac\r\n\nPack 850\u20ac, reparaci\u00f3n. \r\n\n'>>>replacer = RegexpReplacer()>>>texto= replacer.replace(string)>>>texto
u'730  euros.\r\n\n ropa surf  ... 5,10 muy buen estado..... 170   euros\r\n\nPack 850  euros, reparaci\\u00f3n. \r\n\n'

Solution 2:

If you want Unicode replacement patterns, you need also be operating on Unicode strings. JSON should be returning Unicode as well.

Change the following by removing \\ and removing UTF-8 (won't see in a Unicode string). Also you compile with IGNORE_CASE so no need for [eE], etc.:

replacement_patterns = [(ur'\u20ac', ur'  euros'),(ur'\be?u?r\b', r'  euros'), (ur'\b([0-9]+)eu?r?o?s?\b',ur' \1 euros')]

Make the following a Unicode string (add u):

string = u'730\u20ac.\r\n\n ropa surf  ... 5,10 muy buen estado..... 170 \u20ac\r\n\nPack 850\u20ac, reparaci\u00f3n. \r\n\n'

Then it should operator on Unicode JSON as well.

Post a Comment for "Regex With Unicode And Str"