Python Regex Look Ahead Positive + Negative
Solution 1:
My question is why it CANNOT be 234 from 1-234-56?
It is not possible because (?=(\d{3})+(?!\d)) requires that one or more complete 3-digit sequences appear after the 1-3-digit sequence, and 56 (the last digit group in your imagined scenario) is a 2-digit group. Since a quantifier is either lazy or greedy, a single \d{1,3} cannot match one-, two- and three-digit groups in the way the 1-234-56 split would require. To get 234 from 123456, you'd need a regex tailored specifically for it: \B\d{3}, or (?<=1)\d{3}, or even \d{3}(?=\d{2}(?!\d)).
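As a quick check of the three tailored patterns above, here is a minimal sketch run against the input 123456. The variable names (s, patterns, results, original_match) are only illustrative.

```python
import re

s = "123456"

# Three tailored patterns; each one forces the match to start at the second digit.
patterns = [
    r"\B\d{3}",               # a non-word-boundary position, then 3 digits
    r"(?<=1)\d{3}",           # 3 digits preceded by "1"
    r"\d{3}(?=\d{2}(?!\d))",  # 3 digits followed by exactly 2 trailing digits
]
results = [re.search(p, s).group() for p in patterns]
print(results)  # ['234', '234', '234']

# The original pattern matches "123" instead, because the lookahead
# only accepts whole 3-digit groups after the match.
original_match = re.search(r"\d{1,3}(?=(\d{3})+(?!\d))", s).group()
print(original_match)  # 123
```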
Does 56 match the (?!\d) pattern? Where is the beginning point that (?!\d) will look for?
No, this is a negative lookahead. It does not match any text, it only checks that there is no digit right after the current position in the input string. If there is a digit, the match fails (no result is found or returned).
More clarification on the lookahead: it is located after the (\d{3})+ subpattern, thus the regex engine checks for a digit right after the last 3-digit group, and fails the match if a digit is found (as it is a negative lookahead). In plain words, (?!\d) is a number closing/trailing boundary in this regex.
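To illustrate the trailing-boundary behaviour in isolation, the sketch below uses a simplified pattern, \d{3}(?!\d), which is not the regex from the question but shows the same lookahead at work. The names rx, m_end and m_mid are illustrative.

```python
import re

# (?!\d) consumes nothing: it only asserts that the next character is not a digit.
rx = re.compile(r"\d{3}(?!\d)")

# In "123456" the candidates 123, 234 and 345 are each followed by a digit,
# so only the final group passes the check.
m_end = rx.search("123456")
print(m_end.group(), m_end.span())  # 456 (3, 6)

# In "123 456" the first group is followed by a space, so it passes at once.
m_mid = rx.search("123 456")
print(m_mid.group(), m_mid.span())  # 123 (0, 3)
```

Note that the span ends right after the three digits in both cases: the lookahead checks the next position without moving past it.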
A more detailed breakdown:
\d{1,3} - a 1 to 3 digit sequence, as many as possible (a greedy quantifier is used)
(?=(\d{3})+(?!\d)) - a positive lookahead ((?=...)) that checks if the 1-3 digit sequence matched before is followed by
(\d{3})+ - 1 or more (+) sequences of exactly 3 digits...
(?!\d) - not followed by a digit.
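The same breakdown can be written directly into the pattern with re.VERBOSE, which lets the regex carry its own comments. This is just a sketch; the sample inputs and the name matched are chosen for illustration.

```python
import re

rx = re.compile(r"""
    \d{1,3}        # 1 to 3 digits, as many as possible (greedy)
    (?=            # positive lookahead: what follows must be...
        (\d{3})+   #   one or more groups of exactly 3 digits
        (?!\d)     #   with no digit right after the last group
    )
""", re.VERBOSE)

# The leading group shrinks until only whole 3-digit groups remain after it.
matched = {s: rx.search(s).group(0) for s in ["123456", "1234567", "12345"]}
print(matched)  # {'123456': '123', '1234567': '1', '12345': '12'}
```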
Lookaheads do not add anything to the match and do not consume characters, but you can still capture inside them. After a lookahead is checked, the regex index is at the same position as before. With your regex and input, \d{1,3} matches 123 because a 3-digit sequence (456) follows it. But 456 is captured within the lookahead, and re.findall returns only the captured texts if capturing groups are defined in the pattern.
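The difference between what is matched and what re.findall returns can be seen in a short sketch like the one below. The names capturing and non_capturing are illustrative.

```python
import re

s = "123456"
capturing = re.compile(r"\d{1,3}(?=(\d{3})+(?!\d))")

# findall returns the contents of group 1, not the overall match.
print(capturing.findall(s))  # ['456']

# The match object shows both: group 0 is the consumed text, group 1 the capture.
m = capturing.search(s)
print(m.group(0), m.group(1), m.end())  # 123 456 3

# With a non-capturing group, findall returns the overall match instead.
non_capturing = re.compile(r"\d{1,3}(?=(?:\d{3})+(?!\d))")
print(non_capturing.findall(s))  # ['123']
```

The match ends at index 3 even though 456 was captured, which confirms that the lookahead did not consume those digits.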
To just add a comma as the digit grouping symbol, use
rx = r'\d(?=(?:\d{3})+(?!\d))'
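One possible way to apply this pattern is with re.sub, re-inserting the matched digit followed by a comma. The helper name add_commas is hypothetical and not part of the original answer.

```python
import re

rx = r'\d(?=(?:\d{3})+(?!\d))'

def add_commas(text):
    # \g<0> is the whole match: the single digit that precedes a group boundary.
    return re.sub(rx, r'\g<0>,', text)

print(add_commas("123456"))   # 123,456
print(add_commas("1234567"))  # 1,234,567
print(add_commas("12"))       # 12
```

This sketch assumes plain integers. Digits after a decimal point would be grouped as well, so inputs such as 1234.5678 need extra handling.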