
Python Regex Look Ahead Positive + Negative

This regex will get 456. My question is why it CANNOT be 234 from 1-234-56? Does 56 satisfy the (?!\d) pattern, since it is NOT a single digit? Where is the starting point from which (?!\d) will look?

Solution 1:

My question is why it CANNOT be 234 from 1-234-56?

It is not possible, as (?=(\d{3})+(?!\d)) requires one or more 3-digit sequences to follow the 1-3-digit sequence, with no digit after the last of them. 56 (the last digit group in your imagined scenario) is a 2-digit group, so it can never be matched by \d{3}. Whether \d{1,3} is greedy or lazy does not change this: however many digits it consumes, the digits left after it, up to the end of the number, must split into whole 3-digit groups. To get 234 from 123456, you'd need a regex specifically tailored for it, such as \B\d{3}, (?<=1)\d{3}, or even \d{3}(?=\d{2}(?!\d)).
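A quick check of the three tailored patterns, assuming the input is the plain string "123456" and that they are used with re.findall (the answer does not say which function to use):

```python
import re

text = "123456"

# Each pattern is built specifically to pick out the middle "234" of this input.
patterns = [
    r"\B\d{3}",               # 3 digits that do not start at a word boundary
    r"(?<=1)\d{3}",           # 3 digits preceded by a "1"
    r"\d{3}(?=\d{2}(?!\d))",  # 3 digits followed by exactly 2 more digits
]

results = {p: re.findall(p, text) for p in patterns}
for p, found in results.items():
    print(p, found)  # every line ends with ['234']
```

These only work for this particular string; they are not a general digit-grouping solution.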

Does 56 match the (?!\d) pattern? Where is the starting point from which (?!\d) will look?

No. This is a negative lookahead: it does not match (consume) any text, it only checks that there is no digit right after the current position in the input string. If there is a digit, the match fails (no result is found and returned).
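A minimal illustration of this zero-width check, using re.match on two made-up strings: the negative lookahead only tests the next character and never adds it to the match.

```python
import re

# "12" is followed by "3": the next character is a digit, so (?!\d) fails.
m1 = re.match(r"12(?!\d)", "123")
print(m1)  # None

# "12" is followed by "-": the check passes, but "-" is not consumed.
m2 = re.match(r"12(?!\d)", "12-")
print(m2.group(), m2.end())  # 12 2
```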

To clarify the lookahead further: it is located after the (\d{3})+ subpattern, so the regex engine checks for a digit right after the last 3-digit group and fails the match if one is found (since it is a negative lookahead). In plain words, (?!\d) acts as a trailing boundary for the number in this regex.
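To see (?!\d) acting as the trailing boundary, one can compare the answer's digit-grouping pattern with a hypothetical variant that drops it. The input "1234567" is just an example.

```python
import re

text = "1234567"

# With the trailing boundary: the digits after the match must split
# into whole 3-digit groups that reach the end of the digit run.
with_boundary = re.findall(r"\d(?=(?:\d{3})+(?!\d))", text)

# Hypothetical variant without it: any digit followed by at least 3 more digits passes.
without_boundary = re.findall(r"\d(?=(?:\d{3})+)", text)

print(with_boundary)     # ['1', '4']
print(without_boundary)  # ['1', '2', '3', '4']
```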

A more detailed breakdown:

  • \d{1,3} - 1 to 3 digit sequence, as many as possible (greedy quantifier is used)
  • (?=(\d{3})+(?!\d)) - a positive look-ahead ((?=...)) that checks that the 1-3 digit sequence matched before is followed by
    • (\d{3})+ - 1 or more (+) sequences of exactly 3 digits...
    • (?!\d) - not followed by a digit.
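The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and treats # as the start of a comment. This sketch assumes the question's pattern is \d{1,3}(?=(\d{3})+(?!\d)), as reconstructed from the breakdown, and checks that the verbose form behaves like the compact one on "123456".

```python
import re

# Assumed: the question's pattern, reconstructed from the breakdown.
compact = re.compile(r"\d{1,3}(?=(\d{3})+(?!\d))")

verbose = re.compile(
    r"""
    \d{1,3}        # 1 to 3 digits, as many as possible (greedy)
    (?=            # positive lookahead: check, without consuming, that
        (\d{3})+   #   one or more groups of exactly 3 digits follow
        (?!\d)     #   and no digit comes after the last group
    )
    """,
    re.VERBOSE,
)

text = "123456"
print(compact.findall(text), verbose.findall(text))  # ['456'] ['456']
```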

Lookaheads do not consume characters, but you can still capture inside them. After a lookahead is executed, the regex index stays at the same character as before. With your regex and input, \d{1,3} matches 123 because it is followed by a 3-digit sequence (456). But 456 is captured within the lookahead, and re.findall returns only the captured texts when the pattern contains capturing groups.
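The difference between what is matched and what is captured can be inspected with re.finditer. This assumes the question's pattern and input are \d{1,3}(?=(\d{3})+(?!\d)) and "123456", reconstructed from the answer; the non-capturing variant is added only for comparison.

```python
import re

# Assumed: the question's pattern and input, reconstructed from the answer.
text = "123456"
pattern = r"\d{1,3}(?=(\d{3})+(?!\d))"

matches = list(re.finditer(pattern, text))
for m in matches:
    # group(0) is the consumed match; group(1) was captured inside the lookahead.
    print(m.group(0), m.span(), m.group(1), m.span(1))  # 123 (0, 3) 456 (3, 6)

# findall returns the capturing group's text, not the whole match...
captured = re.findall(pattern, text)
print(captured)  # ['456']

# ...unless the group is made non-capturing with (?:...).
whole = re.findall(r"\d{1,3}(?=(?:\d{3})+(?!\d))", text)
print(whole)  # ['123']
```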

To just add a comma as the digit grouping symbol, use

rx = r'\d(?=(?:\d{3})+(?!\d))'
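A minimal usage sketch for rx, assuming it is used with re.sub and the replacement r'\g<0>,' (the matched digit followed by a comma). The helper name group_digits and the sample inputs are hypothetical.

```python
import re

rx = r'\d(?=(?:\d{3})+(?!\d))'

def group_digits(s: str) -> str:
    # Hypothetical helper (assumed usage via re.sub): append a comma after
    # every digit that is followed by a whole number of 3-digit groups.
    return re.sub(rx, r'\g<0>,', s)

print(group_digits("123456"))       # 123,456
print(group_digits("1234567"))      # 1,234,567
print(group_digits("Total: 1000"))  # Total: 1,000
```

The pattern knows nothing about decimal points, so a fractional part is grouped too: "1234.5678" becomes "1,234.5,678".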

