Regex- Ignore a constant string that matches a pattern

Question

I have this regular expression:

\b[A-Z]{1}[A-Z]{0,7}[0-9]?\b|\b[0-9]{2,3}\b

The desired output is as highlighted (only LONDON, PUB, CHICAGO, THIS1 and 70 should match):

JOHN went to LONDON one fine day. JOHN had lunch in a PUB. JOHN then moved to CHICAGO. I don't want JOHN to be highlighted. John does not want this to match the pattern. Neither this. But THIS1 should match the pattern. Also the other 70 times that the pattern should match.

Observed output (every JOHN and the standalone I are highlighted as well):

JOHN went to LONDON one fine day. JOHN had lunch in a PUB. JOHN then moved to CHICAGO. I don't want JOHN to be highlighted. John does not want this to match the pattern. Neither this. But THIS1 should match the pattern. Also the other 70 times that the pattern should match.

The regex partly works, but I don't want two constant strings, JOHN and I, to match. Please help.


excel-vba | regex | 2017-01-05 12:01 | 1 Answer

Answers to Regex- Ignore a constant string that matches a pattern ( 1 )

  1. 2017-01-05 12:01

    You can use a negative lookahead to exclude those matches. Your pattern is also somewhat redundant; you can shorten it considerably by grouping the alternatives and removing unnecessary subpatterns:

    \b(?!(?:JOHN|I)\b)(?:[A-Z]{1,8}[0-9]?|[0-9]{2,3})\b
      ^^^^^^^^^^^^^^^^
    


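    The (?!(?:JOHN|I)\b) negative lookahead fails the match if the word at the current position is exactly I or JOHN. The \b inside the lookahead is what restricts the exclusion to whole words, so longer words such as JOHNNY can still match.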
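
The role of the \b inside the lookahead can be checked with a small experiment. This is only a sketch in Python's re module, chosen because it is easy to run; the question is tagged excel-vba, and it is an assumption here that the VBA regex engine treats these constructs the same way. The probe string and the variant without the inner \b are made up for illustration.

```python
import re

# Alternatives shared by both patterns below (taken from the answer).
BODY = r"(?:[A-Z]{1,8}[0-9]?|[0-9]{2,3})\b"

# Lookahead closed with \b: rejects only the whole words JOHN and I.
whole_word = re.compile(r"\b(?!(?:JOHN|I)\b)" + BODY)
# Hypothetical variant without the inner \b: rejects any word that
# merely starts with JOHN or I.
prefix_only = re.compile(r"\b(?!(?:JOHN|I))" + BODY)

probe = "JOHN JOHNNY I IT JOHN1"  # illustrative input, not from the question
whole_word_hits = whole_word.findall(probe)
prefix_only_hits = prefix_only.findall(probe)

print(whole_word_hits)   # ['JOHNNY', 'IT', 'JOHN1']
print(prefix_only_hits)  # []
```

Dropping the inner \b would silently exclude every word beginning with I or JOHN, which is probably more than the question asks for.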

    Note that {1} can always be omitted, since any unquantified pattern is matched exactly once. [A-Z]{1}[A-Z]{0,7} is equivalent to [A-Z]{1,8}.
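
As a quick sanity check of that equivalence, the two subpatterns can be compared on a handful of inputs. This is a spot check in Python's re module (an assumed stand-in for the VBA engine), not a proof; the sample strings are arbitrary.

```python
import re

verbose = re.compile(r"[A-Z]{1}[A-Z]{0,7}")  # form used in the question
compact = re.compile(r"[A-Z]{1,8}")          # simplified form

# Runs of 0 to 10 uppercase letters plus a few strings that should be rejected.
samples = ["A" * n for n in range(0, 11)] + ["Ab", "a", "AB1"]

# For each sample, record whether each form matches the whole string.
agreement = {
    s: (bool(verbose.fullmatch(s)), bool(compact.fullmatch(s)))
    for s in samples
}

# Both forms should agree on every sample.
both_agree = all(a == b for a, b in agreement.values())
accepted_lengths = [len(s) for s, (a, _) in agreement.items() if a]

print(both_agree)        # True
print(accepted_lengths)  # [1, 2, 3, 4, 5, 6, 7, 8]
```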

    Pattern details:

    • \b - word boundary
    • (?!(?:JOHN|I)\b) - the word matched cannot be equal to JOHN or I
    • (?:[A-Z]{1,8}[0-9]?|[0-9]{2,3}) - one of the two alternatives:
      • [A-Z]{1,8}[0-9]? - 1 to 8 uppercase ASCII letters followed by an optional digit (0 or 1 occurrences)
      • | - or
      • [0-9]{2,3} - 2 to 3 digits
    • \b - trailing word boundary
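
Putting the pieces together, the original and the revised pattern can be run against the sample text from the question. Again, this is a sketch in Python's re module rather than VBA, on the assumption that both engines handle \b, (?:...) and (?!...) alike; the variable names are illustrative.

```python
import re

# Sample text copied from the question.
text = (
    "JOHN went to LONDON one fine day. JOHN had lunch in a PUB. "
    "JOHN then moved to CHICAGO. I don't want JOHN to be highlighted. "
    "John does not want this to match the pattern. Neither this. "
    "But THIS1 should match the pattern. "
    "Also the other 70 times that the pattern should match."
)

# Pattern from the question: also matches the constants JOHN and I.
original = re.compile(r"\b[A-Z]{1}[A-Z]{0,7}[0-9]?\b|\b[0-9]{2,3}\b")
# Pattern from the answer: the lookahead skips the whole words JOHN and I.
revised = re.compile(r"\b(?!(?:JOHN|I)\b)(?:[A-Z]{1,8}[0-9]?|[0-9]{2,3})\b")

original_hits = original.findall(text)
revised_hits = revised.findall(text)

print(original_hits)
# ['JOHN', 'LONDON', 'JOHN', 'PUB', 'JOHN', 'CHICAGO', 'I', 'JOHN', 'THIS1', '70']
print(revised_hits)
# ['LONDON', 'PUB', 'CHICAGO', 'THIS1', '70']
```

More constants can be excluded by extending the alternation inside the lookahead, for example (?!(?:JOHN|I|PUB)\b).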
