How to find a word using Java regex syntax (not in Java code) that is not preceded by another word within 100 characters

StackOverflow https://stackoverflow.com/questions/23659388

  •  22-07-2023

Question

First off, I have to use Java regex syntax (not inside Java code). I need to find a word, say warning, as long as it is not preceded by another word, say see, within 100 characters. It can be a variation of warning, for example warnings, and it must be followed by a whitespace character, punctuation, or a letter, not by any other character such as ) or /. This is what I have so far:

(?i)[^s&&e&&e].{0,100}(warning)/w*

Solution

Here are two examples with very similar logic. The only difference is how we check the requirement that the word be followed by whitespace, punctuation, or a letter rather than a character such as ) or /. We can either exclude the characters we don't want (")", "/", etc.) or require the characters we do want (whitespace, punctuation, etc.).


Original:

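This example uses a negative lookbehind to make sure warning is not immediately preceded by see; it checks only the word directly before the whitespace, not the full 100-character window from the question. It also uses word boundaries so that see and warning[a-z]* match whole words rather than pieces of longer words. Finally, a negative lookahead makes sure that warning[a-z]* isn't followed by a character from our unwanted character class [)/]. The annotated form below is for reading only: Java's Pattern class does not support the (?#comment) syntax, so use the minified form in practice.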

(?<!       (?# start negative lookbehind)
  \bsee\b  (?# the word "see" surrounded by word boundaries)
)          (?# end negative lookbehind)
\s+        (?# 1+ whitespace characters separating words)
\b         (?# word boundary)
(          (?# start capture group)
  warning  (?# the word "warning")
  [a-z]*   (?# with optional additional characters)
)          (?# end capture group)
(?!        (?# start negative lookahead)
  [)/]     (?# character class of unwanted characters)
)          (?# end negative lookahead)
\b         (?# word boundary)

Minified: (?<!\bsee\b)\s+\b(warning[a-z]*)(?![)/])\b
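
As a rough check of the minified pattern in Java, the sketch below compiles it with the (?i) flag taken from the question's attempt and collects capture group 1. The class name WarningOriginal, the helper find, and the sample sentences are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarningOriginal {
    // Minified pattern from the answer; (?i) is an assumption carried over
    // from the question's own attempt.
    static final Pattern PATTERN = Pattern.compile(
            "(?i)(?<!\\bsee\\b)\\s+\\b(warning[a-z]*)(?![)/])\\b");

    // Returns every value captured by group 1 (the "warning" variant).
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("A warning was logged."));  // prints [warning]
        System.out.println(find("Please see warning 12."));  // prints []
        System.out.println(find("Two warnings) here"));      // prints []
    }
}
```

Because the pattern begins with \s+, a warning at the very start of the input is not matched, which may or may not be what is wanted.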

Demo: Regex101


Alternate:

Alternatively, we can use a positive lookahead to match the characters we do want to follow warning[a-z]*. The lookahead accepts either a character class such as [\s.,] or the end of the string ($). Note that I removed the trailing word boundary, because this new lookahead acts as our word boundary.

(?<!       (?# start negative lookbehind)
  \bsee\b  (?# the word "see" surrounded by word boundaries)
)          (?# end negative lookbehind)
\s+        (?# 1+ whitespace characters separating words)
\b         (?# word boundary)
(          (?# start capture group)
  warning  (?# the word "warning")
  [a-z]*   (?# with optional additional characters)
)          (?# end capture group)
(?=        (?# start positive lookahead)
  [\s.,]   (?# character class of allowed characters)
  |        (?# OR)
  $        (?# the end of string)
)          (?# end positive lookahead)

Minified: (?<!\bsee\b)\s+\b(warning[a-z]*)(?=[\s.,]|$)
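
To see how excluding characters differs from requiring them, the following sketch runs both minified patterns from this answer over the same inputs. The class name WarningLookaheads, the constants EXCLUDE and REQUIRE, the helper first, and the sample strings are hypothetical, and the (?i) flag is assumed from the question's attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarningLookaheads {
    // "Original": reject the match when ) or / follows the word.
    static final Pattern EXCLUDE = Pattern.compile(
            "(?i)(?<!\\bsee\\b)\\s+\\b(warning[a-z]*)(?![)/])\\b");

    // "Alternate": accept only whitespace, '.', ',' or end of input after the word.
    static final Pattern REQUIRE = Pattern.compile(
            "(?i)(?<!\\bsee\\b)\\s+\\b(warning[a-z]*)(?=[\\s.,]|$)");

    // Returns capture group 1 of the first match, or null when nothing matches.
    static String first(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Heed the warnings, please",
            "a warning: disk full",
            "a warning) here",
            "the final warning"
        };
        for (String s : samples) {
            System.out.println(s + " -> exclude=" + first(EXCLUDE, s)
                    + ", require=" + first(REQUIRE, s));
        }
    }
}
```

With these inputs the two patterns disagree only on "a warning: disk full": the colon is not in the excluded class [)/], so the first pattern accepts it, but it is not in the allowed class [\s.,] either, so the second rejects it. Which behavior is right depends on how strictly "punctuation" should be read.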

Demo: Regex101
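
Neither pattern above looks back further than the word directly before the whitespace. Java's regex engine accepts bounded quantifiers inside a lookbehind, so one possible way to approximate the 100-character window from the question is to put .{0,100} inside the negative lookbehind. The sketch below is not part of the original answer. It assumes "within 100 characters" means at most 100 characters between the end of see and the start of warning, it drops the leading \s+ so the lookbehind sits directly before the word, and it needs Java 11 or later for String.repeat. The class name WarningWindow and the helper find are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarningWindow {
    // Assumed variant: reject "warning" when the whole word "see" ends
    // 0 to 100 characters before it. By default '.' does not cross line
    // breaks; adding the (?s) flag would change that.
    static final Pattern PATTERN = Pattern.compile(
            "(?i)(?<!\\bsee\\b.{0,100})\\b(warning[a-z]*)(?=[\\s.,]|$)");

    // Returns every value captured by group 1.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        // "see" is a few characters back, inside the window: rejected.
        System.out.println(find("Please see the warning."));  // prints []
        // No "see" at all: accepted.
        System.out.println(find("Disk warning."));             // prints [warning]
        // "see" is more than 100 characters back: accepted.
        System.out.println(find("see " + "x".repeat(120) + " warning."));  // prints [warning]
    }
}
```

Minified: (?i)(?<!\bsee\b.{0,100})\b(warning[a-z]*)(?=[\s.,]|$)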

Licensed under: CC-BY-SA with attribution