Question

I need to match the following statements:

Hi there John
Hi there John Doe (jdo)

Without matching these:

Hi there John Doe is here 
Hi there John is here

So I figured that this regexp would work:

^Hi there (.*)(?! is here)$

But it does not, and I am not sure why. I believed the capturing group (.*) might be the cause, so I thought that making the * operator lazy would solve the problem, but no. This regexp doesn't work either:

^Hi there (.*?)(?! is here)$

Can anyone point me in the right direction?

Solution

To retrieve a sentence without is here at the end (like Hi there John Doe (the second)), use the following (credit to @Thorbear):

^Hi there (.*$)(?<! is here)

And for a sentence that contains some data in the middle (like Hi there John Doe (the second) is here, with John Doe (the second) being the desired data), simple grouping suffices:

^Hi there (.*?) is here$
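For anyone who wants to try the two patterns together, here is a minimal sketch in Python's re module. The helper name extract_name and its return shape are my own assumptions for illustration; only the two patterns come from the post.

```python
import re

# Both patterns are copied from the post above.
WITHOUT_SUFFIX = re.compile(r"^Hi there (.*$)(?<! is here)")
WITH_SUFFIX = re.compile(r"^Hi there (.*?) is here$")


def extract_name(sentence):
    """Hypothetical helper: return (name, has_suffix), or None if nothing matches."""
    m = WITH_SUFFIX.match(sentence)
    if m:
        return m.group(1), True
    m = WITHOUT_SUFFIX.match(sentence)
    if m:
        return m.group(1), False
    return None


if __name__ == "__main__":
    print(extract_name("Hi there John Doe (the second)"))
    print(extract_name("Hi there John Doe (the second) is here"))
```

Trying the "is here" pattern first keeps the two cases mutually exclusive, since the lookbehind pattern rejects exactly the lines the other one accepts.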

Everyone, thank you for your replies.

Accepted answer

The .* will find a match whether or not it is greedy: the $ forces the group to run to the end of the line, and at that point nothing follows, so the negative lookahead for " is here" always succeeds.

A solution is to use a negative lookbehind instead, which checks from the end of the line whether the preceding characters are " is here":

^Hi there (.*)(?<! is here)$

Edit

As suggested by Alan Moore, changing the pattern to ^Hi there (.*$)(?<! is here) improves performance: the capturing group gobbles up the rest of the string before the lookbehind is attempted, which saves unnecessary backtracking.
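As a quick check, the sketch below runs both lookbehind variants against the four lines from the question using Python's re module. The names SAMPLES and accepted are assumptions made for this example. It only confirms that the two variants accept the same lines; it does not measure the performance difference.

```python
import re

# Both variants from this answer.
LOOKBEHIND_AFTER_GROUP = re.compile(r"^Hi there (.*)(?<! is here)$")
LOOKBEHIND_AT_END = re.compile(r"^Hi there (.*$)(?<! is here)")

# The four lines from the question: the first two should match, the last two should not.
SAMPLES = [
    "Hi there John",
    "Hi there John Doe (jdo)",
    "Hi there John Doe is here",
    "Hi there John is here",
]


def accepted(pattern, lines):
    """Return the lines that the pattern matches."""
    return [line for line in lines if pattern.match(line)]


if __name__ == "__main__":
    print(accepted(LOOKBEHIND_AFTER_GROUP, SAMPLES))
    print(accepted(LOOKBEHIND_AT_END, SAMPLES))
```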

Other tips

It's not entirely clear from your example if you want to prevent " is here" from occurring anywhere or just at the end of a line. If it should not occur anywhere, try this:

^Hi there ((?! is here).)*$

(reFiddle example)

Before each character, it checks to see that the next characters are not " is here".
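To see how the per-character check differs from an end-of-line check, here is a small sketch in Python. The test sentence with " is here" in the middle is a hypothetical input, not one from the question. Note also that a repeated capturing group keeps only its last iteration, so the sketch adds a variant (my own addition) that wraps the repetition in an outer group to capture the whole name.

```python
import re

# Pattern from this answer: " is here" may not start at any position.
NOWHERE = re.compile(r"^Hi there ((?! is here).)*$")

# Assumed variant: same check, but the outer group captures everything after "Hi there ".
NOWHERE_CAPTURING = re.compile(r"^Hi there ((?:(?! is here).)*)$")

# Lookbehind pattern for comparison: only rejects " is here" at the very end.
NOT_AT_END = re.compile(r"^Hi there (.*)(?<! is here)$")

if __name__ == "__main__":
    line = "Hi there John is here today"  # hypothetical input
    print(bool(NOWHERE.match(line)))      # rejected: " is here" occurs mid-line
    print(bool(NOT_AT_END.match(line)))   # accepted: line does not end with it
    print(NOWHERE_CAPTURING.match("Hi there John").group(1))
```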

Alternatively, if you only want to exclude it if it occurs at the very end of a line, you could use a negative lookbehind as Thorbear suggested:

^Hi there (.*)(?<! is here)$ 

As for why your expression matched all of the input lines: .* matched everything, and the lookahead (?! is here) just before the $ always succeeded, because " is here" can never occur after the end of a line (nothing is there).
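The behaviour of the original pattern is easy to reproduce. In this sketch (Python's re module, with SAMPLES being the four lines from the question), the group swallows the whole remainder of each line, so the lookahead is only ever evaluated at the end of the string.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"^Hi there (.*)(?! is here)$")

SAMPLES = [
    "Hi there John",
    "Hi there John Doe (jdo)",
    "Hi there John Doe is here",
    "Hi there John is here",
]

if __name__ == "__main__":
    for line in SAMPLES:
        m = ORIGINAL.match(line)
        # Every line matches, and group 1 ends where the line ends.
        print(bool(m), repr(m.group(1)), m.end(1) == len(line))
```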

You don't need to solve the whole problem inside one regex; you merely need a regex that detects the unwanted form, and you can reject those strings in your own code. Of course, if you already know this and are simply looking to learn about lookaheads/lookbehinds, you can disregard the rest of this answer.

If you take the regex you don't want your input strings to match:

badregex = (Hi there (.*)(is here))

This will give you a match for

Hi there John is here

So you can just put the logic at the application level, where it belongs (logic in regexes is a bad thing). A bit of pseudocode (I can't be bothered to write out Java right now, but you get the idea):

if (badregex.exactMatch(your_str)) {
    discardString();
    return;
}
if (goodregex.exactMatch(your_str)) {
    doStuff(your_str);
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow