Question

When I tried this regex

\"(\S\S+)\"(?!;c)

on the string "MM:";d, it matches, as I wanted,

and on the string "MM:";c, it does not match, as desired.

But when I add a second group, by moving the semicolon inside that group and making it optional using |,

\"(\S\S+)\"(;|)(?!c)

the string "MM:";c now matches, when I expected it not to match, as before.

I tried this in Java and then in JavaScript, using the regex tool Debuggex:

This link contains a snippet of the above

What am I doing wrong?

Note: the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a whole word, which is why I am using a negative lookahead.

After following Holger's suggestion to use a possessive quantifier,

\"(\S\S+)\";?+(?!c)

it worked; I tested it on RegexPlanet.


Solution 2

The problem is that you don’t want the semicolon to be optional in the regular-expression sense. An optional semicolon means the matcher is allowed to try both ways, matching with it and matching without it. So even if the semicolon is there, the matcher can skip it, leaving the group with an empty match; the lookahead then sees the semicolon instead of the c and succeeds.

What you want is for the semicolon to be consumed whenever it’s there, so that it can’t be used to satisfy the negative lookahead. With Java’s regex engine that’s pretty easy: use ;?+

This is called a “possessive quantifier”. As with a plain ?, the semicolon doesn’t need to be there, but if it is there it must be matched and cannot be given back. So the regex engine has no alternative left to try.

So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
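A minimal sketch comparing the two patterns in Java, in case it helps to see the difference side by side. The class name PossessiveSemicolonDemo, the helper found, and the two extra inputs without a semicolon are made up for illustration and are not from the original post.

```java
import java.util.regex.Pattern;

public class PossessiveSemicolonDemo {
    // Pattern from the question: the group may match ";" or nothing,
    // so the engine is free to backtrack to the empty alternative.
    static final Pattern OPTIONAL = Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Possessive version: once ";?+" has taken the semicolon it never gives it back.
    static final Pattern POSSESSIVE = Pattern.compile("\"(\\S\\S+)\"(;?+)(?!c)");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "\"MM:\";d", "\"MM:\";c", "\"MM:\"d", "\"MM:\"c" };
        for (String s : inputs) {
            System.out.println(s + "  optional=" + found(OPTIONAL, s)
                    + "  possessive=" + found(POSSESSIVE, s));
        }
    }
}
```

Only the "MM:";c input should differ between the two patterns: the backtracking version finds a match there, the possessive one does not.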

Other tips

I believe that the regex engine will do what it can to find a match. Since your expression says the semicolon is optional, it found that it could still match the entire expression: if the semicolon is not consumed by the optional group, the character after the match is ; rather than c, so the negative lookahead succeeds. This has to do with backtracking: the engine keeps trying alternatives until it finds a match or runs out of options.

In other words, the process goes like this:

"MM:" - matched
(;|)  - try the semicolon: matched
(?!c) - next character is 'c', so the negative lookahead fails. No match. Go back
(;|)  - try the empty alternative. We still have ';c' left
(?!c) - next character is ';', not 'c', so the negative lookahead succeeds. We have a match
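The steps above can be checked in Java by looking at what the second group ends up capturing. This is only a small sketch; the class name BacktrackTraceDemo and the helper describe are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackTraceDemo {
    // Pattern from the question, with the optional-semicolon group.
    static final Pattern P = Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Hypothetical helper: reports the overall match and what group 2 captured.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "match=" + m.group() + " group2=[" + m.group(2) + "]";
    }

    public static void main(String[] args) {
        // The semicolon is left out of the match: group 2 is empty.
        System.out.println(describe("\"MM:\";c")); // match="MM:" group2=[]
        // No backtracking needed here: group 2 keeps the semicolon.
        System.out.println(describe("\"MM:\";d")); // match="MM:"; group2=[;]
    }
}
```

An empty group 2 on the first input is the visible sign that the engine fell back to the empty alternative.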

An update (based on your comment). The following pattern may work better, because the lookahead rejects both c and ;c before the optional semicolon is matched:

\"(\S\S+)\"(?!;?c)(;|)


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow