Question

This is a follow-up to "Regular expression which matches at least two words from a list":

How do I write a regexp which would match at least two different words from a list?

E.g., given the list "foo", "bar", "baz", I would like the regexp to match "foo..bar" but not "foo..foo" or "z baz ".

Just like in the original question, I would like to avoid repeating the word list in the regexp (what if my blacklist has 30 entries instead of the 3 in the example?)

Solution

If the regex engine you use supports it, you can do it with a negative lookahead and a backreference:

(foo|bar|baz).*(?!\1)(foo|bar|baz)

(?!\1) means "not followed by the text captured by the first group".
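
As a rough illustration of the pattern above, here is a minimal Python sketch (Python's built-in re module handles this lookahead plus backreference form). The helper name build_pattern is made up for this example. It assembles the alternation from a list, so the words are maintained in one place even though the resulting regex still contains the alternation twice.

```python
import re

def build_pattern(words):
    # Hypothetical helper: escape each word so metacharacters are literal.
    alternation = "|".join(re.escape(w) for w in words)
    # Group 1 captures the first word; (?!\1) forbids the same word
    # at the position where the second word starts.
    return re.compile(rf"({alternation}).*(?!\1)({alternation})")

pattern = build_pattern(["foo", "bar", "baz"])

print(bool(pattern.search("foo..bar")))  # True: two different words
print(bool(pattern.search("foo..foo")))  # False: same word twice
print(bool(pattern.search("z baz ")))    # False: only one word
```

Note that . does not match a newline by default, so the two words have to be on the same line unless re.DOTALL is passed.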

To avoid writing the list twice, the PCRE regex engine offers several syntaxes for reusing a subpattern:

(foo|bar|baz).*(?!\1)(?1)

(foo|bar|baz).*(?!\g{1})\g<1>

(?<list>foo|bar|baz).*(?!\g{list})\g<list>

(?(DEFINE)(?<list>foo|bar|baz))(\g<list>).*(?!\2)\g<list>

With Ruby:

(foo|bar|baz).*(?!\k<1>)\g<1>

(?<list>foo|bar|baz).*(?!\k<list>)\g<list>

(?<list>foo|bar|baz){0}\g<list>.*(?!\k<list>)\g<list>

But if the regex engine has no feature for reusing a subpattern, you can try this pattern (works with PCRE, Java, .NET and Ruby, but not with JavaScript or XRegExp; Python's re module rejects the forward reference \1 when compiling the pattern):

(?:(?!\1)(foo|bar|baz).*){2}

Explanation:

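At the beginning (the first iteration), the capturing group has not matched anything yet, so the backreference \1 is not defined either. The regex engine then effectively ignores the lookahead condition: in these engines a backreference to an unset group simply fails to match, so the negative lookahead succeeds (note that this means the engine does not treat (?!\1) as (?!), which would always fail; it skips the test instead). JavaScript, by contrast, lets an unset backreference match the empty string, which is why the pattern never matches there. The first word from the list is then captured, so on the second iteration \1 is defined and the lookahead does its job.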
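
Since support for the forward reference differs from engine to engine, it may be safer to probe for it than to assume it. The following is only a sketch in Python, with a made-up helper name (compile_two_distinct): it tries the single-list pattern first and falls back to the two-group pattern if the engine refuses to compile it. As far as I can tell, CPython's built-in re takes the fallback path, because it raises re.error for a reference to a group that has not been defined yet.

```python
import re

def compile_two_distinct(words):
    # Hypothetical helper, not part of any library.
    alt = "|".join(re.escape(w) for w in words)
    try:
        # Forward reference: \1 is used before group 1 is defined.
        return re.compile(rf"(?:(?!\1)({alt}).*){{2}}")
    except re.error:
        # The engine rejected the forward reference, so use the
        # form that writes the alternation twice instead.
        return re.compile(rf"({alt}).*(?!\1)({alt})")

pattern = compile_two_distinct(["foo", "bar", "baz"])

print(bool(pattern.search("foo..bar")))  # True
print(bool(pattern.search("foo..foo")))  # False
```

Either way the word list is written once in the source; only the generated regex differs.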

In R, you can make it work by passing the parameter perl=TRUE and escaping the backslash (as in Java):

(?:(?!\\1)(foo|bar|baz).*){2}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow