If the regex engine you use supports it, you can do it with a negative lookahead and a backreference:
(foo|bar|baz).*(?!\1)(foo|bar|baz)
(?!\1) means "not followed by the text captured by the first capturing group".
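As a quick check of this first pattern, here is a minimal sketch using Python's re module, which supports both lookaheads and backreferences. The helper name two_different_words is only illustrative and not part of the pattern itself.

```python
import re

# First pattern from above: a word from the list, anything, then a word
# from the list that is not the one captured in group 1.
PATTERN = re.compile(r"(foo|bar|baz).*(?!\1)(foo|bar|baz)")

def two_different_words(text):
    """Return the two captured words as a tuple, or None if there is no match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

print(two_different_words("foo and bar"))  # ('foo', 'bar')
print(two_different_words("foo and foo"))  # None
print(two_different_words("foo foo bar"))  # ('foo', 'bar')
```

Note that the pattern has no word boundaries, so it also matches inside longer words; add \b around the groups if that matters for your input.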
To avoid writing the list twice, the PCRE regex engine offers several syntaxes for reusing a subpattern:
(foo|bar|baz).*(?!\1)(?1)
(foo|bar|baz).*(?!\g{1})\g<1>
(?<list>foo|bar|baz).*(?!\g{list})\g<list>
(?(DEFINE)(?<list>foo|bar|baz))(\g<list>).*(?!\1)\g<list>
With Ruby:
(foo|bar|baz).*(?!\k<1>)\g<1>
(?<list>foo|bar|baz).*(?!\k<list>)\g<list>
(?<list>foo|bar|baz){0}\g<list>.*(?!\k<list>)\g<list>
But if the regex engine has no feature for reusing a subpattern, you can try this pattern (works with PCRE, Java, .NET, and Ruby, but not with JavaScript or XRegExp, nor with Python's re module, which rejects a reference to a group that has not been opened yet):
(?:(?!\1)(foo|bar|baz).*){2}
Explanation:
At the beginning (the first iteration), the capturing group has not been set yet, so the backreference \1 is undefined too. The regex engine ignores the lookahead condition (note that this means the engine does not treat (?!\1) as (?!), which would always fail, but chooses to skip the test). The first word in the list is then captured, and on the second iteration the backreference \1 is defined and the lookahead does its job.
For the R language, you can make it work by passing the parameter perl=TRUE and escaping the backslash (as in Java):
(?:(?!\\1)(foo|bar|baz).*){2}
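When the pattern is built in a host language, another way to keep the list in one place is to generate the alternation from a single list and insert it twice into the first pattern. A minimal sketch in Python, where WORDS and build_pattern are illustrative names rather than anything from an existing library:

```python
import re

# Illustrative word list; replace it with your own alternatives.
WORDS = ["foo", "bar", "baz"]

def build_pattern(words):
    # Escape each word so that regex metacharacters are taken literally,
    # then join them into one alternation that is used for both groups.
    alternation = "|".join(re.escape(w) for w in words)
    # Raw f-string: \1 stays a backreference to the first group.
    return re.compile(rf"({alternation}).*(?!\1)({alternation})")

pattern = build_pattern(WORDS)
print(bool(pattern.search("bar then baz")))  # True
print(bool(pattern.search("bar then bar")))  # False
```

The list is written once, so adding or removing a word cannot leave the two groups out of sync.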