Question

I recently asked a similar question and got an excellent, accurate answer, but now I have a new problem: all of my relevant input is on a single line. I'm not sure how to ask this abstractly, so I'll jump right into my input:

(EDITED to provide a better example)

bear999bear888bear777bear666fox---bear222bear333bear444bear555fox

(The items between the markers are not necessarily numeric)

This is the expression (EDITED to match updated input example):

bear.*bear(?<matchString>(.(?!bear.*bear))*?)bear.*fox

It returns only 444. Is there a way I can tweak this to return both 777 and 444? It seems to skip the first match and return only the second. I use the negative lookahead (?!...) so that it matches only the innermost item on the left side.

I've been testing here: http://regexlib.com/RETester.aspx

This works great when I break it into two lines and turn on multi-line. Why does it stop working when the input is on a single line?
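
A minimal sketch of the behavior described above, using Python's re module as a stand-in for the .NET engine behind that tester (Python spells the named group (?P<name>...) instead of (?<name>...)). Because bear.* at the front and .*fox at the end let a single match run from the first bear to the last fox, a one-line input yields only one match, and matches never overlap. When the input is split across lines, "." cannot cross the line break, so each line gets its own match. Where exactly the line is broken is an assumption here (right after "---").

```python
import re

# The question's pattern, with the .NET-style named group (?<matchString>...)
# rewritten in Python's (?P<matchString>...) syntax.
PATTERN = re.compile(r"bear.*bear(?P<matchString>(.(?!bear.*bear))*?)bear.*fox")

single_line = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"
# Assumption: the two-line version breaks the input right after "---".
two_lines = "bear999bear888bear777bear666fox---\nbear222bear333bear444bear555fox"


def captures(text):
    """Return the matchString capture of every non-overlapping match."""
    return [m.group("matchString") for m in PATTERN.finditer(text)]


# One match spans the whole line, so only one capture comes back.
print(captures(single_line))  # ['444']
# "." does not match "\n", so each line produces its own match.
print(captures(two_lines))    # ['777', '444']
```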

Any advice would be appreciated!


Solution

This should work (it does work in that regex tester you've linked in the question):

(?<=bear)(?:(?!bear).)*(?=bear(?:(?!bear).)*fox)

It reads as: "match something that is preceded by bear, contains no bear sequence, and is followed by bear, then a stretch with no bear, then fox".

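There are no capturing groups here; the whole match is what you need. Because the lookbehind and lookahead don't consume the surrounding bear and fox markers, each match covers only the item itself, so the engine can go on to find every such item on the line.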
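
As a quick check, here is the pattern above run through Python's re module (assumed as a stand-in for .NET; the fixed-width lookbehind (?<=bear) is supported there too). Since the pattern has no capturing groups, findall returns the whole match for each hit.

```python
import re

# The lookaround pattern from the answer. The lookbehind and lookahead do not
# consume "bear"/"fox", so each match is just the wanted item.
INNERMOST = re.compile(r"(?<=bear)(?:(?!bear).)*(?=bear(?:(?!bear).)*fox)")

line = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"

# No capturing groups, so findall returns the whole match for every hit.
print(INNERMOST.findall(line))  # ['777', '444']
```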

And yes, I can't help wondering why this should be done with a single regex when it looks more like a job for a tokenizer. :) For example, you could split the line on 'fox' first, then split each part on 'bear', and take the second-to-last element of each result.
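
A sketch of that split-based approach in Python; the function name innermost_items is hypothetical. Two details are assumptions on top of the description: the text after the final "fox" is dropped because it isn't terminated by "fox", and a part needs at least two "bear" markers so that the second-to-last piece really sits between two of them.

```python
def innermost_items(line):
    """Split on 'fox', then on 'bear', and keep the second-to-last piece."""
    results = []
    # The text after the final "fox" is not terminated by "fox", so drop it.
    for part in line.split("fox")[:-1]:
        pieces = part.split("bear")
        # At least two "bear" markers (three pieces) are needed so that
        # pieces[-2] lies between the last two "bear" markers.
        if len(pieces) >= 3:
            results.append(pieces[-2])
    return results


line = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"
print(innermost_items(line))  # ['777', '444']
```

Unlike the regex, this treats every "fox" as a segment terminator, so inputs where "fox" can also appear inside an item may need different handling.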

OTHER TIPS

Your first .* is greedy, so it runs as far right as it can before backtracking. Making the quantifiers lazy works:

xxx.*?xxx.*?xxx(?<matchString>.*?)xxx.*?yyy
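
A Python sketch of this tip, assuming xxx stands for bear and yyy for fox (the tip presumably targets an earlier version of the question that used those placeholder markers) and writing the named group in Python's (?P<name>...) syntax. On this input each segment happens to contain exactly four bear markers, so it returns the same items as the lookaround solution. Because it counts three markers before the capture rather than looking for the innermost one, a segment with a different number of markers would yield a different item.

```python
import re

# Assumption: xxx -> "bear" and yyy -> "fox"; the named group uses Python's
# (?P<name>...) syntax instead of .NET's (?<name>...).
LAZY = re.compile(r"bear.*?bear.*?bear(?P<matchString>.*?)bear.*?fox")

line = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"

# Each lazy .*? stops at the nearest following marker, so the first match
# ends at the first "fox" and the search resumes after it.
print([m.group("matchString") for m in LAZY.finditer(line)])  # ['777', '444']
```
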
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow