The problem is the back reference (\1). Back references are evil, or at least very difficult to implement in the general case, and it's not easy to recognize the non-general cases.
In your case, the problem is that the regex's first match will be from the first === TEST1 === to the last === END TEST1 ===. That's not what you intended, but it is the way regexes work. (The "longest left-most rule".) In theory, it's still possible to match the regex without killing the stack, but I doubt whether the regex library you're using is clever enough to make that optimization.
You can fix the regex to match what you want it to match by making the data part, ((?:.|\\n)*), non-greedy: change it to ((?:.|\\n)*?). That might also fix the stack blow-up problem, because it will cause the regex to match much earlier, before it blows up the stack. But I don't know if it will work in general; I really don't know anything about the MS implementation.
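To see the difference on a small input, here is a minimal sketch. The full pattern is my reconstruction from the fragments above, not necessarily your exact regex, and first_section_body is a hypothetical helper name. It only shows what each variant matches; it says nothing about stack use on large inputs.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the body of the first section, or "" if
// nothing matches. The overall pattern is an assumption pieced together
// from the fragments discussed above.
std::string first_section_body(const std::string& text, bool lazy) {
    // The only difference between the two variants is the trailing '?'.
    const std::string data = lazy ? "((?:.|\\n)*?)" : "((?:.|\\n)*)";
    const std::regex re("=== ([^=]+) ===\\n" + data + "\\n=== END \\1 ===");

    std::smatch match;
    if (!std::regex_search(text, match, re)) {
        return "";
    }
    return match[2].str();  // group 1 is the name, group 2 is the data part
}
```

With two consecutive TEST1 sections in the input, the greedy version returns everything up to the last terminator, while the non-greedy version stops at the first one.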
In my opinion, you should avoid back references, even though it means complicating your code a bit. What I would do is to first match:
=== ([^=]+) ===\n
and then create the terminating string:
"\n=== END " + match[1].str() + " ==="
and then find() the terminating string. That means you can no longer use the regex library's iterator, which is unfortunate, but the loop is still pretty straightforward.
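The loop might look something like the following. This is a sketch under assumptions, not tested against your data: extract_sections is a hypothetical name, and I've guessed that you want (name, body) pairs and that a start delimiter without a matching terminator should simply be skipped.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (name, body) for each
// "=== NAME ===\n ... \n=== END NAME ===" block in text.
std::vector<std::pair<std::string, std::string>>
extract_sections(const std::string& text) {
    // Start delimiter only; there is no back reference in this regex.
    static const std::regex start_re("=== ([^=]+) ===\n");

    std::vector<std::pair<std::string, std::string>> sections;
    auto pos = text.cbegin();
    std::smatch match;
    while (std::regex_search(pos, text.cend(), match, start_re)) {
        const std::string name = match[1].str();
        const std::string terminator = "\n=== END " + name + " ===";

        // Offset of the first character after the start delimiter.
        const auto body_begin =
            static_cast<std::size_t>(match[0].second - text.cbegin());
        const std::size_t body_end = text.find(terminator, body_begin);
        if (body_end == std::string::npos) {
            // Assumed policy: no terminator, so skip this start delimiter.
            pos = match[0].second;
            continue;
        }
        sections.emplace_back(name,
                              text.substr(body_begin, body_end - body_begin));
        // Resume scanning just past the terminator.
        pos = text.cbegin() +
              static_cast<std::ptrdiff_t>(body_end + terminator.size());
    }
    return sections;
}
```

Because the terminator is located with a plain substring search, the cost of finding it does not depend on how the regex engine handles back references.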
By the way, I find it odd that you only recognize the start delimiter if it is at the end of a line, and the end delimiter if it is at the start of a line. My inclination would have been to require both of them to be full lines. If you replace the regex-with-back-reference with my two-step approach, it's relatively easy to accomplish that. That might be considered another hint that the regex-with-back-reference is not really the right approach.