Microsoft's implementation of std::regex

Question 1

The problem is the back reference (\1). Back references are evil, or at least very difficult to implement in the general case, and it's not easy for a regex engine to recognize the special cases that could be handled cheaply.

In your case, the problem is that the regex's first match will run from the first === TEST1 === to the last === END TEST1 ===. That's not what you intended, but it is the way regexes work: quantifiers are greedy by default, so the data part grabs as much text as it can and only gives characters back when the rest of the pattern fails to match. In theory, it's still possible to match the regex without killing the stack, but I doubt whether the regex library you're using is clever enough to make that optimization.

You can fix the regex to match what you want it to match by making the data part, ((?:.|\\n)*), non-greedy: change it to ((?:.|\\n)*?). That might also fix the stack blow-up problem, because the regex will then find its match much earlier, before it has recursed deeply enough to exhaust the stack. But I don't know if it will work in general; I really don't know anything about the MS implementation.
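To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The full pattern is my reconstruction from the pieces mentioned above, not necessarily the regex from the question, and first_block_body is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical reconstruction of the pattern under discussion; the exact
// regex from the question is an assumption.
// Returns the captured data part of the first match, or "" if nothing matches.
std::string first_block_body(const std::string& text, bool lazy) {
    // The only difference between the two variants is the trailing '?'
    // that makes the quantifier non-greedy.
    const std::string data = lazy ? "((?:.|\\n)*?)" : "((?:.|\\n)*)";
    const std::regex re("=== ([^=]+) ===\\n" + data + "\\n=== END \\1 ===");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[2].str() : std::string();
}
```

Given two consecutive TEST1 blocks, the lazy variant captures only the body of the first block, while the greedy variant captures everything up to the last end delimiter. This only illustrates what gets matched on a small input; it says nothing about stack usage on large ones.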

In my opinion, you should avoid back references, even though it means complicating your code a bit. What I would do is to first match:

 === ([^=]+) ===\n

and then create the terminating string:

 "\n=== END " + match[1].str() + " ==="

and then search for the terminating string with std::string::find(). That means you can no longer use the regex library's iterator, which is unfortunate, but the loop is still pretty straightforward.
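The loop could look roughly like the following sketch. The function name extract_blocks and the choice to return (name, body) pairs are my own assumptions for illustration; the header regex and the terminator string are the ones given above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collects (name, body) pairs for blocks of the form
//   === NAME ===\n body \n=== END NAME ===
// using a back-reference-free regex for the header and std::string::find
// for the terminator.
std::vector<std::pair<std::string, std::string>>
extract_blocks(const std::string& text) {
    static const std::regex header("=== ([^=]+) ===\\n");
    std::vector<std::pair<std::string, std::string>> blocks;
    auto pos = text.cbegin();
    std::smatch m;
    while (std::regex_search(pos, text.cend(), m, header)) {
        const std::string name = m[1].str();
        const std::string terminator = "\n=== END " + name + " ===";
        // Offset of the first character after the header line.
        const auto body_begin =
            static_cast<std::size_t>(m[0].second - text.cbegin());
        const std::size_t body_end = text.find(terminator, body_begin);
        if (body_end == std::string::npos) {
            // No terminator for this header: skip it and keep scanning.
            pos = m[0].second;
            continue;
        }
        blocks.emplace_back(name,
                            text.substr(body_begin, body_end - body_begin));
        // Resume the search just past the terminator.
        pos = text.cbegin() +
              static_cast<std::ptrdiff_t>(body_end + terminator.size());
    }
    return blocks;
}
```

Since the header regex contains no back reference and no unbounded "match anything" group, the regex engine only ever looks at one header at a time; the potentially long body is skipped by a plain substring search.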

By the way, I find it odd that you only recognize the start delimiter if it is at the end of a line, and the end delimiter if it is at the start of a line. My inclination would have been to require both of them to be full lines. If you replace the regex-with-back-reference with my two-step approach, it's relatively easy to accomplish that. That might be considered another hint that the regex-with-back-reference is not really the right approach.
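One way to require full-line delimiters is to process the input line by line, as in this sketch. It is a variation on the two-step idea rather than anything from the question; extract_line_delimited_blocks is a hypothetical name, and the handling of unterminated blocks is a simplification.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Same (name, body) extraction, but both delimiters must be complete lines.
// Simplification: a header with no matching end line consumes the rest of
// the input and produces no block.
std::vector<std::pair<std::string, std::string>>
extract_line_delimited_blocks(const std::string& text) {
    static const std::regex header("=== ([^=]+) ===");
    std::vector<std::pair<std::string, std::string>> blocks;
    std::istringstream in(text);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        // regex_match succeeds only if the whole line is a start delimiter.
        if (!std::regex_match(line, m, header)) continue;
        const std::string name = m[1].str();
        const std::string terminator = "=== END " + name + " ===";
        std::string body;
        bool closed = false;
        bool first = true;
        while (std::getline(in, line)) {
            if (line == terminator) { closed = true; break; }
            if (!first) body += '\n';  // rejoin body lines with newlines
            body += line;
            first = false;
        }
        if (closed) blocks.emplace_back(name, body);
    }
    return blocks;
}
```

A line such as "intro === NOPE ===" is ignored here, because the delimiter is not the entire line.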

Question 2

Forget <regex> – at least for now, potentially for good. In my opinion, the spec is broken and unusable; but even if it isn’t, at least current implementations are, and probably will be for years to come.

This is because all major vendors implement their own regex engines from scratch instead of relying on existing, tried and tested libraries. This is a huge endeavour.

My recommendation: Use another regex library for now and give <regex> a wide berth. Alternatives are Boost.Regex, Boost.Xpressive and (C-style) libraries such as PCRE or Oniguruma.

Incidentally, we had a discussion about this very topic today in the chat. If you’ve got half an hour, you can read my detailed rant and some interesting counter-points.