NOTE: When I say the regex [\0], I mean the regex [\0] (not contained in a C-style string, which would then be "[\\0]"). If I haven't put quotes around it, it's not a C-style string, and the backslashes shouldn't be interpreted as escaping a C-style string.
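To make the notation concrete, here is a small sketch of how the regex text [\0] can be spelled in C++ source, and what happens when the backslash is not doubled. The helper names (`escaped_pattern`, `raw_pattern`, `nul_pattern`) are made up purely for illustration.

```cpp
#include <string>

// The regex text [\0] as an ordinary string literal: the backslash has to be
// doubled so that the compiler leaves a real backslash in the string.
std::string escaped_pattern() { return "[\\0]"; }

// The same regex text as a raw string literal: no doubling needed.
std::string raw_pattern() { return R"([\0])"; }

// NOT the same thing: the compiler turns \0 into a NUL byte here, and building
// a std::string from a const char* stops at that byte, so only "[" survives.
std::string nul_pattern() { return "[\0]"; }
```

The first two functions return the identical four-character string (bracket, backslash, zero, bracket), which is what I mean whenever I write [\0] without quotes; the third one is the case I am not talking about.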
Inspired by this question and my investigation, I tried the following code in clang 3.4:
#include <regex>
#include <string>

int main()
{
    std::string input = "foobar";
    std::regex regex("[^\\0]*"); // Note, this is "\\0", not "\0"!
    return std::regex_match(input, regex);
}
Apparently, clang's standard library (libc++, judging by the std::__1 namespace) doesn't like this, as the program throws at run time:
std::__1::regex_error: The expression contained an invalid escaped character, or a trailing escape.
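To narrow down which kind of error is being raised, one could catch the exception and look at its error code rather than letting it terminate the program. The following is only a sketch: `describe_compile` is a hypothetical helper name, and the assumption that the message above corresponds to `std::regex_constants::error_escape` is based on its wording, not on anything I have verified in the library source.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: try to compile `pattern` with the default (ECMAScript)
// grammar and describe the outcome instead of letting the exception escape.
std::string describe_compile(const std::string& pattern)
{
    try {
        std::regex re(pattern);   // throws std::regex_error if the pattern is rejected
        (void)re;                 // only the construction matters here
        return "ok";
    } catch (const std::regex_error& e) {
        if (e.code() == std::regex_constants::error_escape)
            return "error_escape";
        return std::string("other regex_error: ") + e.what();
    }
}
```

Calling `describe_compile("[^\\n]*")` and `describe_compile("[^\\0]*")` side by side should show whether only the second pattern is rejected. The result for the second pattern depends on the standard library implementation and version, so it may well differ from what I see.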
The culprit seems to be the [^\0] part (changing it to [^\n] or something similar works fine): the \0 appears to be treated as an invalid escape. I want to clarify that I'm not talking about the '\0' character (null character) or the '\n' character (newline character). In C-style strings, what I'm talking about is "\\0" (a string containing backslash zero) and "\\n" (a string containing backslash n). "\\n" seems to get transformed into "\n" by the regex engine, but it chokes on "\\0".
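The claim that the regex engine itself interprets backslash-n can be checked with a short sketch like the one below (`full_match` is a hypothetical helper name). The pattern passed in contains a backslash followed by the letter n, not a newline character, yet the resulting class excludes newlines and nothing else.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does the whole of `text` match `pattern`
// under the default ECMAScript grammar?
bool full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```

With this, `full_match("foobar", "[^\\n]*")` is true and `full_match("foo\nbar", "[^\\n]*")` is false, while a subject containing a literal backslash or the letter n, such as `"foo\\nbar"`, still matches. That is the behavior I would have expected \0 to mirror for the null character.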
The C++11 standard says in section 28.13 [re.grammar] that:
The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262, except as specified below.
I'm no expert on ECMA-262, but I tried the regular expression on JSFiddle and it's working fine there in JavaScript land.
So now I'm wondering if the regex [^\0] is valid in ECMA-262 and the C++11 standard removed support for it (in the stuff following "... except as specified below.").
Question: Is the \0 escape sequence (not the null character; in a string literal this would be "\\0") legal in a C++11 regular expression? Is it legal in ECMA-262 (or are browser JS VMs just being "too" lenient)? What's the cause/justification for the different behaviors?