Question

I couldn't find an explanation in the C standard of how hexadecimal and octal escape sequences in wide strings are processed.

For example:

wchar_t *txt1 = L"\x03A9";
wchar_t *txt2 = L"\xA9\x03";

Are these processed in some way (for example, by prefixing each byte with a \x00 byte), or are they stored in memory exactly as they are written here?

Also, how does the L prefix operate according to the standard?

EDIT:

Let's consider txt2. How would it be stored in memory: as \xA9\x00\x03\x00, or as \xA9\x03, the way it was written? The same goes for \x03A9: would it be treated as one wide character, or as two separate bytes that become two wide characters?

EDIT2:

The standard says:

The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the hexadecimal integer so formed specifies the value of the desired character or wide character.

Now consider a wide character literal:

wchar_t txt = L'\xFE\xFF';

It consists of two hex escape sequences, so it should be treated as two wide characters. Two wide characters cannot fit into a single wchar_t (yet it compiles in MSVC), and in my case the sequence is treated like the following:

wchar_t foo = L'\xFFFE';

which contains only one hex escape sequence and therefore only one wide character.

EDIT3:

Conclusions: each octal/hex escape sequence is treated as a separate value (wchar_t *txt2 = L"\xA9\x03"; consists of 3 elements, counting the terminating null). wchar_t txt = L'\xFE\xFF'; is not portable, because its value is implementation-defined; one should use wchar_t txt = L'\xFFFE'; instead.

Solution

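There's no byte-level processing: each escape sequence produces exactly one wide-character element. L"\x03A9" is simply an array of two wchar_t elements (wchar_t const[2] in C++, wchar_t[2] in C) holding 0x3A9 and 0, and similarly L"\xA9\x03" is an array of three elements holding 0xA9, 0x03 and 0.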
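As a rough illustration of the layout described above, here is a small C sketch that checks the element counts and values. The names `omega`, `pair` and `check_wide_literal_layout` are made up for this example, and it assumes `wchar_t` is at least 16 bits wide, which holds on common desktop platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* One escape sequence: a single wide character plus the terminating null. */
static const wchar_t omega[] = L"\x03A9";
/* Two escape sequences: two wide characters plus the terminating null. */
static const wchar_t pair[] = L"\xA9\x03";

/* Number of wchar_t elements in each array, terminator included. */
static size_t omega_count(void) { return sizeof omega / sizeof omega[0]; }
static size_t pair_count(void)  { return sizeof pair / sizeof pair[0]; }

/* Hypothetical helper: asserts the layout described in the answer. */
static void check_wide_literal_layout(void)
{
    assert(omega_count() == 2);
    assert(omega[0] == 0x3A9 && omega[1] == 0);

    assert(pair_count() == 3);
    assert(pair[0] == 0xA9 && pair[1] == 0x03 && pair[2] == 0);
}
```

Calling `check_wide_literal_layout()` from `main` should pass silently. How many bytes each element occupies depends on `sizeof(wchar_t)`, which the standard leaves to the implementation.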

Note in particular C11 6.4.4.4/7:

Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.

And also C++11 2.14.3/4:

There is no limit to the number of digits in a hexadecimal sequence.
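The "longest sequence" rule has a practical consequence: a hex digit that follows an escape is swallowed by it. A minimal C sketch of the effect, and of the usual workaround of splitting the literal, might look like this. The names `greedy`, `split` and `check_greedy_escape` are invented for the example, and `wchar_t` is again assumed to be at least 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* A single escape: all four hex digits are consumed -> { 0x3A9B, 0 }. */
static const wchar_t greedy[] = L"\x3A9B";
/* Adjacent literals are concatenated only after escapes are converted,
   so the escape ends at the literal boundary -> { 0x3A9, L'B', 0 }. */
static const wchar_t split[] = L"\x3A9" L"B";

static size_t greedy_count(void) { return sizeof greedy / sizeof greedy[0]; }
static size_t split_count(void)  { return sizeof split / sizeof split[0]; }

/* Hypothetical helper: asserts the two different results. */
static void check_greedy_escape(void)
{
    assert(greedy_count() == 2 && greedy[0] == 0x3A9B);
    assert(split_count() == 3 && split[0] == 0x3A9 && split[1] == L'B');
}
```

Splitting the literal is the usual way to place a character such as `B` (itself a valid hex digit) directly after a hex escape.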

Note also that when you use a hexadecimal escape sequence, it is your responsibility to ensure that the data type can hold the value. C11 6.4.4.4/9 actually spells this out as a constraint, whereas in C++ the value of a literal that exceeds the type's range is merely "implementation-defined". (A good compiler should warn you if you exceed the type's range.)
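One way to make that responsibility explicit is a compile-time guard. The following C sketch is only an illustration: `marker` and `check_marker` are made-up names, and comparing against `WCHAR_MAX` is a deliberately conservative choice, because the C constraint is stated in terms of the unsigned type corresponding to `wchar_t`.

```c
#include <assert.h>
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Refuse to compile if wchar_t cannot represent the value we need. */
#if WCHAR_MAX < 0xFFFE
#error "wchar_t is too narrow for 0xFFFE"
#endif

/* A single escape sequence, so a single wide character. */
static const wchar_t marker = L'\xFFFE';

/* Hypothetical helper: the stored value matches the escape's numeric value. */
static void check_marker(void)
{
    assert(marker == 0xFFFE);
}
```

On an implementation where the guard fails, the build stops with a clear message instead of relying on whatever diagnostic the compiler issues for the out-of-range escape.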


Keep in mind that a wide string literal has to initialize an array or be assigned to a pointer; it cannot initialize a plain wchar_t object. The valid forms look like this:

wchar_t const * p = L"\x03A9";    // pointer to the first element of a string

wchar_t arr1[] = L"\x03A9";       // an actual array
wchar_t arr2[2] = L"\x03A9";      // ditto, but explicitly sized

std::wstring s = L"\x03A9";       // C++ only

On a tangent: This question of mine elaborates a bit on string literals and escape sequences.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow