In general, this approach can be implemented in an endianness-agnostic way. UTF-32 is only used inside a single system, which has one endianness, while every interface to a system that may have a different endianness uses UTF-8. UTF-8 is defined as a stream of bytes, so it has no endianness.
However, the conversion itself is endian-sensitive and must be implemented carefully so that endianness does not become a problem (e.g. no memcpy of the UTF-32 value's bytes, but shifts and masks on the integer value instead). It should be reasonable to assume that your standard library implementation does this conversion correctly.
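To illustrate the point about shifts versus memcpy, here is a minimal sketch of a hand-written encoder. The name encode_utf8 is hypothetical and not part of any library; a real standard library implementation may look quite different. Because the function only operates on the numeric value of the code point, it produces the same bytes on little-endian and big-endian machines.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard library function): encodes one Unicode
// code point as UTF-8 using only shifts and masks on the integer value.
// It never looks at how char32_t is laid out in memory, so host byte order
// cannot influence the output.
std::string encode_utf8(char32_t cp) {
    // Precondition: a valid Unicode scalar value (no surrogates, <= U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A memcpy of the four bytes of cp into the output would instead copy them in host order, which is exactly the mistake this style avoids.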
To clarify why this code is unaffected by endianness, here is the relevant wording from the standard (22.5/4):
For the facet codecvt_utf8:
- The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program.
- Endianness shall not affect how multibyte sequences are read or written.
- The multibyte sequences may be written as either a text or a binary file.
The little_endian member of the codecvt_mode enumeration type is only intended for reading/writing UTF-16 and UTF-32 multibyte sequences.
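As a rough illustration of relying on the standard facet, the sketch below converts between UTF-32 and UTF-8 with std::wstring_convert and std::codecvt_utf8. The function names to_utf8 and from_utf8 are hypothetical wrappers chosen for this example. Note that std::wstring_convert and the <codecvt> header are deprecated since C++17, so compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrapper: UTF-32 (in-program representation) to UTF-8 bytes.
// No codecvt_mode flag is passed, because byte order does not apply to UTF-8.
std::string to_utf8(const std::u32string& text) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(text);
}

// Hypothetical wrapper: UTF-8 bytes back to UTF-32.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u32string from_utf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}
```

The UTF-8 output can be written to a file or socket and read back on a machine with a different byte order, where from_utf8 should yield the same code points.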