Why does iconv(1) in Cygwin produce big-endian UTF-16 with `-t utf-16`?

https://stackoverflow.com/questions/21450205

04-10-2022

Question

On Cygwin 1.7.25 with libiconv 1.14-2, `iconv -t utf-16` produces big-endian UTF-16 (with a BOM), even though x86 is little-endian and Windows itself produces little-endian UTF-16. Isn't libiconv supposed to use platform-dependent endianness for the default utf-16 conversion? It's not necessarily a problem for the apps I am using, since they handle both byte orders by reading the BOM, but the behavior is still peculiar. To reproduce: create a new file in Notepad, which saves it as UTF-16LE with a BOM, then run it through `iconv -t utf-16` on the same system. The result is a byte-swapped file with a big-endian BOM.
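One way to check which byte order a given file actually starts with is to look at its first two bytes. The following is a minimal sketch; `bom_of` is a hypothetical helper name, not part of iconv, and it only distinguishes the two UTF-16 BOMs (it does not try to tell a UTF-16LE BOM apart from the start of a UTF-32LE BOM).

```shell
#!/bin/sh
# bom_of FILE: report which UTF-16 byte order mark, if any, FILE starts with.
# (bom_of is an illustrative name, not an iconv or Cygwin tool.)
bom_of() {
    # Dump the first two bytes as hex and strip od's spacing and newline.
    first2=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case "$first2" in
        fffe) echo "UTF-16LE BOM" ;;   # bytes FF FE
        feff) echo "UTF-16BE BOM" ;;   # bytes FE FF
        *)    echo "no UTF-16 BOM" ;;
    esac
}
```

For example, `iconv -f utf-8 -t utf-16 some-file > out.txt; bom_of out.txt` would show what the local iconv chooses for plain `utf-16`; the answer can differ between iconv implementations.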

Solution

The Unicode specification indicates a preference for big endian, and non-Microsoft software often uses it by default. In particular, when UTF-16 is encoded without a BOM and no higher-level protocol declares a byte order (as network protocols do with network byte order), the byte order is taken to be big endian. However, some software does not adhere to the specification and assumes little endian when there is no BOM, so a BOM may be added to let such software work.
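To see why the byte order matters when no BOM is present, the same two bytes can be decoded under each assumption. This is a small sketch, assuming an iconv that accepts the names `UTF-16BE` and `UTF-16LE`; `decode_as` is a hypothetical helper name.

```shell
#!/bin/sh
# decode_as ENCODING: decode the BOM-less byte pair 00 41 as ENCODING and
# print the resulting UTF-8 bytes as hex. (decode_as is an illustrative name.)
decode_as() {
    printf '\000\101' | iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

decode_as UTF-16BE   # read as 0x0041, i.e. U+0041 "A": prints 41
decode_as UTF-16LE   # read as 0x4100, i.e. U+4100: prints e48480
```

A decoder that guesses the wrong order does not fail; it silently yields a different, valid character, which is why a BOM or an explicit encoding name is useful.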

"Isn't libiconv supposed to use platform-dependent endianness for the default utf-16 conversion?"

Not as far as I know. What makes you think this?

Other tips

This isn't quite a duplicate, but the accepted answer to "Convert UTF8 to UTF16 using iconv" proposes a simple, scriptable workaround: specify an explicit endianness and prepend the BOM yourself:

( printf "\xff\xfe" ; iconv -f utf-8 -t utf-16le UTF-8-FILE ) > UTF-16-FILE
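The one-liner above relies on `printf` understanding `\x` hex escapes, which bash does but which POSIX does not require. A variant using octal escapes, wrapped in a function for reuse, might look like the sketch below; `utf8_to_utf16le_bom` is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# utf8_to_utf16le_bom FILE: write FILE, assumed to be UTF-8, to stdout as
# little-endian UTF-16 preceded by a BOM.
# (utf8_to_utf16le_bom is an illustrative name, not an existing tool.)
utf8_to_utf16le_bom() {
    # \377\376 is FF FE in octal, the little-endian BOM.
    printf '\377\376' &&
    # An explicit -le target makes iconv emit no BOM of its own.
    iconv -f UTF-8 -t UTF-16LE "$1"
}
```

Usage would be `utf8_to_utf16le_bom UTF-8-FILE > UTF-16-FILE`, matching the file names in the original command.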

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow