2-byte (UCS-2) wide strings under GCC

https://stackoverflow.com/questions/2790412

04-10-2019

Problem

When porting my Visual C++ project to GCC, I found out that the wchar_t data type is 4 bytes wide (UTF-32) by default. I could override that with a compiler option (-fshort-wchar), but then the whole wcs* part of the runtime library (RTL), such as wcslen and wcscmp, becomes unusable, since it assumes 4-byte wide strings.

For now, I've reimplemented 5-6 of these functions from scratch and #defined my implementations in. But is there a more elegant option, say, a build of the GCC RTL with a 2-byte wchar_t quietly sitting somewhere, waiting to be linked?

The specific flavors of GCC I'm after are Xcode on Mac OS X, Cygwin, and the one that comes with Debian Linux Etch.

Solution

I reimplemented 5-6 of the more common wcs* functions and #defined my implementations in.

Other tips

But is there a more elegant option, say, a build of the GCC RTL with a 2-byte wchar_t quietly sitting somewhere, waiting to be linked?

No. This is a platform-specific issue, not a GCC issue.

The Linux platform ABI specifies that wchar_t is 32 bits wide, so you either have to use a whole new library (ICU is a popular choice) or port your code to handle 4-byte wchar_ts. Every library you might link against also assumes a 4-byte wchar_t and will break if you compile with GCC's -fshort-wchar.

On Linux specifically, though, nearly everyone has standardized on UTF-8 as the multibyte encoding.

Look at the ICU library. It is a portable library with a UTF-16 API.

As you've noticed, the size of wchar_t is implementation-defined, so there is no way to work with that data type portably.

Linux systems in general had the advantage of gaining Unicode support later, after the whole UCS-2 debacle had been recognized as a not-so-great idea, so they use UTF-8 as their encoding. All system APIs still operate on char* and are Unicode-safe.

Your best bets are to use a library which manages this for you: Qt, ICU, etc.

Note that Cygwin uses a 2-byte wchar_t to make meshing with Windows easier.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow