swprintf chokes on characters outside 8-bit range

https://stackoverflow.com/questions/3085751

28-09-2019

Question

This happens on OS X, though I suspect it applies to any UNIX-y OS. I have two strings that look like this:

const wchar_t *test1 = (const wchar_t *)"\x44\x00\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00";
const wchar_t *test2 = (const wchar_t *)"\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00";
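The two casts only produce "Ds" and "D’s" because they assume a 4-byte, little-endian wchar_t (true on OS X on Intel), so each group of four bytes is one UTF-32LE code unit. They also assume the char array happens to be suitably aligned for wchar_t, which the language does not guarantee. A minimal sketch, under the 4-byte little-endian assumption, that checks the raw bytes of test2 against the equivalent wide literal (the function name raw_matches_literal is hypothetical):

```c
#include <string.h>
#include <wchar.h>

/* Raw bytes from the question's test2: UTF-32LE code units, NUL-terminated. */
static const char raw2[] =
    "\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00";

/* Hypothetical helper: returns 1 if raw2 has the same byte layout as the
   wide literal L"D\x2019s". Only meaningful when wchar_t is 4 bytes and
   the machine is little-endian; returns 0 if wchar_t has another size. */
int raw_matches_literal(void) {
    static const wchar_t lit[] = L"D\x2019s";  /* 'D', U+2019, 's', NUL */
    if (sizeof(wchar_t) != 4)
        return 0;
    return memcmp(raw2, lit, sizeof lit) == 0;  /* compares all 16 bytes */
}
```

Writing the string as a wide literal in the first place avoids both the endianness and the alignment assumptions.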

In the debugger, test1 looks like "Ds" and test2 looks like "D’s" (with a curly apostrophe, U+2019). I then call this code:

wchar_t buf1[100], buf2[100];
int ret1 = swprintf(buf1, 100, L"%ls", test1);
int ret2 = swprintf(buf2, 100, L"%ls", test2);

The first swprintf call works fine. The second one returns -1 (and the buffer is unchanged).

I'm guessing the problem has something to do with locales, but googling around didn't turn up anything useful. This is the simplest way to reproduce the problem I'm seeing. What I'm really interested in is vswprintf(), but I assume that's closely related.
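Since the real target is vswprintf(), here is a minimal sketch of a variadic wrapper around it. The name format_wide is hypothetical. On failure it clears the buffer and returns the negative result, so a caller can no longer mistake stale buffer contents for output; errno is left as vswprintf left it (an encoding error may be reported as EILSEQ, but the C standard does not require errno to be set).

```c
#include <stdarg.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: formats into buf like swprintf. On failure
   (encoding error, or output that does not fit in n wide characters)
   it stores an empty string and returns the negative result. */
int format_wide(wchar_t *buf, size_t n, const wchar_t *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int ret = vswprintf(buf, n, fmt, ap);
    va_end(ap);
    if (ret < 0 && n > 0)
        buf[0] = L'\0';  /* never hand back stale contents */
    return ret;
}
```

For example, format_wide(b, 32, L"%ls-%d", L"D\x2019s", 7) should write "D’s-7" and return 5 once the locale issue below is addressed.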

Why does swprintf choke on a Unicode character outside the 8-bit range? Is there any way to work around this?

Solution

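Try explicitly setting the locale to UTF-8. A C program starts in the "C" locale, whose character set does not include U+2019. OS X's BSD-derived swprintf appears to pass its output through the current locale's multibyte conversion, so in the "C" locale that character cannot be converted and the call fails with -1. Also use a wide string literal instead of reinterpreting raw bytes: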

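#include <locale.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_CTYPE, "UTF-8");          /* valid name on OS X; on Linux try e.g. "en_US.UTF-8" */
    const wchar_t *test2 = L"D\x2019s";    /* wide literal: 'D', U+2019, 's' */
    wchar_t buf2[100];
    int ret2 = swprintf(buf2, 100, L"%ls", test2);
    return ret2 < 0;                       /* nonzero exit status if formatting failed */
}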
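The locale name "UTF-8" is accepted on OS X, but it is not universal; glibc-based systems usually provide names such as "C.UTF-8" or "en_US.UTF-8" instead. A minimal sketch that tries a list of candidate names and then repeats the failing call. The candidate list and the helper names use_utf8_ctype and format_test2 are assumptions for illustration; which locales are installed varies from machine to machine.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Candidate LC_CTYPE names, tried in order (assumed, not guaranteed):
   "UTF-8" exists on OS X; "C.UTF-8" and "en_US.UTF-8" are common on Linux. */
static const char *const utf8_locales[] = { "UTF-8", "C.UTF-8", "en_US.UTF-8" };

/* Hypothetical helper: returns the name setlocale accepted, or NULL. */
const char *use_utf8_ctype(void) {
    for (size_t i = 0; i < sizeof utf8_locales / sizeof utf8_locales[0]; i++) {
        if (setlocale(LC_CTYPE, utf8_locales[i]) != NULL)
            return utf8_locales[i];
    }
    return NULL;
}

/* Hypothetical helper: formats the question's "D’s" string into buf
   and returns swprintf's result (3 on success). */
int format_test2(wchar_t *buf, size_t n) {
    const wchar_t *test2 = L"D\x2019s";
    return swprintf(buf, n, L"%ls", test2);
}
```

Changing only LC_CTYPE leaves the other categories (LC_NUMERIC and so on) in the "C" locale, so, for example, %f output keeps a period as the decimal separator. Alternatively, setlocale(LC_CTYPE, "") takes the setting from the environment (LC_ALL, LC_CTYPE, LANG), which only helps if the user's environment is itself UTF-8.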

Licensed under: CC-BY-SA with attribution
