If I've done my conversion correctly, 0x9834 in UTF-16 (16-bit
Unicode) translates to the three-byte sequence 0xE9, 0xA0,
0xB4 in UTF-8 (8-bit Unicode). I don't know about other narrow
multibyte encodings, but I doubt any would be shorter than two bytes.
You pass a buffer of two bytes to wcstombs, which means
a returned string of at most one byte. wcstombs stops
translating (without failing!) when there's no more room in the
destination buffer. You've also failed to L'\0'-terminate the
input buffer. It's not a problem at the moment, because
wcstombs will stop translating before it gets there, but you
should normally add the extra L'\0'.
So what to do:
First, and foremost, when debugging this sort of thing, look at
the return value of wcstombs. I'll bet that it's 0, because
of the lack of space.
Second, I'd give myself a little bit of margin. Legal Unicode
can result in up to four bytes in UTF-8, so I'd allocate at
least 5 bytes for the output (don't forget the trailing '\0').
Along the same lines, you need a trailing L'\0' for the input.
So:
char buffer[ 5 ];   //  room for up to 4 UTF-8 bytes plus the '\0'
wchar_t wc[] = { page->text[index].unicode, L'\0' };
int ret = wcstombs( buffer, wc, sizeof( buffer ) );
if ( ret < 1 ) {    //  -1 is an error, and 0 means nothing was converted
    std::cerr << "OOPS\n";
}
std::string str( buffer, buffer + ret );
std::cout << str << '\n';
Of course, after all that, there is still the question of what
the (final) display device does with UTF-8 (or whatever the
multibyte narrow character encoding is---UTF-8 is almost
universal under Unix, but I'm not sure about Windows). But
since you say that displaying "\u9834" seems to work, it
should be all right.