Convert 16 bits in memory into std::string

Question 1

If I've done my conversion correctly, 0x9834 in UTF-16 (16 bit Unicode) translates to the three byte sequence 0xE9, 0xA0, 0xB4 in UTF-8 (8 bit Unicode). I don't know about other narrow byte encodings, but I doubt any would be shorter than 2 bytes. You pass a buffer of two bytes to wcstombs, which leaves room for at most one byte of output plus the terminating '\0'. wcstombs stops translating (without failing!) when there's no more room in the destination buffer, and it never writes part of a multibyte character. You've also failed to L'\0' terminate the input buffer. It's not a problem at the moment, because wcstombs will stop translating before it gets there, but you should normally add the extra L'\0'.
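The three byte result can be checked by hand: code points from U+0800 to U+FFFF are split into 4 + 6 + 6 bits and packed as 1110xxxx 10xxxxxx 10xxxxxx. Here is a minimal sketch of that arithmetic. The name encode_utf8 is made up for this illustration (it is not a standard function), and the code assumes the input is a valid code point.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For 0x9834 the top four bits are 0x9, the middle six are 0x20 and the low six are 0x34, which gives 0xE9, 0xA0, 0xB4.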

So what to do:

First, and foremost, when debugging this sort of thing, look at the return value of wcstombs. I'll bet that it's 0, because of the lack of space.

Second, I'd give myself a little bit of margin. Legal Unicode can result in up to four bytes in UTF-8, so I'd allocate at least 5 bytes for the output (don't forget the trailing '\0'). Along the same lines, you need a trailing L'\0' for the input. So:

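char buffer[ 5 ];
wchar_t wc[] = { page->text[index].unicode, L'\0' };
//  wcstombs returns size_t; (size_t)-1 signals an unconvertible character
std::size_t ret = std::wcstombs( buffer, wc, sizeof( buffer ) );
if ( ret == static_cast<std::size_t>( -1 ) || ret == 0 ) {
    //  Error, or nothing converted: don't build a string from the buffer
    std::cerr << "OOPS\n";
} else {
    std::string str( buffer, buffer + ret );
    std::cout << str << '\n';
}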
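The same "leave some margin" idea extends to whole strings. This is a sketch only, assuming you stay with wcstombs: the helper name narrow is hypothetical, and the result depends entirely on the current C locale. The buffer is sized with MB_CUR_MAX, the largest number of bytes a single character can need in that locale.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a whole wide string with wcstombs.
// The output encoding is whatever the current C locale uses.
std::string narrow(const std::wstring& ws) {
    // Worst case: every character needs MB_CUR_MAX bytes, plus the '\0'.
    std::vector<char> buffer(ws.size() * MB_CUR_MAX + 1);
    std::size_t ret = std::wcstombs(buffer.data(), ws.c_str(), buffer.size());
    if (ret == static_cast<std::size_t>(-1)) {
        // A character cannot be represented in the current locale.
        throw std::runtime_error("wcstombs: unconvertible character");
    }
    // ret counts the bytes written, not including the terminator.
    return std::string(buffer.data(), ret);
}
```

In the default "C" locale this will typically only handle plain ASCII. To get the user's encoding (often UTF-8 under Unix), the program would have to call std::setlocale(LC_ALL, "") from <clocale> first.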

Of course, after all that, there is still the question of what the (final) display device does with UTF-8 (or whatever the multi-byte narrow character encoding is; UTF-8 is almost universal under Unix, but I'm not sure about Windows). But since you say that displaying "\u9834" seems to work, it should be all right.

Question 2

Please read a bit about what "character encoding" means, for example: "What is character encoding and why should I bother with it".

Then figure out what encoding you are getting in, and what encoding you need to use on the output. That means figuring out what your file format / GUI library / console is expecting.

Then use something reliable like libiconv to convert between them, instead of the so-implementation-defined-that-it-is-almost-useless wcstombs()+wchar_t.

For example, you might find that your input is UCS-2, and you need to output it as UTF-8. My system has a 32-bit wchar_t, so I wouldn't count on wcstombs converting from UCS-2 to UTF-8.
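As a rough sketch of the iconv route, assuming the input really is UTF-16 and the output should be UTF-8 (check both for your own data): the helper name utf16_to_utf8 is made up for this example, while iconv_open, iconv and iconv_close are the POSIX calls. The encoding names "UTF-16LE" and "UTF-8" are accepted by glibc and GNU libiconv; other implementations may spell them differently.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8 using iconv.
std::string utf16_to_utf8(const std::u16string& in) {
    // Serialize as little-endian bytes so "UTF-16LE" is correct on any host.
    std::string src;
    for (char16_t u : in) {
        src += static_cast<char>(u & 0xFF);
        src += static_cast<char>(u >> 8);
    }

    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");   // arguments: (to, from)
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // One UTF-16 code unit never needs more than 3 bytes in UTF-8.
    std::string dst(in.size() * 3 + 1, '\0');
    char* inptr = &src[0];
    size_t inleft = src.size();
    char* outptr = &dst[0];
    size_t outleft = dst.size();

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        throw std::runtime_error("iconv failed: invalid or incomplete input");
    }
    dst.resize(dst.size() - outleft);   // keep only the bytes actually written
    return dst;
}
```

Unlike wcstombs, the source and target encodings are named explicitly here, so the result does not depend on the locale or on the size of wchar_t. On some systems you need to link with -liconv.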

Question 3

To convert from UTF-16 to UTF-8, use codecvt_utf8_utf16<char16_t> with std::wstring_convert (both are deprecated since C++17, but still widely available):

#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

int main() {
    char16_t wstr16[2] = {0x266A, 0};
    auto conv = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>{};
    auto u8str = std::string{conv.to_bytes(wstr16)};
    std::cout << u8str << '\n';
}
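to_bytes reports input it cannot convert by throwing std::range_error, so in real code you probably want to catch that. A possible wrapper is sketched below. The name to_utf8 and the bool-plus-output-parameter interface are choices made for this illustration, not part of the standard library.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the conversion shown above.
// Returns false instead of letting std::range_error escape.
bool to_utf8(const std::u16string& in, std::string& out) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    try {
        out = conv.to_bytes(in);   // handles surrogate pairs as well
        return true;
    } catch (const std::range_error&) {
        return false;              // conversion failed
    }
}
```

Because codecvt_utf8_utf16 understands surrogate pairs, characters outside the Basic Multilingual Plane come out as four UTF-8 bytes.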