Question

I need to convert UTF-16 text to UTF-8. The actual conversion code is simple:

std::wstring in(...);
std::string out = boost::locale::conv::utf_to_utf<char, wchar_t>(in);

However, the issue is that the UTF-16 is read from a file and may or may not contain a BOM. My code needs to be portable (at minimum Windows/OS X/Linux). I'm really struggling to figure out how to create a wstring from the byte sequence.

EDIT: this is not a duplicate of the linked question, as in that question the OP needs to convert a wide string into an array of bytes - and I need to convert the other way around.

Solution

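You do not need wide types at all in your case. wchar_t is 16 bits on Windows but typically 32 bits on Linux and OS X, so a wstring is not a portable container for UTF-16 data.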

Assuming you can get a char * from your vector<char>, you can stick to bytes by using the following code:

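// requires #include <boost/locale/encoding.hpp>
// data() is valid even for an empty vector, unlike &v[0]
const char * utf16_buffer = my_vector_of_chars.data();
// one past the last byte; avoids indexing out of range with v[v.size()]
const char * buffer_end = utf16_buffer + my_vector_of_chars.size();
std::string utf8_str = boost::locale::conv::between(utf16_buffer, buffer_end, "UTF-8", "UTF-16");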

between takes its input as a range of 8-bit chars and returns a std::string, so you never have to convert to 16-bit characters at all.

You must use the between overload that takes a pointer to the end of the buffer. The overload that takes only a start pointer treats the input as null-terminated and stops at the first '\0' byte, which in UTF-16 text occurs almost immediately, because every ASCII character is encoded with one zero byte.
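Since the file may or may not start with a BOM, one option is to inspect the first two bytes yourself and pass an explicit byte order to the converter. The following is only a minimal sketch: Utf16Source and detect_utf16 are hypothetical names, not part of Boost, and the "UTF-16LE" fallback for files without a BOM is an assumption you should adjust to match where your files come from.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Boost): the byte range to convert
// and the charset name to pass as the source encoding.
struct Utf16Source {
    const char* begin;    // first byte after the BOM, if one was found
    const char* end;      // one past the last byte
    std::string charset;  // "UTF-16LE", "UTF-16BE", or the fallback
};

// Looks for a UTF-16 BOM (FF FE = little-endian, FE FF = big-endian).
// The fallback used when no BOM is present is an assumption, not a rule.
inline Utf16Source detect_utf16(const std::vector<char>& bytes,
                                const std::string& fallback = "UTF-16LE") {
    const char* begin = bytes.data();
    const char* end = begin + bytes.size();
    std::string charset = fallback;

    if (bytes.size() >= 2) {
        // Compare as unsigned so the result does not depend on char signedness.
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) {
            charset = "UTF-16LE";
            begin += 2;  // skip the BOM
        } else if (b0 == 0xFE && b1 == 0xFF) {
            charset = "UTF-16BE";
            begin += 2;  // skip the BOM
        }
    }
    return Utf16Source{begin, end, charset};
}
```

With this in place, the call above would become boost::locale::conv::between(src.begin, src.end, "UTF-8", src.charset), where src is the value returned by detect_utf16(my_vector_of_chars).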

Licensed under: CC-BY-SA with attribution