First, you should learn something about text data and how it's represented. A good starting reference is Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
byte is just a typedef for char or unsigned char (the Windows headers define BYTE as unsigned char). So the byte array is using some char encoding for the string. You need to convert from that encoding, whatever it is, into UTF-16 for Windows' wchar_t. Here's the typical method recommended for doing such conversions on Windows:
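// First call: ask how many wchar_t units are needed, including the terminating null.
int output_size = MultiByteToWideChar(CP_ACP, 0, value, -1, NULL, 0);
assert(0 < output_size);
wchar_t *converted_buf = new wchar_t[output_size];
// Second call: do the actual conversion into the buffer we just allocated.
int size = MultiByteToWideChar(CP_ACP, 0, value, -1, converted_buf, output_size);
assert(output_size == size);
// Remember to delete[] converted_buf once you're done with it.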
We call the function MultiByteToWideChar() twice: once to figure out how large a buffer is needed to hold the result of the conversion, and a second time, passing in the buffer we allocated, to do the actual conversion.
CP_ACP specifies the source encoding, and you'll need to check the API documentation to figure out what that value really should be. CP_ACP stands for 'codepage: ANSI codepage', which is Microsoft's way of saying 'the encoding set for "non-Unicode" programs.' The API may be using something else, like CP_UTF8 (we can hope) or 1252 or something.
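To see why the source encoding has to be right, here is a small portable sketch that decodes the same two bytes under two different assumptions. The functions decode_latin1 and decode_utf8 are hypothetical helpers written for this illustration, not part of Win32; Latin-1 stands in for code page 1252, which agrees with it for bytes 0xA0 through 0xFF. The UTF-8 decoder is deliberately minimal and is not a substitute for MultiByteToWideChar().

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode bytes as ISO-8859-1 (Latin-1): each byte maps to the code point
// with the same value. Windows-1252 matches this for bytes 0xA0-0xFF.
std::u16string decode_latin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Decode UTF-8 into UTF-16. Minimal on purpose: it rejects truncated input
// and bad continuation bytes, but does not check for overlong forms,
// encoded surrogates, or values above U+10FFFF.
std::u16string decode_utf8(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Outside the BMP: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

The bytes 0xC3 0xA9 decode to the single character U+00E9 (é) as UTF-8, but to the two characters U+00C3 U+00A9 (Ã©) as Latin-1. Picking the wrong code page produces exactly this kind of mojibake.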
See the rest of the MultiByteToWideChar documentation to figure out the other arguments.
From the question: "Once we execute the line where we start allocating the memory, our image variable gets filled with a bunch of unwanted Unicode characters:"
When you call malloc(), the memory given to you is uninitialized and just contains garbage. The values you see before initializing it don't matter, and you simply shouldn't use that data. The only data that matters is what you fill the buffer with. Because we pass -1 as the input length, the MultiByteToWideChar() code above also null-terminates the output string, so you won't see garbage in unused buffer space (and the way we allocate the buffer leaves no extra space anyway).
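If the garbage in a freshly allocated buffer is distracting in the debugger, one option is a container that value-initializes its elements. The following is a minimal sketch; make_zeroed_buffer is a hypothetical helper name chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a buffer of `count` wchar_t elements.
// std::vector value-initializes its elements, so each one starts as L'\0'
// instead of holding whatever happened to be in memory.
std::vector<wchar_t> make_zeroed_buffer(std::size_t count) {
    return std::vector<wchar_t>(count);
}
```

The vector's data() pointer can be passed wherever a wchar_t* output buffer is expected, and the memory is released automatically when the vector goes out of scope.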
The above code is not actually very good C++ style. It's just typical usage of the C-style API provided by Win32. The way I prefer to do conversions (if I'm forced to) is more like:
// Requires <codecvt>, <locale>, and <string>.
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert; // converter object saved somewhere
std::wstring output = convert.from_bytes(value);
(Assuming the char encoding being used is UTF-8. You'll have to use a different codecvt facet for any other encoding. Note that std::wstring_convert and the facets in <codecvt> are deprecated as of C++17.)
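Since the data in question is a byte array rather than a char string, here is a self-contained sketch of the same approach wrapped in a function. It assumes the bytes are UTF-8 and that the array's element type is unsigned char; utf8_bytes_to_wide is a hypothetical name chosen for this example.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: converts `length` bytes of UTF-8 into a std::wstring
// of UTF-16 code units. Assumes the input really is UTF-8.
std::wstring utf8_bytes_to_wide(const unsigned char* bytes, std::size_t length) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    // from_bytes() works on char, so reinterpret the unsigned char buffer.
    const char* first = reinterpret_cast<const char*>(bytes);
    // Throws std::range_error if the input is not valid UTF-8.
    return convert.from_bytes(first, first + length);
}
```

Passing an explicit length means the byte array does not need to be null-terminated, and the returned std::wstring owns its memory, so there is nothing to delete[] afterwards.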