Question

I have a VARIANT holding a BSTR that was pulled from the MSXML DOM, so it is in UTF-16. I'm trying to figure out which encoding this conversion uses by default:

VARIANT vtNodeValue;
pNode->get_nodeValue(&vtNodeValue);
string strValue = (char*)_bstr_t(vtNodeValue);

From testing, I believe that the default encoding is either Windows-1252 or ASCII, but I am not sure.

By the way, this is the chunk of code that I am fixing: I am changing it to convert the variant to a wstring and then go to a multi-byte encoding with a call to WideCharToMultiByte.

Thanks!


Solution

The operator char* method calls _com_util::ConvertBSTRToString(). The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.

Update:

Internally, _com_util::ConvertBSTRToString() calls WideCharToMultiByte, passing zero for the code-page and default-character parameters. Since CP_ACP is defined as 0, this is the same as passing CP_ACP, which means the system's current ANSI code page is used (not the current thread's setting, which would be CP_THREAD_ACP).

If you want to avoid losing data, you should probably call WideCharToMultiByte directly and use CP_UTF8. You can still store the result in a std::string and treat it as a null-terminated byte string; you just can't assume that one byte is one character.

OTHER TIPS

std::string by itself doesn't specify or contain any encoding. It is merely a sequence of bytes. The same holds for std::wstring, which is merely a sequence of wchar_ts (16-bit code units on Win32).
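To make the "sequence of bytes" point concrete, here is a small portable sketch that counts code points in a UTF-8 std::string. The helper name CountUtf8CodePoints is hypothetical, and the code assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 byte string.
// Assumption: the input is well-formed UTF-8; no validation is done.
std::size_t CountUtf8CodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point.
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 bytes "caf\xC3\xA9" (the word "café"), size() reports 5 bytes while the helper reports 4 code points, which is why indexing a UTF-8 std::string by byte is not the same as indexing by character.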

Converting a _bstr_t to char* through its operator char* does not give you a pointer to the raw data. The BSTR itself consists of wide characters, that is, wchar_ts, which represent UTF-16, but operator char* returns a separately allocated narrow copy that has already been converted using the ANSI code page.

That is why constructing a std::string from it works at all. If the pointer did refer to the raw UTF-16 data, you would not get past the first zero byte (which occurs soon, if your original string is English).
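The zero-byte point can be illustrated without any COM types. This is only a sketch: the byte array stands in for the raw UTF-16LE contents of a BSTR holding "Hi", and the helper name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// The UTF-16LE bytes of L"Hi" followed by a two-byte terminator,
// written out explicitly so the example does not depend on endianness.
const char kHiUtf16Le[] = { 'H', 0, 'i', 0, 0, 0 };

// Hypothetical helper: reports how many bytes the std::string(const char*)
// constructor copies, which is everything up to the first zero byte.
std::size_t LengthSeenByCharPtrConstructor(const char* rawBytes)
{
    return std::string(rawBytes).size();
}
```

Passing kHiUtf16Le yields a length of 1, because the high byte of 'H' is zero and ends the copy. A string that survives intact therefore cannot have come from raw UTF-16 bytes.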

But since wstring is a string of wchar_t, you should be able to construct one directly from the _bstr_t, as follows:

_bstr_t tmp(vtNodeValue);
wstring strValue((wchar_t*)tmp, tmp.length());

_bstr_t::length() returns the number of characters (wchar_t units), not bytes, and excludes the terminating null, so it is the right count to pass to the wstring constructor. You will then have a wstring that's encoded in UTF-16, on which you can call WideCharToMultiByte.

Licensed under: CC-BY-SA with attribution