Can I use the _bstr_t class to convert between multibyte and Unicode on Windows? (C++)

https://stackoverflow.com/questions/14327776

15-01-2022
|

Question

BSTR is this weird Windows data type with a few specific uses, such as COM functions. According to MSDN, it contains a WCHAR string and some other stuff, like a length descriptor. Windows is also nice enough to give us the _bstr_t class, which encapsulates BSTR; it takes care of the allocation and deallocation and gives you some extra functionality. It has four constructors, including one that takes in a char* and one that takes in a wchar_t*. MSDN's description of the former: "Constructs a _bstr_t object by calling SysAllocString to create a new BSTR object and then encapsulates it. This constructor first performs a multibyte to Unicode conversion."

It also has operators that can extract a pointer to the string as any of char*, const char*, and wchar_t*, and I'm pretty sure those are null-terminated, which is cool.

I've spent a while reading up on how to to convert between multibyte and Unicode, and I've seen a lot of talk about how to use mbstowcs and wcstomb, and how MultiByteToWideChar and WideCharToMultiByte are better because of encodings may differ, and blah blah blah. It all kind of seems like a headache, so I'm wondering whether I can just construct a _bstr_t and use the operations to access the strings, which would be... a lot fewer lines of code:

char* multi = "asdf";
_bstr_t bs = _bstr_t(mb);
wchar_t* wide = (wchar_t*)bs; // assume read-only

I guess my intuitive answer to this is that we don't know what Windows is doing behind the scenes, so if I have a problem using mbstowcs/wcstomb (I guess I really mean mbstowcs_s/wcstomb_s) rather than MultiByteToWideChar/WideCharToMultiByte, I shouldn't risk it because it's possible that Windows uses those. (It's almost certainly not using the latter, since I'm not specifying a "code page" here, whatever that is.) Honestly I'm not sure yet whether I consider the mbstowcs_s and wcstomb_s functions OK for my purposes, because I don't really have a grasp on all of the different encodings and stuff, but that's a whole different question and it seems to be addressed all over the Internet.

Sooooo, is there anything wrong with doing this, aside from that potential concern?

Solution

Using _bstr_t::_bstr_t(const char*) is not exactly a good idea in production code:

Constructs a _bstr_t object by calling SysAllocString to create a new BSTR object and encapsulate it. This constructor first performs a multibyte to Unicode conversion. If s2 is too large, you [sic] may generate a stack overflow error. In such a situation, convert your char* to a wchar_t with MultiByteToWideChar and then call the wchar_t * constructor.

Besides that _bstr_t::operator wchar_t*() const throw() seems barely useful. It's just for struct member extraction, so you're constrained to a const:

These operators can be used to extract raw pointers to the encapsulated Unicode or multibyte BSTR object. The operators return the pointer to the actual internal buffer, so the resulting string cannot be modified.

So _bstr_t is just a helper object for encapsulating BSTRs, and a mediocre one at that. Conversion using MultiByteToWideChar and WideCharToMultiByte is a much better choice, for multiple reasons:

It's much less prone to crash.
You don't get a const buffer in return, because you provide your own.
The names of those functions are self-descriptive. Conversion through a constructor and casting operator of an unrelated type is not.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow