Are BSTR UTF-16 Encoded?

https://stackoverflow.com/questions/4055299

27-09-2019

Question

I'm trying to learn Unicode, and for me the most difficult part is the encoding. Can a BSTR (Basic String) contain code points U+10000 or higher? If not, what encoding do BSTRs use?

Solution

In Microsoft-speak, Unicode is generally synonymous with UTF-16 (little endian if memory serves). In the case of BSTR, the answer seems to be that it depends on the platform:

On Microsoft Windows, consists of a string of Unicode characters (wide or double-byte characters).

On Apple Power Macintosh, consists of a single-byte string.

May contain multiple embedded null characters.

So, on Windows, yes, it can contain characters outside the Basic Multilingual Plane, but each of these requires two 'wide' chars (a surrogate pair) to store.
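To make the "two wide chars" point concrete, here is a minimal sketch of how a code point is split into UTF-16 code units. It is portable C++ rather than Windows code: encode_utf16 is a hypothetical helper, not a Windows API, and char16_t stands in for the 16-bit wchar_t that a BSTR holds on Windows. It assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Windows API): encode one Unicode code point
// as UTF-16 code units, the form a wide-character BSTR would store.
std::u16string encode_utf16(char32_t cp) {
    // Assumption: cp is a valid scalar value (not a surrogate, not above U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary plane: subtract 0x10000 and split the remaining 20 bits.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

With this sketch, encode_utf16(U'A') yields a single unit, while encode_utf16(0x1F600) yields the two units 0xD83D and 0xDE00.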

OTHER TIPS

BSTRs on Windows originally contained UCS-2, but can in principle contain the entire Unicode set, using surrogate pairs. UTF-16 support is actually up to the API that receives the string; the BSTR has no say in how it gets treated. Most APIs support UTF-16 by now. (Michael Kaplan sorts out the details.)
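The difference between a UCS-2-minded consumer and a UTF-16-aware one can be sketched as follows. Both functions are hypothetical helpers written in portable C++ (char16_t again stands in for the 16-bit wide char on Windows); they are not taken from any Windows API. The same buffer is simply interpreted in two ways.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the UCS-2 view, one "character" per 16-bit unit.
std::size_t count_code_units(const std::u16string& s) {
    return s.size();
}

// Hypothetical helper: the UTF-16 view, where a high surrogate followed
// by a low surrogate counts as a single code point.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool next_low = i + 1 < s.size() &&
                              s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && next_low) {
            ++i;  // skip the low half of the pair
        }
        ++n;      // unpaired surrogates are counted as one unit each
    }
    return n;
}

// Usage sketch: 'a', U+1F600 stored as a surrogate pair, then 'b'.
void demo_counts() {
    const std::u16string s = {u'a', static_cast<char16_t>(0xD83D),
                              static_cast<char16_t>(0xDE00), u'b'};
    assert(count_code_units(s) == 4);   // what a UCS-2 consumer sees
    assert(count_code_points(s) == 3);  // what a UTF-16-aware consumer sees
}
```

The stored data is identical in both cases, which is why the string type itself cannot enforce one interpretation over the other.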

The Windows headers still contain another definition for BSTR. It is basically:

#if defined(_WIN32) && !defined(OLE2ANSI)
   typedef wchar_t OLECHAR;
#else
   typedef char OLECHAR;
#endif
typedef OLECHAR * BSTR;

There's no real reason to consider the char variant, however, unless you desperately want to be compatible with whatever this was for. (IIRC it was active, or could be activated, for early MFC builds, and might even have been used in Office for Mac or something like that.)

Licensed under: CC-BY-SA with attribution
