Question

It seems that on 64-bit platforms it would be reasonable to have an 8-byte length prefix. If we can address more than 4 GB of memory, why not allow, say, 5 GB strings? Is the answer just "by specification", or are there interoperability/backwards-compatibility reasons that I'm not aware of? Thanks.


Solution

The BSTR data type is the standard COM string data type. Changing the length prefix would make it impossible to safely move strings between processes of different bitness (or at least make it significantly more complex). Since COM is the only relevant cross-bitness interop infrastructure, it is necessary for BSTRs to behave the same way in 32-bit and 64-bit processes.

It is a tradeoff, imposing a 'limit' of 2GB in exchange for hassle-free marshaling of strings between processes of different bitness.

OTHER TIPS

One good reason is compatibility with platform APIs like MultiByteToWideChar, which accept int lengths. There are many more string APIs that work with 32-bit lengths.

In practice it's not a real limitation either: it is hard to conceive of a scenario where a BSTR of length >2GB would be the best solution to a problem.

BSTR is a length-prefixed string, so its first field is a length, not an address. The prefix therefore need not be the same size as a pointer; it only needs to be large enough for the application.

For all practical purposes, 4GB is more than enough for a string, and keeping the maximum string size the same allows you to pass strings between processes without problems. For example, if the length were a 64-bit type on 64-bit Windows, what would happen when you passed an 8GB string from a 64-bit process to a 32-bit process? Should the string be truncated, or should an error be reported? Keeping the prefix the same size may also improve backward compatibility.

The range of applications where it's useful to have a large number of objects whose total size exceeds 2GiB is much greater than the range of applications where it's useful to have any individual object exceed 2GiB. Even if individual operations on 64-bit values are no more expensive than operations on 32-bit values, twice as many 32-bit values fit in each level of cache. Thus, absent a good reason to use 64-bit values to hold object sizes, having a platform limit individual objects to 2GiB is a perfectly reasonable design decision, especially since code that isn't designed to work with larger objects will often malfunction, in ways prone to creating security vulnerabilities, if run on a system that doesn't reject attempts to create objects greater than 2GiB.

The most important reason is probably so that BSTR can continue to travel in VARIANT. You will notice from the definition of tagVARIANT in oaidl.h that the bstrVal member appears to be part of the union of other types, but where is its length stored? The answer is in the wReserved2/wReserved3 members of the VARIANT structure that immediately precede the bstrVal member in memory. There are three reserved words there, so in theory BSTR's length could be expanded to 6 bytes, but if it grew any larger it would overwrite the VARTYPE member and VARIANT would no longer work. So BSTR is length-limited even on 64-bit platforms in order to keep traveling in VARIANT.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow