My understanding of Delphi's UnicodeString type is that it's UTF-16 internally.
You are correct that Delphi's UnicodeString is encoded as UTF-16. This means that a single 16-bit Char element of the string is wide enough to represent any code point in the Basic Multilingual Plane.
But my general understanding of Unicode is that not all unicode characters can be represented even in 2 bytes, that some corner case foreign characters will take 4 bytes.
However, you have a slight misconception here. The Length function does not inspect the characters at all; it simply returns the number of 16-bit WideChar elements, without taking surrogate pairs into account. This means that if you assign a single character from one of the Supplementary Planes to a UnicodeString, Length will return 2.
program Egyptian;
{$APPTYPE CONSOLE}
var
  S: UnicodeString;
begin
  S := #$1304E;        // a single character (U+1304E) outside the BMP
  Writeln(Length(S));  // prints 2: one high and one low surrogate
  Readln;
end.
Conclusion: the byte size of the string data is always Length(S) * SizeOf(Char), regardless of whether S contains any surrogate pairs.
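If you need the number of actual characters (code points) rather than the number of Char elements, you have to walk the string and count each surrogate pair once. Below is a minimal sketch of that idea. CodePointCount is a hypothetical helper written for illustration, not an RTL function, and the sketch assumes a desktop compiler with 1-based string indexing and the default short-circuit Boolean evaluation.

```delphi
program CountCodePoints;
{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the RTL): counts Unicode code points
// by treating each well-formed surrogate pair as a single character.
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1; // assumes 1-based string indexing
  while I <= Length(S) do
  begin
    // A high surrogate ($D800..$DBFF) followed by a low surrogate
    // ($DC00..$DFFF) encodes one supplementary-plane code point.
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
       (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I); // BMP character, or an unpaired surrogate counted on its own
    Inc(Result);
  end;
end;

var
  S: UnicodeString;
begin
  S := #$1304E;
  Writeln(Length(S));                 // number of UTF-16 code units
  Writeln(Length(S) * SizeOf(Char));  // byte size of the string data
  Writeln(CodePointCount(S));         // number of code points
  Readln;
end.
```

For the hieroglyph above this should print 2, 4 and 1: two Char elements, four bytes, one code point. Note that this counts code points only; combining sequences that a user perceives as one character are still counted separately.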