Short version: yes, it is safe to do string range calculations in WinDbg using String.Length, and it is safe to use du to dump the strings.
UTF-16 4-byte characters ending in 00 00
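The Unicode specification defines that a 4-byte UTF-16 character (a surrogate pair) consists of two 16-bit code units: the first 6 bits of the first unit are 110110 (high surrogate, D800 to DBFF) and the first 6 bits of the second unit are 110111 (low surrogate, DC00 to DFFF). This means that the first nibble (4 bits) of each code unit is always a D, so that a 4-byte UTF-16 character always looks like this when written as code units: D? ?? D? ??. In little-endian memory, which is what a byte dump such as db shows on Windows, the bytes of each unit are swapped: ?? D? ?? D?. Either way, neither code unit can be 0000, so the character will never end with 00 00.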
Therefore it is safe to use du commands on UTF-16 strings.
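The bit patterns above can be checked with a few lines of code. The following is a minimal sketch in Python (chosen only because it is easy to run; it is not part of the original question), and `to_surrogates` is a hypothetical helper name. It splits a code point above U+FFFF into its two code units and shows the little-endian bytes as they would appear in memory.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 | (offset >> 10)     # 110110xx xxxxxxxx (top 10 bits)
    low = 0xDC00 | (offset & 0x3FF)    # 110111xx xxxxxxxx (bottom 10 bits)
    return high, low


high, low = to_surrogates(0x1F600)     # U+1F600, an emoji outside the BMP
print(f"{high:04X} {low:04X}")         # D83D DE00

# Little-endian byte layout, cross-checked against Python's own encoder
raw = chr(0x1F600).encode("utf-16-le")
print(" ".join(f"{b:02X}" for b in raw))  # 3D D8 00 DE
```

Note that individual bytes can still be 00 (the third byte in this example), but no aligned 16-bit code unit is 0000, which is what matters for a null-terminated UTF-16 string.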
Using string.Length for calculating the range
Before answering my own question, I wanted to try the behavior in C# and therefore asked the question about how to create 4-byte characters in C#.
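Unexpectedly, this already pointed me to the answer: string.Length is the string length in UTF-16 code units, not characters. Each code unit is 2 bytes, so string.Length * 2 is exactly the number of bytes the string data occupies, even if the string contains 4-byte characters. To get the Unicode character length, we should use the System.Globalization.StringInfo class (for example its LengthInTextElements property).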
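The difference between code units and characters can be illustrated with a small sketch, again in Python rather than C#. Python's `len` counts code points, so the code unit count is derived here from the UTF-16 encoding; `utf16_code_units` is a hypothetical helper name, and the assumption is that the code unit count corresponds to what string.Length reports in .NET.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


sample = "a\U0001F600b"                  # 'a', one 4-byte character, 'b'

code_points = len(sample)                # 3 characters (code points)
code_units = utf16_code_units(sample)    # 4 code units (surrogate pair counts twice)
byte_size = code_units * 2               # 8 bytes of string data

print(code_points, code_units, byte_size)  # 3 4 8
```

For a memory range, the code unit count is the one that matters: it covers every byte of the string regardless of how many characters those bytes represent.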