Question

The snippet says it all :-)

UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/);
byte[] data = enc.GetBytes("a");
// data has length 1.
// I expected the BOM to be included. What's up?

Solution

You wouldn't want the BOM to be emitted on every call to GetBytes; otherwise you'd have no way of (say) writing a file a line at a time without a BOM landing at the start of every line.

By exposing it with GetPreamble, callers can insert the preamble just at the appropriate point (i.e. at the start of their data). I agree that the documentation could be a lot clearer though.
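To make the "line at a time" case concrete, here is a minimal sketch. The class name PreambleDemo and the helper EncodeLines are made up for illustration, and a MemoryStream stands in for whatever file or network stream you actually write to.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper (not part of the framework): encodes each line
    // with its own GetBytes call, but writes the BOM only once, up front.
    public static byte[] EncodeLines(string[] lines)
    {
        var enc = new UTF8Encoding(true);
        using (var stream = new MemoryStream())
        {
            // The preamble goes in exactly once, at the start of the data.
            byte[] preamble = enc.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);

            foreach (string line in lines)
            {
                // GetBytes returns only the encoded characters, never the BOM.
                byte[] data = enc.GetBytes(line + "\n");
                stream.Write(data, 0, data.Length);
            }
            return stream.ToArray();
        }
    }
}
```

Calling PreambleDemo.EncodeLines(new[] { "a", "b" }) should give the bytes EF BB BF 61 0A 62 0A: one BOM, then the two lines.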

OTHER TIPS

Thank you both. The following works, and LINQ makes the combination simple :-)

// Needs "using System.Linq;" for Concat and ToArray.
UTF8Encoding enc = new UTF8Encoding(true);
byte[] data = enc.GetBytes("a");
// Prepend the BOM to the encoded bytes.
byte[] combo = enc.GetPreamble().Concat(data).ToArray();

GetBytes() is expected to be called many times over the course of a sequence, so it never emits the BOM itself. To get the BOM, you need to use:

byte[] preamble = enc.GetPreamble();

Call it only once, at the start of a sequence, and write the result first; this is where the BOM lives.

Note that in general, you don't need the Byte Order Mark for UTF-8 anyway. Its main purpose is to tell UTF-16 BE and UTF-16 LE apart. There is no such thing as UTF-8 LE and UTF-8 BE.
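If you want to see the difference in the actual bytes, a small sketch like the following prints each encoding's preamble as hex. The class BomBytes and its Hex method are names invented for this example; the encodings themselves are the standard System.Text ones.

```csharp
using System;
using System.Text;

public static class BomBytes
{
    // Hypothetical helper: formats an encoding's preamble as dash-separated hex.
    public static string Hex(Encoding enc)
    {
        return BitConverter.ToString(enc.GetPreamble());
    }
}
```

BomBytes.Hex(Encoding.Unicode) should give FF-FE and BomBytes.Hex(Encoding.BigEndianUnicode) should give FE-FF, which is how a reader can tell the two byte orders apart. For UTF-8, new UTF8Encoding(true) gives EF-BB-BF regardless of platform, and new UTF8Encoding(false) gives an empty preamble.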

Licensed under: CC-BY-SA with attribution