Question

From the W3C:

If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the character encoding used must be an ASCII-compatible character encoding

So how can I add a BOM that marks the document as encoded in UTF-16, for example?


Solution 3

The byte order mark (BOM) is a short byte sequence that can be placed at the very beginning of a text file.
It is not specific to HTML or any other web language.

A hex editor is a good way to add it.
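To make the "byte sequence at the beginning of the file" idea concrete, here is a minimal sketch in Python that checks which BOM, if any, a chunk of bytes starts with. The function name `sniff_bom` and the returned labels are made up for this illustration; the BOM constants come from Python's standard `codecs` module.

```python
import codecs
from typing import Optional

# Checked in this order on purpose: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE), so the longer sequences must come first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Opening the file in binary mode and passing its first four bytes to `sniff_bom` is enough to see what a hex editor would show you.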

Although UTF-32 offers the advantage of a fixed-length encoding, some browsers and e-mail clients have dropped support for it.

Note: UTF-16 is mainly used on Windows.

OTHER TIPS

You add a BOM by inserting U+FEFF (which is what the BOM is by definition) at the very start of the data. How you do that depends on how you are generating UTF-16 or UTF-32 in the first place.
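As one possible way to do this when generating the data yourself, here is a small Python sketch. The helper name `encode_utf16_with_bom` and the sample markup are assumptions for illustration only; the idea is simply to prepend the encoded form of U+FEFF to the encoded text.

```python
import codecs


def encode_utf16_with_bom(text: str, little_endian: bool = True) -> bytes:
    """Encode text as UTF-16 and prepend the matching BOM (U+FEFF)."""
    if little_endian:
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


# Hypothetical sample document, used only to show the result.
html = "<!DOCTYPE html><title>demo</title><p>caf\u00e9</p>"
data = encode_utf16_with_bom(html)
```

Writing `data` to disk should be done in binary mode (`open(path, "wb")`), so that nothing re-encodes the bytes or touches the BOM.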

The “rephrased” question “how can I display a UTF-16/UTF-32 encoded HTML document?” is really a different question, and the short answer is: mostly, you don’t. There is hardly any reason to use UTF-16 or UTF-32 for an HTML document. The recommendations clearly favor UTF-8. But if you do use UTF-16 or UTF-32, then you should primarily take care of the Content-Type header, and additionally include a BOM.
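A rough sketch of what "take care of the Content-Type header, and additionally include a BOM" could look like on the server side, in Python. The function `build_utf16_response` is hypothetical and only assembles the headers and body; how you hand them to your web server or framework depends on your setup.

```python
import codecs


def build_utf16_response(html: str):
    """Return (headers, body) for serving an HTML page as UTF-16 with a BOM."""
    # Body: BOM first, then the text in the matching byte order.
    body = codecs.BOM_UTF16_LE + html.encode("utf-16-le")
    headers = {
        # Declare plain "utf-16"; the BOM tells the browser the byte order.
        "Content-Type": "text/html; charset=utf-16",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_utf16_response("<!DOCTYPE html><p>hi</p>")
```

The header is the primary declaration here; the BOM acts as the in-document backup if the file is later opened without HTTP metadata.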

The hint is here:

its encoding is not explicitly given by Content-Type metadata

You should try that (via HTTP headers, for example). As for inserting the BOM, your code editor should be able to do that.

Please also see the W3C specs:

Most of the time you are probably better off choosing UTF-8 as your encoding. [...] One reason for this is that there are special rules for declaring the encoding of a UTF-16 page.

Whether you use element-based declarations or not, you should ensure that you always have a byte-order mark at the very start of a UTF-16 encoded file. In effect, this is the in-document declaration.

Furthermore, if your page is encoded as UTF-16, do not declare your file to be "UTF-16BE" or "UTF-16LE", use "UTF-16" only. The byte-order mark at the beginning of your file will indicate whether the encoding scheme is little-endian or big-endian. (This is because content explicitly encoded as, say, UTF-16BE should not use a byte-order mark; but HTML5 requires a byte-order mark for UTF-16 encoded pages.)

http://www.w3.org/International/questions/qa-html-encoding-declarations#utf16
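The distinction between "UTF-16" and "UTF-16BE"/"UTF-16LE" in the quote happens to be mirrored by Python's built-in codecs, which makes for a quick illustration. This is only a sketch of Python's behavior, not of what browsers do; the variable names are arbitrary.

```python
import codecs

text = "a"

# The generic "utf-16" codec writes a BOM, in the machine's native byte order.
generic = text.encode("utf-16")

# The explicit-endianness codecs write no BOM at all.
explicit_be = text.encode("utf-16-be")
explicit_le = text.encode("utf-16-le")

has_bom = generic[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
```

So if you encode with an explicit byte order and still want a page labelled "UTF-16", you have to add the BOM yourself.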

Licensed under: CC-BY-SA with attribution