Question

I know that BOM is used for UTF-8 files, but what about the text files where every character is 2-bytes, should I add the byte order mark to them, too?

Was it helpful?

Solution

BOM's were invented for UCS-2 and UTF-16, and then only later appropriated by Microsoft (and then XML) for UTF-8. Think about the name: 'byte order mark'. UTF-8 has only one possible byte order, so it doesn't need a BOM to reveal the order. The three-byte sequence for U+FEFF in UTF-8 has, instead, become a Unicode signature for file type sniffing.

However, early versions of the XML support in Java did not respond well to a UTF-8 BOM, in spite of the inclusion of the UTF-8 BOM in the XML standard. Further, a file with a BOM can't be simply concatenated onto another file, because U+FEFF isn't BOM in the middle of the file; it's ZWNBSP.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top