Byte-Order Mark found in UTF-8 File. W3C Validation Error
15-06-2021
Question
I have created a website that is valid XHTML Strict and passes validation, but the W3C validator reports a note (error):
Byte-Order Mark found in UTF-8 File.
The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.
But I have no BOM in my file. It's straight XHTML done in VS.
Is the server adding it? How can I get rid of the error?
This is important as it screws up semantic extraction. http://www.w3.org/2003/12/semantic-extractor.html
Solution
The W3C Markup Validator does not indicate a BOM in UTF-8 as an error; it would itself be in error if it did, since a BOM is allowed at the start of UTF-8 data. It issues a warning.
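To make this concrete: the BOM is the character U+FEFF, which UTF-8 encodes as the three bytes EF BB BF. The short Python sketch below is only an illustration, not part of the original answer. It shows those bytes and how Python's `utf-8-sig` codec strips them, while plain `utf-8` keeps them as a leading U+FEFF character.

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) is the character used as a byte-order mark.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" ").upper())    # EF BB BF
print(bom_bytes == codecs.BOM_UTF8)  # True

# A tiny document that starts with a BOM.
data = codecs.BOM_UTF8 + "<html/>".encode("utf-8")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character...
print(repr(data.decode("utf-8")))      # '\ufeff<html/>'
# ...while "utf-8-sig" recognizes the signature and drops it.
print(repr(data.decode("utf-8-sig")))  # '<html/>'
```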
The warning is seriously outdated. No problems have been observed in relevant browsers for many years. On the contrary, a BOM should be regarded as useful: if, for example, a file is saved locally (and the HTTP headers are thus lost), the BOM lets browsers infer, with practical certainty, that the document is UTF-8 encoded.
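A minimal sketch of the kind of BOM sniffing a browser can do when no charset information is available. The `sniff_bom` function and its signature table are hypothetical illustrations; the sketch only checks the UTF-8 and UTF-16 signatures and is not a full implementation of any browser's encoding detection.

```python
import codecs
from typing import Optional

# Leading byte signatures and the encoding each one implies.
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, encoding in BOM_SIGNATURES:
        if data.startswith(signature):
            return encoding
    return None


print(sniff_bom(b"\xef\xbb\xbf<!DOCTYPE html>"))  # utf-8
print(sniff_bom(b"<!DOCTYPE html>"))              # None
```

Without a BOM (and without an HTTP header or `meta` declaration), the browser has to fall back on guessing, which is exactly the situation the BOM avoids.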
The Semantic data extraction tool is not very up-to-date, and it suffers from an overly theoretical approach, but it does not seem to have any problem with a BOM at the start of UTF-8 data.
It is possible that the server adds the BOM, or that your authoring tool adds it. Either way, it should be regarded as useful rather than as a problem.
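One way to tell whether the server or the authoring tool is responsible is to compare the first three bytes of the file on disk with the first three bytes the server actually sends. A rough Python sketch, assuming you can read the source file locally and fetch the page over HTTP; the function names are hypothetical:

```python
import codecs
import urllib.request


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if the bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_bom(path: str) -> bool:
    """Check the file as saved on disk by the editor."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(3))


def served_has_bom(url: str) -> bool:
    """Check the response body as actually delivered by the web server."""
    with urllib.request.urlopen(url) as resp:
        return starts_with_utf8_bom(resp.read(3))


# Example usage (placeholder path and URL):
#   file_has_bom("index.html")
#   served_has_bom("http://example.com/index.html")
```

If only the served copy starts with EF BB BF, something on the server side is prepending it; if the file on disk already has it, the editor saved it that way.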
OTHER TIPS
You do have a BOM (the bytes EF BB BF) in your resource. Consider removing it, perhaps with a hex editor. See also: "How do I remove the BOM character from my xml file".
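If no hex editor is at hand, a small script can do the same job. A minimal Python sketch; `strip_utf8_bom` is a hypothetical helper that rewrites the file in place, so keep a backup first:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Write back everything after the three BOM bytes.
        file.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

For example, `strip_utf8_bom("index.html")` (placeholder filename) returns `True` the first time it removes a BOM and `False` on later runs, since the file no longer starts with EF BB BF.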