Yesterday I upgraded an HTML page from HTML 4.01 Strict to HTML5.
* http://r0k.us/rock/games/CoH/HallsOfHeroes/
The character encoding is ISO-8859-1. The validator at http://validator.w3.org fails, and won't even parse the page, when UTF-8 is specified as the charset. This is apparently because I use footnote characters such as ², which are in the upper 128 positions of the character set. What confuses me is that I keep reading that the first 256 characters of UTF-8 are the same as 8859-1.

Does anyone know why the page won't validate as UTF-8?


Solution 2

The character ² (SUPERSCRIPT TWO) is Unicode code point U+00B2, that is, 0xb2 (178 decimal), but that code point is encoded differently in 8859-1 and UTF-8.

In 8859-1, it's represented as a single byte with the value 0xb2.

In UTF-8, it's represented as two consecutive bytes with the values 0xc2, 0xb2. See here for an explanation of the encoding.
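The two encodings can be compared directly. The Python sketch below (Python 3.8+ for `bytes.hex` with a separator; the helper `utf8_two_byte` is a hypothetical name used only for illustration) encodes ² both ways and also builds the UTF-8 bytes by hand from the two-byte bit pattern `110xxxxx 10xxxxxx`.

```python
# Compare how U+00B2 SUPERSCRIPT TWO ("²") is stored in ISO-8859-1 and UTF-8.
ch = "\u00b2"

latin1_bytes = ch.encode("iso-8859-1")  # one byte, equal to the code point
utf8_bytes = ch.encode("utf-8")         # two bytes


def utf8_two_byte(cp: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF using UTF-8's two-byte form.

    Bit pattern: 110xxxxx 10xxxxxx, where the 11 x bits hold the code point.
    """
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("the two-byte form only covers U+0080..U+07FF")
    lead = 0xC0 | (cp >> 6)    # 110 prefix + top 5 bits of cp
    cont = 0x80 | (cp & 0x3F)  # 10 prefix + low 6 bits of cp
    return bytes([lead, cont])


print(latin1_bytes.hex(" "))                 # b2
print(utf8_bytes.hex(" "))                   # c2 b2
print(utf8_two_byte(ord(ch)) == utf8_bytes)  # True
```

For 0xb2, the top bits (0xb2 >> 6 = 2) give the lead byte 0xc2, and the low six bits (0x32) give the continuation byte 0xb2.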

(8859-1 is more compact than UTF-8 for files containing 8-bit characters, but it can't represent any code point above 255. UTF-8 is compatible with ASCII, and with 8859-1, for 7-bit characters; it is reasonably compact for most text and can represent more than a million distinct characters.)
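The size and range trade-off can be checked character by character. This is a small Python sketch; `byte_costs` is a hypothetical helper, and the sample characters are arbitrary picks: plain ASCII, an 8859-1 upper-half character, the euro sign (U+20AC), and an emoji (U+1F600).

```python
from typing import Optional, Tuple


def byte_costs(ch: str) -> Tuple[Optional[int], int]:
    """Return (bytes in ISO-8859-1, bytes in UTF-8) for one character.

    The first value is None when 8859-1 cannot represent the character.
    """
    utf8_len = len(ch.encode("utf-8"))
    try:
        latin1_len: Optional[int] = len(ch.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1_len = None  # code point above 255: 8859-1 has no byte for it
    return latin1_len, utf8_len


# "A" (ASCII), "²" (upper half of 8859-1), "€" (U+20AC), and U+1F600 (emoji)
for ch in ["A", "\u00b2", "\u20ac", "\U0001F600"]:
    latin1_len, utf8_len = byte_costs(ch)
    print(f"U+{ord(ch):04X}: iso-8859-1={latin1_len}, utf-8={utf8_len}")
```

UTF-8 never needs more than four bytes; that is enough for every code point up to U+10FFFF, about 1.1 million in total.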

A file containing only 7-bit characters can be interpreted as ASCII, 8859-1, or UTF-8, with the same result. A file containing 8-bit characters cannot; it has to be converted.
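This is most likely what the validator trips over: the raw 0xb2 bytes of an 8859-1 file are not a valid UTF-8 sequence. A minimal Python sketch (the helper `decodes_as` and the sample strings are hypothetical) shows that 7-bit data decodes identically in all three encodings, while a lone 0xb2 is accepted by 8859-1 and rejected by UTF-8.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


ascii_only = b"Hello, world"
footnote = "x\u00b2".encode("iso-8859-1")  # "x²" saved as 8859-1: b"x\xb2"

# 7-bit data is valid, and decodes to the same text, in all three encodings.
for enc in ("ascii", "iso-8859-1", "utf-8"):
    print(enc, decodes_as(ascii_only, enc), ascii_only.decode(enc) == "Hello, world")

# 8-bit data: every byte value is legal in 8859-1, but a lone 0xB2 is not legal
# UTF-8, because 10xxxxxx bytes may only follow a lead byte.
print(decodes_as(footnote, "iso-8859-1"))  # True
print(decodes_as(footnote, "utf-8"))       # False
```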

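If you're on a Unix-like system with the iconv command installed, this:

iconv -f iso-8859-1 -t utf-8 < page.html > page-utf8.html

will perform the appropriate translation. iconv reads 8859-1 text on standard input and writes UTF-8 on standard output; page.html and page-utf8.html are placeholder file names.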
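Where iconv isn't available, the same conversion is a decode followed by an encode. This is a minimal Python sketch, not a drop-in tool; `latin1_to_utf8` and `page.html` are hypothetical names, and the demo runs in a temporary directory so it doesn't touch real files.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode an ISO-8859-1 file as UTF-8, like `iconv -f iso-8859-1 -t utf-8`."""
    # Decoding as 8859-1 never fails: every byte 0x00..0xFF maps to a character.
    text = src.read_bytes().decode("iso-8859-1")
    # Work on raw bytes so line endings are left exactly as they were.
    dst.write_bytes(text.encode("utf-8"))


# Demo in a throwaway directory; "page.html" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page.html"
    dst = Path(tmp) / "page-utf8.html"
    src.write_bytes("<p>x\u00b2</p>".encode("iso-8859-1"))
    latin1_to_utf8(src, dst)
    print(dst.read_bytes())  # b'<p>x\xc2\xb2</p>'
```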

Other tips

Actually, only the first 128 code points (0 to 127) are encoded in UTF-8 exactly as in ASCII. Beyond that, UTF-8 is neither ASCII nor 8859-1: the next 128 code points (128 to 255) are single bytes in 8859-1 but two-byte sequences in UTF-8.
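The claim can be checked over all 256 code points that 8859-1 covers. In this short Python sketch (`same_in_both` is a hypothetical helper), exactly the code points 0 to 127 come out byte-for-byte identical in the two encodings.

```python
def same_in_both(cp: int) -> bool:
    """True if code point cp has identical bytes in UTF-8 and ISO-8859-1."""
    ch = chr(cp)
    return ch.encode("utf-8") == ch.encode("iso-8859-1")


identical = [cp for cp in range(256) if same_in_both(cp)]
print(len(identical), identical[0], identical[-1])  # 128 0 127
```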

You need to re-save the files as UTF-8 if you want them to be served as UTF-8.

Licensed under: CC-BY-SA with attribution