confused by html5, utf-8 and 8859-1

Question 1

The character ² ("SUPERSCRIPT TWO") has the code point 0xb2 (178 decimal), but that code point is stored as bytes differently in 8859-1 and UTF-8.

In 8859-1, it's represented as a single byte with the value 0xb2.

In UTF-8, it's represented as two consecutive bytes with the values 0xc2, 0xb2. (UTF-8 encodes code points from 0x80 through 0x7ff as two bytes of the form 110xxxxx 10xxxxxx, where the x bits hold the code point.)
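If you want to check those byte values yourself, here is a minimal Python sketch. The variable names are just for illustration, and the manual bit arithmetic assumes a code point in the two-byte range (0x80 through 0x7ff).

```python
# U+00B2 SUPERSCRIPT TWO, written as an escape so the encoding of this
# source file does not matter.
ch = "\u00b2"

code_point = ord(ch)                    # 178, i.e. 0xb2
latin1_bytes = ch.encode("iso-8859-1")  # one byte
utf8_bytes = ch.encode("utf-8")         # two bytes

# Build the two-byte UTF-8 form by hand: 110xxxxx 10xxxxxx.
# The top bits of the code point go in the first byte, the low 6 bits in the second.
manual_utf8 = bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])

print(hex(code_point))     # 0xb2
print(latin1_bytes.hex())  # b2
print(utf8_bytes.hex())    # c2b2
print(manual_utf8.hex())   # c2b2
```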

(8859-1 is more compact than UTF-8 for files containing 8-bit characters, but it's incapable of representing any code point above 255. UTF-8 is compatible with ASCII and with 8859-1 for 7-bit characters, is reasonably compact for most text, and can represent more than a million distinct characters.)

A file containing only 7-bit characters can be interpreted as ASCII, 8859-1, or UTF-8, and the result is the same in each case. A file containing 8-bit characters cannot; an 8859-1 file with bytes above 127 has to be translated before it can be read as UTF-8.
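A small Python sketch of that difference, using made-up sample bytes: the 7-bit data decodes identically under all three encodings, while the 8859-1 byte 0xb2 on its own is rejected by a UTF-8 decoder.

```python
# Sample data only; the contents are arbitrary.
seven_bit = b"x squared"   # every byte is below 0x80
eight_bit = b"x\xb2"       # "x" followed by the 8859-1 byte for U+00B2

# 7-bit data: all three decoders agree.
all_agree = (
    seven_bit.decode("ascii")
    == seven_bit.decode("iso-8859-1")
    == seven_bit.decode("utf-8")
)

# 8-bit data: fine as 8859-1, but a lone 0xb2 is not valid UTF-8.
as_latin1 = eight_bit.decode("iso-8859-1")
try:
    eight_bit.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(all_agree)   # True
print(valid_utf8)  # False
```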

If you're on a Unix-like system with the iconv command installed, this:

iconv -f iso-8859-1 -t utf-8

will perform the appropriate translation, reading from standard input and writing to standard output.
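If iconv isn't available, roughly the same translation can be sketched in Python. The helper name `latin1_file_to_utf8` and the file names are hypothetical, and the sketch assumes the input really is 8859-1 (it has no way to detect otherwise).

```python
import tempfile
from pathlib import Path


def latin1_file_to_utf8(src: Path, dst: Path) -> None:
    """Hypothetical helper: re-encode an 8859-1 file as UTF-8."""
    # Every byte value is valid 8859-1, so this decode cannot fail; it will
    # silently produce wrong text if the input was not really 8859-1.
    text = src.read_bytes().decode("iso-8859-1")
    dst.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file containing "x", the byte 0xb2, and a newline.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "out.txt"
    src.write_bytes(b"x\xb2\n")
    latin1_file_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.hex())  # 78c2b20a
```

Working on raw bytes rather than text-mode file handles keeps line endings untouched, so only the character encoding changes.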

Question 2

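Actually, only the first 128 code points (0 through 127) are encoded in UTF-8 exactly as they are in ASCII. UTF-8 is not ASCII, though; in particular, the next 128 code points (128 through 255) are each encoded as two bytes in UTF-8, unlike the single bytes 8859-1 uses for them.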
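A quick Python check of where the encodings agree and where they part ways (the variable names are illustrative only):

```python
# Code points 0..127: UTF-8 produces the same single byte as ASCII.
same_as_ascii = all(
    chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128)
)

# Code points 128..255: one byte each in 8859-1 ...
one_byte_in_latin1 = all(
    chr(cp).encode("iso-8859-1") == bytes([cp]) for cp in range(128, 256)
)

# ... but two bytes each in UTF-8.
two_bytes_in_utf8 = all(
    len(chr(cp).encode("utf-8")) == 2 for cp in range(128, 256)
)

print(same_as_ascii, one_byte_in_latin1, two_bytes_in_utf8)  # True True True
```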

You need to re-save the files as UTF-8 if you want them to be served as UTF-8.