Question

Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:

"hello" in Russian: "ЗдраÑтвуйте"

In Chrome 33+, Safari 6, IE11, etc.

<!DOCTYPE html>

<html>

<head></head>

<body>
    <p>"hello" in Russian is "здраствуйте"</p>
</body>

</html>

What gives? Shouldn't the browser utilize the UTF-8 unicode standard and display the text correctly? I'm using Coda which is set to save html files with UTF-8 encoding by default so that's not the problem.

Solution

The text in the example is UTF-8 encoded but is being misinterpreted as windows-1252. The reason is that no encoding has been specified, so browsers are forced to guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:
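To see the misinterpretation concretely, here is a minimal Python sketch that encodes the Russian word as UTF-8 and then decodes each byte as windows-1252. The helper name decode_as_windows_1252 is made up for this illustration. One assumption to note: Python's cp1252 codec leaves a few byte values (such as 0x81) undefined, so the sketch falls back to the code point with the same number for those, which approximates what browsers do.

```python
def decode_as_windows_1252(data: bytes) -> str:
    """Decode bytes one at a time as windows-1252 (hypothetical helper).

    Bytes that Python's cp1252 codec leaves undefined are mapped to the
    code point with the same value, approximating browser behaviour.
    """
    chars = []
    for value in data:
        try:
            chars.append(bytes([value]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(value))
    return "".join(chars)


text = "здраствуйте"
utf8_bytes = text.encode("utf-8")      # every Cyrillic letter takes two bytes
garbled = decode_as_windows_1252(utf8_bytes)

print(len(text), len(utf8_bytes), len(garbled))
print(garbled)
```

Each letter turns into two unrelated characters, which is why the garbled string is twice as long as the original word.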

1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program).

2) Add the following tag into the head part:

<meta charset=utf-8>
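If your editor has no BOM option, option 1 can also be done by script. This is a small sketch, assuming Python is available; the function name encode_with_bom is hypothetical. It relies on Python's "utf-8-sig" codec, which prepends the three-byte UTF-8 BOM (EF BB BF) when encoding.

```python
import codecs


def encode_with_bom(html: str) -> bytes:
    """Return the document as UTF-8 bytes preceded by a BOM (hypothetical helper)."""
    return html.encode("utf-8-sig")


page = '<!DOCTYPE html>\n<html><head></head><body><p>здраствуйте</p></body></html>\n'
data = encode_with_bom(page)

# The first three bytes are the UTF-8 byte order mark.
print(data[:3] == codecs.BOM_UTF8)
```

Writing data to disk in binary mode, for example with open(path, "wb"), gives a file that browsers treat as UTF-8 even without a meta tag.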

There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.
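The following is a heavily simplified sketch of that guessing process, not the algorithm from the specification. It only checks for a UTF-8 BOM, then looks for a charset declaration in a meta tag within the first 1024 bytes, then falls back to a default. The function name guess_encoding, the regular expression, and the windows-1252 fallback are assumptions made for this illustration; real browsers also consider HTTP headers, other BOMs, the user's locale, and content heuristics.

```python
import codecs
import re

# Rough pattern for charset=... inside a <meta> tag (illustrative only).
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_-]+)""",
    re.IGNORECASE,
)


def guess_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess a page's encoding from its first bytes (simplified sketch)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    match = META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing declared: assume a Western-locale default.
    return fallback


no_declaration = b"<!DOCTYPE html><html><head></head><body></body></html>"
short_meta = b"<!DOCTYPE html><html><head><meta charset=utf-8></head></html>"
long_meta = (
    b'<html><head><meta content="text/html; charset=UTF-8" '
    b'http-equiv="Content-Type"></head></html>'
)

print(guess_encoding(no_declaration))
print(guess_encoding(short_meta))
print(guess_encoding(long_meta))
```

With no declaration the sketch lands on the fallback, which matches the garbled output in the question; either form of the meta tag is enough to change the result to utf-8.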

Other suggestions

If you want to be sure which charset the browser will use, put this in your page's head:

 <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

Otherwise you are at the mercy of local settings and the browser's automatic detection.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow