Question

I'm Brazilian, and I live and work in Brazil. For those who don't know, Portuguese has lots of words with accented letters. I also know that if you don't handle the charset correctly, all your accented letters turn into garbage when the page is rendered in the browser.

So, in my daily work, I always see in my boss's code (which isn't the most carefully written) the ampersand pattern (if this has a name, please let me know). So, for example, these are all over the place:

Formul&aacute;rio
Relat&oacute;rio
Exclus&atilde;o

Knowing that you can set the charset of a web page on both the server and the client, I have been doing this in my web pages (BTW we've been working with ASP.NET WebForms)...

<%@ Page (...) ContentType="text/html; charset=utf-8" %>

and then, in the <head>:

<meta charset="utf-8" />

But my boss saw this and said this was a bad practice. I googled a bit and found no resources saying it is, in fact, a bad practice. And there have been some times when my boss said a good practice was a bad one. He then told me to replace all my accented letters with their ampersand counterparts. If it really is a better practice, I'll do it.

So, TL;DR: Is it better to set the web page's Content Type or to use the "ampersand pattern"?


Solution

The W3C has an article on character encoding that is quite useful. Their take on this is pretty much:

You should always specify the encoding used for an HTML or XML page. If you don't, you risk that characters in your content are incorrectly interpreted. This is not just an issue of human readability, increasingly machines need to understand your data too.

Further, according to the MDN article on the <meta> element, it is good practice to specify the charset, since it protects your users against certain cross-site scripting attacks:

It is good practice, and strongly recommended, to define the character set using this attribute. If no character set is defined for a page, several cross-scripting techniques may become practical to harm the page user, like the UTF-7 fallback cross-scripting technique. Always setting this meta will protect against these risks.

Even though there might be rare cases where it isn't possible to specify the content type, the general opinion appears to be that it is good practice to specify it. And if you do so properly, there isn't much need for the HTML representation of special characters (character references such as &aacute;), which in my opinion makes your code much harder to read.
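To make the "incorrectly interpreted" risk concrete, here is a minimal Python sketch of what happens to one of the words from the question. It is only an illustration, not how any browser is implemented: the variable names are made up, and windows-1252 (`cp1252`) is assumed as the wrong guess a client might fall back to when no charset is declared.

```python
import html

# "Formulário", written with an escape so the source file's own encoding cannot matter.
text = "Formul\u00e1rio"

# The bytes a server would send if the page is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoded with the declared encoding, the text round-trips unchanged.
decoded_ok = utf8_bytes.decode("utf-8")

# Decoded with a wrong guess (assumed here: windows-1252), the two bytes
# of the accented letter are shown as two unrelated characters.
decoded_wrong = utf8_bytes.decode("cp1252")

# A character reference resolves to exactly the same character.
from_reference = html.unescape("Formul&aacute;rio")

print(decoded_ok)              # Formulário
print(decoded_wrong)           # FormulÃ¡rio
print(from_reference == text)  # True
```

So once the encoding is declared and matches how the file is actually saved, the raw letter and the character reference end up as the same text in the browser.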

OTHER TIPS

"Better" is somewhat subjective. There are pros and cons to both approaches.

Using a sensible character encoding:

  • Gives you more readable code
  • Gives you smaller code

Using character references:

  • Means you don't have to care about the encoding (as long as it is ASCII-compatible)
  • Allows the page to be copied somewhere with wrong HTTP headers and still work
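
Both lists can be checked with a short Python sketch. The helper `to_character_references` is a hypothetical name written for this illustration (it only rewrites non-ASCII characters and does not escape `&`, `<` or `>`), and windows-1252 is again just an assumed example of a wrong encoding.

```python
import html
from html.entities import codepoint2name


def to_character_references(text):
    """Replace each non-ASCII character with a named reference if one
    exists, otherwise with a numeric reference. ASCII is left untouched."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)
        elif cp in codepoint2name:
            parts.append("&" + codepoint2name[cp] + ";")
        else:
            parts.append("&#" + str(cp) + ";")
    return "".join(parts)


raw = "Exclus\u00e3o"  # "Exclusão"
escaped = to_character_references(raw)

# Size on the wire: raw UTF-8 versus the pure-ASCII escaped form.
raw_size = len(raw.encode("utf-8"))
escaped_size = len(escaped.encode("ascii"))

# The escaped form is plain ASCII, so decoding it with a wrong
# (but ASCII-compatible) encoding still yields the intended text.
survives = html.unescape(escaped.encode("ascii").decode("cp1252")) == raw

print(escaped)                 # Exclus&atilde;o
print(raw_size, escaped_size)  # 9 15
print(survives)                # True
```

The raw form is shorter and easier to read, while the escaped form keeps working even when the declared encoding is wrong.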
Licensed under: CC-BY-SA with attribution