Question

Which should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific situations?

Is the choice of character set related to geographic region?


Edit:

Is there any benefit to putting this at the top of a CSS file:

@charset "utf-8";

or to declaring the charset on the link element, like this?

<link type="text/css; charset=utf-8" rel="stylesheet" href=".." />

I found this on the subject:

If DreamWeaver adds the tag when you add embedded style to the document, that is a bug in DreamWeaver. From the W3C FAQ:

"For style declarations embedded in a document, @charset rules are not needed and must not be used."

The charset specification has been part of CSS since version 2.0 (May 1998), so if you have a charset specification in a CSS file and Safari can't handle it, that's a bug in Safari.
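To see why the quoted rule makes sense, here is a simplified Python sketch of the order in which a browser picks the encoding of an external stylesheet, loosely following the CSS Syntax Level 3 rules: a byte order mark first, then the HTTP Content-Type charset, then an @charset rule (only when it is the very first thing in the file), then the referring document's encoding, and finally UTF-8. The function name `css_encoding` and its parameters are hypothetical, and unlike a real browser the sketch does not validate encoding labels. Embedded styles never go through this step because they are decoded together with the HTML page, which is why @charset is pointless there.

```python
import re
from typing import Optional

# Simplified: the rule must be the very first bytes of the file, written exactly
# as @charset "label"; with one space and double quotes.
CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def css_encoding(css: bytes,
                 http_charset: Optional[str] = None,
                 referring_charset: Optional[str] = None) -> str:
    """Guess which encoding a browser uses for an external stylesheet (sketch)."""
    # 1. A byte order mark beats everything else.
    for bom, name in ((b"\xef\xbb\xbf", "utf-8"),
                      (b"\xfe\xff", "utf-16-be"),
                      (b"\xff\xfe", "utf-16-le")):
        if css.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if http_charset:
        return http_charset.lower()
    # 3. An @charset rule, only honoured at the very start of the file.
    match = CHARSET_RULE.match(css[:1024])
    if match:
        label = match.group(1).decode("ascii", "replace").lower()
        # A rule readable as ASCII cannot really be UTF-16, so use UTF-8.
        return "utf-8" if label in ("utf-16be", "utf-16le") else label
    # 4. The referring document's encoding, then UTF-8 as the last resort.
    return (referring_charset or "utf-8").lower()


print(css_encoding(b'@charset "iso-8859-1";\nbody { }'))              # iso-8859-1
print(css_encoding(b'@charset "iso-8859-1";', http_charset="UTF-8"))  # utf-8
print(css_encoding(b'body { }', referring_charset="ISO-8859-1"))      # iso-8859-1
```

In short: @charset only helps when the server sends no charset in the Content-Type header and the file has no byte order mark.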

And what about adding accept-charset to a form?

<form action="/action" method="post" accept-charset="utf-8">
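With accept-charset="utf-8" the browser submits the form fields encoded as UTF-8, and the server has to decode them with the same charset. This sketch uses Python's `urllib.parse.urlencode` and `parse_qs` to imitate the `application/x-www-form-urlencoded` body a browser would send; the field values are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields containing non-ASCII characters.
fields = {"name": "Zoë", "city": "Zürich"}

# Body the browser sends when the form is submitted as UTF-8.
utf8_body = urlencode(fields, encoding="utf-8")
print(utf8_body)    # name=Zo%C3%AB&city=Z%C3%BCrich

# Body it would send if the form were submitted as ISO-8859-1 instead.
latin1_body = urlencode(fields, encoding="iso-8859-1")
print(latin1_body)  # name=Zo%EB&city=Z%FCrich

# Decoding with the matching charset recovers the original text...
print(parse_qs(utf8_body, encoding="utf-8"))
# ...while decoding Latin-1 bytes as UTF-8 yields replacement characters.
print(parse_qs(latin1_body, encoding="utf-8"))
```

If the page itself is already served as UTF-8, browsers submit forms as UTF-8 by default, so the attribute mainly matters when the page uses some other encoding.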

And which of these should I use with an XHTML doctype?

<?xml version="1.0" encoding="UTF-8"?>

or

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
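Which declaration takes effect depends on how the page is parsed. The XML declaration is read by XML parsers, which is how XHTML served as application/xhtml+xml is handled; a browser parsing the same file as text/html ignores it and looks at the meta element instead, and an HTTP Content-Type charset generally overrides both. The Python sketch below uses the standard `xml.etree.ElementTree` parser (not a browser) to show that an XML parser decodes bytes according to the declaration and rejects a document whose declaration does not match its bytes; the sample documents are made up.

```python
import xml.etree.ElementTree as ET

# Two made-up documents whose bytes match their declarations.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xc3\xa9</p>'

# The parser decodes the bytes according to the encoding declaration.
print(ET.fromstring(latin1_doc).text)  # café
print(ET.fromstring(utf8_doc).text)    # café

# Declaring UTF-8 over ISO-8859-1 bytes is a fatal well-formedness error.
mislabelled = b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xe9</p>'
try:
    ET.fromstring(mislabelled)
except ET.ParseError as err:
    print("rejected:", err)
```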

Solution

Unicode is taking over and has already surpassed all others. I suggest you hop on the train right now.

Note that there are several flavors of Unicode encoding, such as UTF-8, UTF-16 and UTF-32. Joel Spolsky gives an overview.
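To make the "flavors" concrete: Unicode gives each character a single code point, and UTF-8, UTF-16 and UTF-32 are different ways of storing that number as bytes. This short Python sketch encodes one character three ways; the big-endian "-be" codecs are used so that Python does not prepend a byte order mark.

```python
ch = "é"  # a single character, code point U+00E9
print(f"code point: U+{ord(ch):04X}")  # code point: U+00E9

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = ch.encode(codec)
    # Every flavor round-trips back to the same character.
    assert encoded.decode(codec) == ch
    print(f"{codec:10} {encoded.hex(' ')}")
# utf-8      c3 a9
# utf-16-be  00 e9
# utf-32-be  00 00 00 e9
```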

[Graph: "Unicode is winning", current as of Feb. 2012; see the comments for more exact values.]

OTHER TIPS

UTF-8 is supported everywhere on the web; only a few specific applications lack support for it. You should use UTF-8 whenever you can.

The downside is that for languages such as Chinese, UTF-8 takes more space than, say, UTF-16: most Chinese characters need 3 bytes in UTF-8 but only 2 in UTF-16. But whether or not you plan to support Chinese, UTF-8 is fine.
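A rough Python sketch of that trade-off: for pure Chinese text UTF-16 is smaller, but a web page also contains ASCII markup, which UTF-8 stores in 1 byte per character instead of 2. The sample strings are made up, and real pages will vary.

```python
def sizes(text: str) -> dict:
    """Encoded size in bytes: UTF-8 vs UTF-16 (little-endian, no byte order mark)."""
    return {codec: len(text.encode(codec)) for codec in ("utf-8", "utf-16-le")}


chinese = "中文"                    # two Chinese characters, nothing else
page = '<p class="intro">中文</p>'  # the same text inside ASCII markup

print(sizes(chinese))  # {'utf-8': 6, 'utf-16-le': 4}
print(sizes(page))     # {'utf-8': 27, 'utf-16-le': 46}
```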

The only argument against UTF-8 is that it can take more space than other encodings, but for Western languages it takes almost no extra space, except for the occasional special or accented character, and you can live with those extra bytes. We are in 2009, after all. ;)
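To put a number on "almost no extra space": compared with ISO-8859-1, UTF-8 spends exactly one extra byte on each accented Latin-1 letter and nothing on plain ASCII. The sample sentence below is invented and deliberately accent-heavy, so ordinary Western text usually has less overhead.

```python
text = "Déjà vu at the café: a naïve résumé"

latin1_size = len(text.encode("iso-8859-1"))  # 1 byte per character
utf8_size = len(text.encode("utf-8"))         # 2 bytes per accented letter
accented = sum(1 for ch in text if ord(ch) > 0x7F)

print(latin1_size, utf8_size, accented)  # 35 41 6
# The whole UTF-8 overhead is one extra byte per non-ASCII character.
assert utf8_size - latin1_size == accented
```

Here 6 of 35 characters are accented, so UTF-8 is about 17% larger; a typical English page has far fewer accented characters.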

If you want world domination, use UTF-8 all the way, because it covers virtually every writing system in use, including Asian scripts, Cyrillic, Hebrew, Arabic, Greek and so on, while ISO-8859-1 is restricted to Western European Latin characters (the other ISO-8859 parts each add just one other script, such as Cyrillic or Greek). You don't want to have Mojibake.
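Mojibake is what you get when bytes written in one encoding are read as another. This Python sketch shows UTF-8 bytes misread as ISO-8859-1, and that no single ISO-8859 part can hold both Latin accents and Cyrillic while UTF-8 can; the sample strings are made up.

```python
# Mojibake: UTF-8 bytes decoded as if they were ISO-8859-1.
garbled = "café".encode("utf-8").decode("iso-8859-1")
print(garbled)  # cafÃ©

# One made-up string mixing Western European and Cyrillic text.
mixed = "café Привет"
for codec in ("iso-8859-1", "iso-8859-5", "utf-8"):
    try:
        mixed.encode(codec)
        print(codec, "ok")
    except UnicodeEncodeError:
        print(codec, "cannot encode this text")
# iso-8859-1 cannot encode this text   (no Cyrillic)
# iso-8859-5 cannot encode this text   (no é)
# utf-8 ok
```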

I find ISO-8859-1 very useful on a couple of sites where clients send me text files created in Word or Publisher, which I can easily insert into the middle of PHP code without worrying about it, especially where quotes are concerned. These are local U.S. companies, there is literally no other difference in the text on the pages, and I see no disadvantage in using that character set on those particular pages. All my other pages are UTF-8.
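One caveat worth knowing here: Word on Western-language Windows systems usually saves "smart" quotes in Windows-1252, where they occupy bytes 0x91 to 0x94, and in true ISO-8859-1 those bytes are invisible control characters. Browsers treat a page labelled iso-8859-1 as windows-1252, which is probably why such pages still display the quotes correctly. The Python sketch below assumes the client's file is Windows-1252; the sample bytes are made up.

```python
# Made-up bytes as a Windows-1252 file might contain them:
# 0x93/0x94 are curly double quotes, 0x92 is a curly apostrophe.
word_bytes = b"\x93Hello\x94, it\x92s fine"

as_cp1252 = word_bytes.decode("cp1252")      # what was actually meant
as_latin1 = word_bytes.decode("iso-8859-1")  # strict ISO-8859-1 reading

print(as_cp1252)        # “Hello”, it’s fine
print(repr(as_latin1))  # '\x93Hello\x94, it\x92s fine' (C1 control characters)

# Converting once to UTF-8 keeps the typographic quotes on a UTF-8 page.
utf8_bytes = as_cp1252.encode("utf-8")
assert utf8_bytes.decode("utf-8") == as_cp1252
```

Decoding such files as Windows-1252 and storing them as UTF-8 would let those pages move to UTF-8 without losing the quotes.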

  • ISO-8859-1 is a great encoding to use when space is at a premium and you are only ever going to want to encode characters from the Western European Latin-script languages it supports. And you are never ever ever going to ever have to ever contemplate ever upgrading your application to support non-Latin languages.

  • UTF-8 is a fantastic way to (a) reuse the large body of existing code libraries written for 8-bit characters, or (b) be a euro snob. UTF-8 encodes ASCII in 1 byte per character and the rest of Latin-1 in 2 bytes; Greek, Cyrillic, Hebrew, Arabic and the extra Latin letters of Eastern European languages also take 2 bytes, while most Asian scripts take 3 bytes per character. It goes up to 4 bytes per character for characters outside the Basic Multilingual Plane, such as those of ancient scripts.

  • UTF-16 is a great way to start a new codebase from scratch. It's culture neutral: everyone gets a fair-handed 2 bytes per character within the Basic Multilingual Plane (even ASCII, which costs twice what it does in UTF-8). It does need 4 bytes per character for ancient or exotic scripts outside that plane, which means that in the worst case it's as bad as its big brother:

  • UTF-32 is a waste of space: it uses 4 bytes for every character.
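The byte counts in the list above can be checked directly. This Python sketch prints how many bytes one sample character from several scripts needs in each encoding, with None where ISO-8859-1 cannot represent it at all. The characters are arbitrary picks; 𐌰 is a Gothic letter, used as an example of an ancient script outside the Basic Multilingual Plane.

```python
from typing import Optional

# One sample character per script (arbitrary choices).
SAMPLES = [
    ("A", "ASCII"),
    ("é", "Latin-1"),
    ("ł", "Polish"),
    ("Ж", "Cyrillic"),
    ("中", "Chinese"),
    ("𐌰", "Gothic, outside the BMP"),
]
CODECS = ("iso-8859-1", "utf-8", "utf-16-le", "utf-32-le")


def byte_len(ch: str, codec: str) -> Optional[int]:
    """Bytes needed to store ch in codec, or None if codec cannot encode it."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


print(f"{'character':28}" + "".join(f"{c:>12}" for c in CODECS))
for ch, script in SAMPLES:
    cells = "".join(f"{str(byte_len(ch, c)):>12}" for c in CODECS)
    print(f"{ch + ' (' + script + ')':28}{cells}")
```

Note that Polish ł and Cyrillic Ж take 2 bytes in UTF-8, not 3; only characters from U+0800 upward, which includes most Asian scripts, need 3.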

Licensed under: CC-BY-SA with attribution