Question

How can I know what encoding PHP will use when sending data to the browser, i.e. the charset given in the Content-Type header (for instance, iso-8859-1)?

Solution

Apache + PHP servers at web hosts are usually configured to send no charset header. The quickest ways to test how your server is configured are these:

  • Use this tool to view the server headers returned for any page on your website. If you see a charset in the headers, your server is sending one; usually there won't be one.
  • Another way is to run this simple script on your server: <?php echo ini_get('default_charset'); ?> As said above, this usually prints an empty string; otherwise it shows the charset PHP is configured to send. (Since PHP 5.6 the built-in default for default_charset is UTF-8, so newer installations typically print UTF-8.)

The second approach assumes Apache is not configured with AddDefaultCharset some_charset. It usually isn't, but if it is, I'm afraid the Apache setting might override PHP's default_charset ini directive.
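
As a rough sketch of the second check, the snippet below composes the Content-Type value PHP would send on its own from the default_mimetype and default_charset ini settings. The function name describeDefaultContentType is made up for this illustration, and the rule that the charset is appended only to text/* types is my understanding of PHP's behaviour rather than something stated above. An Apache AddDefaultCharset setting is not taken into account.

```php
<?php
// Hypothetical helper: approximates the Content-Type value PHP emits when the
// script does not call header() itself. Assumption: the charset is appended
// only when default_charset is non-empty and the MIME type is text/*.
function describeDefaultContentType(string $mimetype, string $charset): string
{
    if ($charset !== '' && strncasecmp($mimetype, 'text/', 5) === 0) {
        return $mimetype . '; charset=' . $charset;
    }
    return $mimetype;
}

// Report what this PHP installation is configured to send.
// ini_get() returns false for unknown settings, hence the string casts.
echo describeDefaultContentType(
    (string) ini_get('default_mimetype'),
    (string) ini_get('default_charset')
), "\n";
```

Comparing this output with the headers the server really returns is one way to tell whether Apache (or something else in front of PHP) is changing the charset.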

OTHER TIPS

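You can use the header() solution that William suggested. However, if you are running Apache and its config sets a default charset, Apache adds that charset to any text/html or text/plain response that does not already declare one, and it takes precedence over a <meta> charset in the page (Internet Explorer will go crazy). See: AddDefaultCharset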

Keep in mind that content-types and encodings are two different things. text/html is a content-type; ISO-8859-1 and UTF-8 are encodings.

The HTTP response header that the server sends typically looks like this:

Content-Type: text/html; charset=utf-8

"charset" is actually the character encoding. It is not a separate header; there is, however, a header called "Content-Encoding", which specifies what kind of compression the response uses (e.g. gzip).
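
To make the split between content type and charset concrete, here is a minimal sketch that takes a Content-Type header value apart. parseContentType is a hypothetical helper written for this answer, not a PHP built-in, and it handles only the simple "type; charset=..." shape shown above.

```php
<?php
// Hypothetical helper: splits a Content-Type header value into its media type
// and its charset parameter (null when no charset is present).
function parseContentType(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $mediaType = strtolower(array_shift($parts));
    $charset = null;
    foreach ($parts as $param) {
        if (stripos($param, 'charset=') === 0) {
            // Drop the "charset=" prefix and any surrounding quotes.
            $charset = trim(substr($param, 8), "\"'");
        }
    }
    return ['type' => $mediaType, 'charset' => $charset];
}

print_r(parseContentType('text/html; charset=utf-8')); // type and charset
print_r(parseContentType('text/html'));                // charset stays null
```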

If you want to change the character encoding to UTF-8 in a file that contains HTML, send the header before any output:

<?php
// Must run before any output is sent to the browser.
header("Content-Type: text/html; charset=utf-8");

You can set your own with header('Content-Type: xxx/yyy'); but I believe that text/html is sent by default.

AFAIK, PHP sends strings bytewise. That is, if your variables hold UTF-8, it will send UTF-8. If you have iso-8859-1, it will send that too. If you mix them, it won't be pretty.
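
A small sketch of what "bytewise" means here, using hand-written byte escapes so the result does not depend on how the script file itself is saved. isValidUtf8 is a hypothetical helper; it relies on PCRE's /u modifier, which rejects subjects that are not valid UTF-8.

```php
<?php
// The same character "é" as raw bytes in two encodings.
$utf8   = "\xC3\xA9"; // UTF-8: two bytes
$latin1 = "\xE9";     // ISO-8859-1: one byte

// Hypothetical helper: with the /u modifier, preg_match() fails on
// subjects that are not valid UTF-8.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

echo strlen($utf8), ' ', strlen($latin1), "\n"; // strlen() counts bytes, not characters
var_dump(isValidUtf8($utf8));                   // the UTF-8 bytes are valid
var_dump(isValidUtf8($latin1));                 // a lone 0xE9 is not valid UTF-8
var_dump(isValidUtf8($utf8 . $latin1));         // mixing both breaks the string
```

PHP echoes all three strings without complaint; it is only the receiving side that has to make sense of the bytes.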

If neither your server nor PHP is configured with a default content type or charset, PHP will send only Content-Type: text/html. It won't specify a charset at all, and will send the bytes exactly as they appear in the script.

If a browser receives a page without charset specified, various things can happen:

  • most browsers have an "Encoding/Charset" menu; if the user explicitly selects one, the browser will try to apply it. That doesn't happen too often, so:
  • some browsers try to render it with a default charset (which is locale-dependent, e.g. for Firefox and cs_CZ it used to be iso-8859-2; YMMV)
  • IE will try to determine the charset heuristically (it will take a guess, based on character distribution - and many times it gets it right; sometimes it gets it wrong and you get a page in Romanian interpreted as Chinese text, which usually means "unreadable")
  • some old browsers will fall back on us-ascii

If, after this procedure, the PHP script's charset and the browser's charset match, the text will (accidentally) be readable. If not, you will see garbled characters and similar artifacts.
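
To illustrate the mismatch case, the sketch below mimics a browser that guesses ISO-8859-1 for bytes that were really UTF-8. readAsLatin1 is a hypothetical helper that does the conversion by hand so no extension (mbstring, iconv) is assumed; real browsers use their own decoding tables and heuristics.

```php
<?php
// Hypothetical helper: reinterprets each byte as an ISO-8859-1 character and
// returns the result encoded as UTF-8, mimicking a browser that guesses
// ISO-8859-1 for a page that was really written in UTF-8.
function readAsLatin1(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= $code < 0x80
            ? $byte                                                   // ASCII stays as-is
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));  // two-byte UTF-8 form
    }
    return $out;
}

$page = "caf\xC3\xA9";          // "café" stored as UTF-8 bytes
echo readAsLatin1($page), "\n"; // on a UTF-8 terminal this shows "cafÃ©"
```

Plain ASCII text survives the wrong guess untouched, which is why such pages often look fine until the first accented character appears.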

Licensed under: CC-BY-SA with attribution