Question

I implemented this form submission method that uses XMLHttpRequest. I saw the new HTML5 feature, FormData, that allows submission of files along with forms. Cool! However, there's a problem with accented characters, specifically those stupid smart quotes that Word makes (yes, I'm a little biased against those characters). I used to have it submit to a hidden iframe, the old school way, and I never had a problem with the variety of weird characters that were put in there. But I thought this would be better. It's turning out to be a bigger headache :-/

Let's look at the code. My javascript function (note the commented out line):

var xhr = new XMLHttpRequest();
var fd = new FormData(form);

xhr.addEventListener("error", uploadFailed, false);
xhr.addEventListener("abort", uploadCanceled, false);
xhr.addEventListener("load", uploadComplete, false); 

xhr.open($(form).attr('method'), $(form).attr('action'));

//xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=ISO-8859-1");
xhr.send(fd);

This is a shortened view, check out line 1510 at http://archive.cyark.org/sitemanager/sitemanager.js to view the entire function.

Then on the receiving php page, I have at the top:

header('Content-Type: text/html; charset=ISO-8859-1'); 

Followed by some basic php to build a string with the post data and submit it as an update to mysql.

So what do I do? If I uncomment the content-type setting in javascript it totally breaks the POST data in my php script. I don't know if the problem is in javascript, php or mysql. Any thoughts?

Solution

Encoding problems are sometimes hard to debug. In short, the best solution is to use UTF-8 as the encoding everywhere, that is, in every component of your application stack.
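To see what an encoding mismatch looks like in practice, here is a minimal sketch. It assumes the text leaves the browser as UTF-8 (which is how FormData encodes field values) and is then read byte by byte as ISO-8859-1. The helper names utf8Bytes and readAsLatin1 are made up for this illustration.

```javascript
// Encode a string as UTF-8 bytes (hypothetical helper for this illustration).
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Interpret every byte as one ISO-8859-1 character (byte value == code point).
function readAsLatin1(bytes) {
  return bytes.map(function (b) { return String.fromCharCode(b); }).join("");
}

var smartQuote = "\u201C";          // left double quotation mark, as produced by Word
var bytes = utf8Bytes(smartQuote);  // three bytes: 0xE2, 0x80, 0x9C
var misread = readAsLatin1(bytes);  // three characters instead of one
```

One character turns into three, starting with "â". Software that treats the bytes as windows-1252 instead shows the same sequence as "â€œ", which is the typical garbage seen in place of a smart quote.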

Your page seems to be delivered as ISO-8859-1 (Latin-1), as declared in the HTTP header sent by your web server. This leads browsers to use Latin-1 or its Windows counterpart, windows-1252, even if META elements in your HTML's HEAD tell user agents to use UTF-8, because the HTTP header takes precedence. Check that your other files (especially .js) are delivered as UTF-8 as well. If the problems persist after everything on the client side (HTML, JS, XHR etc.) is configured for UTF-8, start checking the server side.
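On the XHR side, a sketch of the sending code could look like the following. When the body is a FormData object, the browser generates the "multipart/form-data; boundary=..." header itself and encodes field values as UTF-8. Setting Content-Type by hand replaces that header and drops the boundary, which is the likely reason the commented-out line in the question breaks the POST data. The helper name sendFormData is hypothetical.

```javascript
// Hypothetical helper: sends an already-built FormData through a given XHR object.
// xhr is expected to offer open() and send(), as XMLHttpRequest does.
function sendFormData(xhr, method, action, formData) {
  xhr.open(method, action);
  // Deliberately no xhr.setRequestHeader("Content-Type", ...) here:
  // for a FormData body the browser sets the multipart header,
  // including the boundary, on its own.
  xhr.send(formData);
}

// Browser usage (form is the <form> element from the question):
// sendFormData(new XMLHttpRequest(), form.method, form.action, new FormData(form));
```

With this in place, the receiving PHP script gets UTF-8 bytes in $_POST, so the rest of the stack has to treat them as UTF-8 too.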

This may include such simple problems as PHP files not being saved as proper UTF-8 (very unlikely on Linux servers, I'd say), but it usually comes down to the MySQL configuration (server and client), the default encoding (and collation) of databases and tables, and the connection settings. Incorrect php.ini or mbstring settings may also cause problems.

Examples (not complete; using mysql here as a common database example):

MySQL configuration

[mysqld]
default_character_set = utf8
character_set_client = utf8
character_set_server  = utf8
[client]
default_character_set = utf8

Please note that these settings differ between MySQL 5.1 and 5.5, and using a variable the server does not recognize may prevent mysqld from starting. See http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html and http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html for details.

You may check your mysql variables via CLI:

mysql> SHOW VARIABLES LIKE '%char%';
Variable_name Value
character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_filesystem binary
character_set_results utf8
character_set_server utf8
character_set_system utf8

When creating databases and tables try to use something like

CREATE DATABASE $db /*!40100 DEFAULT CHARACTER SET utf8 */

php.ini settings (UTF-8 is already the default since PHP 5.6):

default_charset = "utf-8"

The mbstring extension of PHP uses Latin-1 by default and should be reconfigured if used:

[mbstring]
mbstring.internal_encoding = UTF-8
mbstring.http_output = UTF-8
...some more perhaps...

Webserver settings (Apache used as example, applies to other servers as well):

# httpd.conf
AddDefaultCharset UTF-8

PHP source code may set the header like this:

header('Content-type: text/html; charset=UTF-8');

Shell (bash) settings:

# ~/.profile
export LC_CTYPE=en_US.UTF-8
export LANG=en_US.UTF-8

The above list is only meant as a hint at the pitfalls that may wait for you in certain situations. Every single component of your web stack must be able to use UTF-8 and should be configured to do so. That said, a correct UTF-8 HTTP header is usually enough to sort out most problems. Good luck! :-)
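As a quick client-side check of that header, the charset a response declares can be read from its Content-Type value. This is only a sketch; the function name charsetOf is made up for this example.

```javascript
// Extract the charset parameter from a Content-Type header value.
// Returns the charset in lower case, or null if none is declared.
function charsetOf(contentType) {
  if (!contentType) return null;
  var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Browser usage inside a "load" handler, where xhr is the finished request:
// console.log(charsetOf(xhr.getResponseHeader("Content-Type")));
```

If this reports iso-8859-1 for the PHP page from the question, the header() call at the top of that script is still declaring Latin-1.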

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow