Question

I have had no end of problems trying to do what I thought would be relatively simple:

I need a form that accepts user input in a mix of English and other languages, some of them multi-byte (e.g. Japanese, Korean, etc.). The input is processed by PHP and stored (safely, avoiding SQL injection) in a MySQL database. It also needs to be read back from the database, processed, and displayed on screen.

I have it working fine for Latin characters, but when I add a mix of Latin and multi-byte characters the text comes out garbled.

I have tried to do my homework, but I am just banging my head against a wall now.

Magic quotes are off. I have tried using utf8_encode/utf8_decode, htmlentities, addslashes/stripslashes, and (in MySQL) both "utf8_general_ci" and "utf8_unicode_ci" as the collation for the field in the table.

Part of the problem is that there are so many places where I could be messing it up that I'm not sure where to begin solving the problem.

Thanks very much for any and all help with this. Ideally, if someone has working PHP code examples and/or knows the right MySQL table format, that would be fantastic.


Solution

I don't think you have any practical alternative to UTF-8. You're going to have to track down where the encoding and/or decoding breaks. Start by checking whether you can round-trip multi-language text to the database from the mysql command-line client, or perhaps through phpMyAdmin. Track down and eliminate problems at that level. Then move out one level by simulating input to your PHP code and examining the output, again dealing with any problems. Finally, add browsers into the mix.
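A minimal sketch of that layer-by-layer check, written in Python rather than PHP so it runs on its own. Assumptions: Python's built-in sqlite3 stands in for MySQL (it stores TEXT as UTF-8, so it only shows what a correct round trip looks like, not MySQL's charset settings), and the function names and sample string are hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs, urlencode

SAMPLE = "English, 日本語, 한국어"  # hypothetical mixed-script test string


def db_round_trip(text: str) -> str:
    """Layer 1: store the text and read it back (sqlite3 stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM notes").fetchone()
        return stored
    finally:
        conn.close()


def form_round_trip(text: str, server_charset: str) -> str:
    """Layer 2: build a UTF-8 form POST body, then decode it as the 'server' would."""
    body = urlencode({"comment": text}, encoding="utf-8")
    return parse_qs(body, encoding=server_charset)["comment"][0]


if __name__ == "__main__":
    for name, result in [
        ("database", db_round_trip(SAMPLE)),
        ("form (utf-8)", form_round_trip(SAMPLE, "utf-8")),
        ("form (latin-1)", form_round_trip(SAMPLE, "latin-1")),
    ]:
        print(f"{name:15} ok={result == SAMPLE} {result!r}")
```

The latin-1 case shows one way the garbling in the question can arise: correct UTF-8 bytes decoded with the wrong charset at one layer.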

OTHER TIPS

Here is a checklist of things to confirm are in UTF-8 mode:

  • MySQL table encoding. You seem to have already done this.
  • MySQL connection encoding. Run SHOW VARIABLES LIKE 'char%' and you will see what MySQL is using. You need character_set_client, character_set_connection and character_set_results set to utf8, which can easily be done in your application by running SET NAMES 'utf8' at the start of every connection. In my experience, this is the one most people forget to check.
  • If you use them, your CLI and terminal settings. In bash, this means setting LANG to a UTF-8 locale, i.e. LANG=(something).UTF-8.
  • Your source code (usually not a problem unless it contains UTF-8 constant text).
  • The page encoding. You seem to have this one right, too, but your browser's debug tools can help a lot here.
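A small Python sketch of the connection-encoding check, assuming you have already fetched the rows of SHOW VARIABLES LIKE 'char%' as (name, value) pairs with whatever client you use. The helper name and sample rows are hypothetical; it only flags the three variables that SET NAMES changes together when they are not a utf8 variant.

```python
from typing import Dict, Iterable, Optional, Tuple

# The three session variables that SET NAMES 'utf8' changes together.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def non_utf8_connection_vars(
    rows: Iterable[Tuple[str, str]],
) -> Dict[str, Optional[str]]:
    """Return each connection variable that is missing or not a utf8 variant."""
    settings = dict(rows)
    problems: Dict[str, Optional[str]] = {}
    for name in CONNECTION_VARS:
        value = settings.get(name)
        if value is None or not value.lower().startswith("utf8"):
            problems[name] = value
    return problems


if __name__ == "__main__":
    # Hypothetical output from a connection that never ran SET NAMES.
    rows = [
        ("character_set_client", "latin1"),
        ("character_set_connection", "latin1"),
        ("character_set_database", "utf8"),
        ("character_set_results", "latin1"),
    ]
    print(non_utf8_connection_vars(rows))
```

Note that a utf8 table (character_set_database above) does not help if the connection variables are still latin1.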

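Once you get all this right, the only escaping your app needs is mysql_real_escape_string(). (That function belongs to the old mysql extension, which was removed in PHP 7; with mysqli or PDO, use prepared statements instead.)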
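Placeholder binding, which PHP's mysqli and PDO extensions offer as prepared statements, keeps the value out of the SQL string entirely. As a rough analogue only, here is a Python sketch using the standard sqlite3 module (a stand-in, not MySQL); the table and function names are hypothetical.

```python
import sqlite3
from typing import List


def store_comment(conn: sqlite3.Connection, text: str) -> None:
    # The ? placeholder binds text as a value; quotes inside it are never parsed as SQL.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def all_comments(conn: sqlite3.Connection) -> List[str]:
    return [row[0] for row in conn.execute("SELECT body FROM comments ORDER BY rowid")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (body TEXT)")
    # Hypothetical hostile-looking input mixing scripts and SQL metacharacters.
    store_comment(conn, "O'Brien says 안녕하세요'); DROP TABLE comments; --")
    print(all_comments(conn))
```

The row comes back byte-for-byte as entered, and the table is still there afterwards.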

Oh, and it is (sadly) possible to store correctly encoded UTF-8 text in a column with the wrong encoding type, or through a connection with the wrong encoding type, and have it come back "correctly", too. It keeps working until you fix all the bits that aren't UTF-8, at which point it breaks.
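A simplified Python model of that trap, assuming the classic case of a UTF-8 page talking to MySQL over a latin1 connection. MySQL's latin1 is really close to Windows-1252; plain Latin-1 is used here because it maps every byte cleanly. The function names are hypothetical, and each one stands for what the server or client does with the bytes.

```python
def write_over_latin1_connection(text: str) -> str:
    """Client sends UTF-8 bytes; the server believes they are latin1 and stores those characters."""
    return text.encode("utf-8").decode("latin-1")


def read_over_latin1_connection(stored: str) -> str:
    """Server converts stored characters back to latin1 bytes; the client treats them as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")


def read_over_utf8_connection(stored: str) -> str:
    """After 'fixing' only the connection, the stored characters arrive unchanged."""
    return stored


if __name__ == "__main__":
    original = "한국어 and English"
    stored = write_over_latin1_connection(original)
    print(read_over_latin1_connection(stored))  # identical to original
    print(read_over_utf8_connection(stored))    # garbled: the data was double-encoded
```

The two wrong conversions cancel out, which is why the broken setup appears to work until one layer is corrected.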

First, check whether you can add multi-language text to your database directly. If that works, you can make it work in your application too.

Are you serializing any data by chance? PHP's serialize() function has some issues when serializing non-English characters.
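One likely cause: PHP's serialize() records each string's length in bytes (s:<bytes>:"...";), so if the text is re-encoded between serialize() and unserialize(), for example by a charset conversion on a mismatched connection, the declared length no longer matches and unserialize() fails. The hypothetical Python helpers below imitate only the string case of that format to show the mismatch.

```python
import re


def php_serialize_str(text: str) -> str:
    """Imitate PHP's serialize() for a single string: s:<UTF-8 byte length>:"<text>";"""
    return f's:{len(text.encode("utf-8"))}:"{text}";'


def length_matches(serialized: str) -> bool:
    """Check that the declared byte length agrees with the payload's UTF-8 length."""
    match = re.fullmatch(r's:(\d+):"(.*)";', serialized, flags=re.DOTALL)
    if match is None:
        return False
    declared, payload = int(match.group(1)), match.group(2)
    return declared == len(payload.encode("utf-8"))


if __name__ == "__main__":
    good = php_serialize_str("日本")  # 2 characters, 6 UTF-8 bytes
    print(good, length_matches(good))
    # Simulate the value being re-encoded once (UTF-8 bytes misread as latin-1).
    mangled = good.encode("utf-8").decode("latin-1")
    print(repr(mangled), length_matches(mangled))
```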

Everything you do should be UTF-8 encoded.
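On the input side, "everything UTF-8" can be enforced by rejecting byte strings that are not valid UTF-8 before they reach the database (PHP has mb_check_encoding() for this). A minimal Python equivalent, with a hypothetical function name:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("日本語".encode("utf-8")))      # True
    print(is_valid_utf8("日本語".encode("shift_jis")))  # False: Shift_JIS bytes are not valid UTF-8
```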

One thing you could try is to json_encode() the data when putting it into the database and json_decode() it when it's retrieved.
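This works because PHP's json_encode() escapes non-ASCII characters as \uXXXX by default, so what gets stored is plain ASCII that survives a charset mismatch; the trade-off is that the column no longer holds readable, searchable text. Python's json module behaves the same way by default (ensure_ascii=True), which this sketch uses as a stand-in:

```python
import json

text = "日本語 and English"

encoded = json.dumps(text)          # ensure_ascii=True is the default
print(encoded)                      # "\u65e5\u672c\u8a9e and English"
print(encoded.isascii())            # True: only ASCII goes to the database
print(json.loads(encoded) == text)  # True: decoding restores the original
```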

The problem was caused by my not having the default character set (default_charset) in the php.ini file, and (possibly) not having set the character set of the MySQL table (in phpMyAdmin, via the Operations tab).

Setting the default char set to "utf-8" fixed it. Thanks for the help!!

Check your database connection settings. It also needs to support UTF-8.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow