I have a PHP website with the CLEditor richtext control on it. When I try to write Euros and British Pounds to the database, the characters go through just fine because I have the charset set to UTF-8 in the containing page HTML, in the richtext control IFRAME HTML, and in the MySQL table collation. All is well on that front. However, when I try to write smart quotes, I end up seeing this output in the database:

This is a â€œtestâ€.

(If that doesn't show up properly above in your browser, the test word has something like a Latin a, a Euro symbol, and the small OE ligature in front of the word, and a Latin a and a Euro symbol after it.)

When I use PHP to read that value back out of the database to display it on the page, it ends up as black diamonds with question marks on them as well as some other Latin characters.

What should I be doing to fix this?


Solution

First, make sure your MySQL table is using UTF-8 as its encoding. If it is, it will look like this:

mysql> SHOW CREATE TABLE Users;
...
CREATE TABLE `Users` (
...
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=utf8 |

Next, make sure your HTML page is set to display UTF-8:

<html>
    <head>
        <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    </head>
    ....
</html>

Then it should work.


EDIT: I purposefully did not talk about collation, because I thought it was already considered, but for the benefit of everyone, let me add some more to this answer.

You state,

I have the charset set to UTF-8 … in the MySQL table collation.

Table collation is not the same thing as charset.

A collation is the set of rules MySQL uses to compare and sort the characters of a charset FOR THE PURPOSES OF QUERYING; it does not decide how the text is stored. E.g., with a collation of utf8_general_ci, SELECT * FROM foo WHERE bar LIKE '%e%'; also matches rows containing E or é, because the collation treats them as equal. The charset is what determines the stored bytes: an em dash (U+2014) is the single byte 0x97 in MySQL's latin1 but the three bytes 0xE2 0x80 0x94 in UTF-8.

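Not so coincidentally... if text encoded as UTF-8 is decoded as latin1 somewhere along the way (by the connection, a database tool, or the page), you will get the following:

This is a â€œtestâ€.

That seems to be the output of your problem, exactly. (The reverse mismatch, latin1 bytes on a UTF-8 page, gives the black diamonds with question marks.) Here's the HTML to duplicate it:

<?php
// PHP sends charset=UTF-8 in the HTTP header by default, which would
// override the meta tag below, so set the (deliberately wrong) charset here too.
header('Content-Type: text/html; charset=windows-1252');
$string = "This is a “test”.";
?>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html;charset=windows-1252"/>
    </head>
    <body>
        <p><?php echo $string; ?></p>
    </body>
</html>

Make sure you save this file in UTF-8...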
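
The same garbling can be reproduced without a browser. Here is a minimal sketch, assuming the mbstring extension is available; to_mojibake() and from_mojibake() are made-up helper names for illustration. It decodes the UTF-8 bytes of an opening curly quote as Windows-1252 (the encoding MySQL calls latin1), then reverses the step:

```php
<?php
// Hypothetical helpers for illustration; they are not part of any library.

// Treat UTF-8 bytes as if they were Windows-1252 and re-encode as UTF-8.
// This is the accidental conversion that produces the garbage characters.
function to_mojibake(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Undo the damage: encode the garbled characters back to single bytes,
// which restores the original UTF-8 byte sequence.
function from_mojibake(string $garbled): string
{
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$original = "\xE2\x80\x9Ctest";      // UTF-8 bytes for an opening curly quote + "test"
$garbled  = to_mojibake($original);  // Latin a, Euro sign, OE ligature, then "test"

echo $garbled, "\n";
echo from_mojibake($garbled), "\n";

// A curly quote saved as a single latin1 byte is not valid UTF-8 at all,
// which is why a UTF-8 page renders it as a replacement character.
var_dump(mb_check_encoding("\x93test", 'UTF-8'));
```

Only the opening quote is used here because the last byte of the closing quote (0x9D) has no assigned character in Windows-1252, so how it converts can vary.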

To see what charset your table is set to, run this query:

SELECT CCSA.character_set_name, TABLE_COLLATION
FROM information_schema.`TABLES` T
JOIN information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
  ON CCSA.collation_name = T.table_collation
WHERE T.table_schema = 'database'
  AND T.table_name = 'table';

The only proper result for your use (unless you're using multiple non-English languages) is:

+--------------------+-----------------+
| character_set_name | TABLE_COLLATION |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+

Thanks for the upvotes ;-)

Other tips

Make sure that your PHP file has this at the top, before any content is printed. With it, I can pull data from a latin1_swedish_ci table into a UTF-8 encoded website and it displays correctly.

header("Content-type: text/html;charset=UTF-8");

I also put this after my database connection (not sure if this matters as much):

mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET 'utf8'");
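
The mysql_* functions used above were removed in PHP 7. A rough equivalent with mysqli might look like this, assuming the mysqli extension is loaded; connect_utf8() is a made-up helper name and the credentials in the usage comment are placeholders, not values from the original post:

```php
<?php
// Sketch only: connect_utf8() is a hypothetical helper, not a library function.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $db);

    // Sets the connection charset (the job SET NAMES does) and also informs
    // the client library, so real_escape_string() uses the right encoding.
    $link->set_charset('utf8');

    return $link;
}

// Usage (placeholder credentials):
// $link = connect_utf8('localhost', 'user', 'secret', 'mydb');
```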

For what it's worth to anyone else coming across this post: adding these mysqld configuration lines (if you have access to the MySQL server and can make changes) solved my problem with the curly quotes.

http://dev.mysql.com/doc/refman/5.6/en/charset-server.html

# Force UTF8 Charset Encoding
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8

I had double-checked the SQL being called from PHP (which appeared fine), and also manually executed an insert/update statement with curly quotes from my GUI (which worked fine), but inserts coming from the web server were still putting the garbled multi-character sequences into the database.

I checked my mysql server variables and noticed latin1 was the default for the server, and the database (even though the table/columns were UTF8). Once I added the lines above and refreshed the page that issued the update statement, the curly quotes did insert correctly. I can only assume this had something to do with our server's default charset being latin1 and the web server mysql library handshake negotiating as such.
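
To run the same check from PHP rather than a GUI, something like the following could list what the web server's connection actually negotiated. This is a sketch assuming an open mysqli connection; charset_variables() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the character_set_* variables visible to
// this connection as an array of name => value pairs.
function charset_variables(mysqli $link): array
{
    $result = $link->query("SHOW VARIABLES LIKE 'character_set%'");
    if ($result === false) {
        // Only reached when mysqli is not configured to throw exceptions.
        throw new RuntimeException($link->error);
    }

    $vars = [];
    while ($row = $result->fetch_assoc()) {
        $vars[$row['Variable_name']] = $row['Value'];
    }
    $result->free();

    return $vars;
}
```

If character_set_client or character_set_connection comes back as latin1 while the table is utf8, the connection is a likely place for the quotes to be getting mangled.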

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow