Question

I have built a CMS that allows HTML to be stored in a database. It all started off very simple: I displayed the HTML in a textarea using htmlspecialchars to prevent it from breaking the form, then saved it back using htmlspecialchars_decode. It all seemed to work fine until someone pasted some HTML into the system instead of typing it. At that point it stored fine but lost most of the whitespace, which meant all the lovely indentation had to be redone from scratch.
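For reference, the escape and decode round trip described above can be sketched as follows. The sample HTML and variable names are hypothetical, and the snippet only illustrates that the two functions leave tabs and newlines alone; it is not the CMS's actual code.

```php
<?php
// Hypothetical stored HTML with tab and newline indentation.
$storedHtml = "<div>\n\t<p class=\"intro\">Tom & Jerry</p>\n</div>";

// Escape for display inside a <textarea> so the markup cannot break the form.
$escaped = htmlspecialchars($storedHtml, ENT_QUOTES, 'UTF-8');

// htmlspecialchars_decode() reverses the escaping; whitespace is not touched by either call.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);

var_dump($restored === $storedHtml); // bool(true)
```

If the indentation survives this round trip, the loss is happening somewhere else (the browser, the form, or the database layer).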

To fix it, I tried specifying everything in utf-8 encoding because any attempt to fiddle with it seemed to produce invalid characters.

I specify utf-8 in the PHP header

header('Content-Type: text/html; charset=utf-8');

I specify utf-8 in my HTML page

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I specify utf-8 in the HTML form

<form accept-charset="utf-8" 

Then I read the posted value (basically) like this:

$Val = $_POST[$SafeFieldName];

My understanding was that PHP did everything in utf-8 so I am a bit surprised at this stage that I get gobbledegook - unless I now do this:

$Val = utf8_decode($Val);

So, at this stage - it works - sort of. I lose all my lovely indentation but not all of my whitespace. It's as if some non-UTF-8 characters are being stripped out. Weirdly, this happens in Chrome, but in Firefox it seems fine.
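One way to check what is actually arriving is to test the posted bytes before converting anything. The sketch below assumes the mbstring extension is available and uses a hypothetical sample value in place of $_POST. It uses mb_convert_encoding as a stand-in for utf8_decode (which is deprecated as of PHP 8.2); both convert UTF-8 to ISO-8859-1, which is lossy for any character outside Latin-1.

```php
<?php
// Hypothetical posted value: tab-indented markup containing a euro sign,
// a character that has no ISO-8859-1 equivalent.
$val = "\t<p>Price: \u{20AC}5</p>";

// Is the value already valid UTF-8? If so, no decoding step should be needed.
$isUtf8 = mb_check_encoding($val, 'UTF-8');

// Converting UTF-8 to ISO-8859-1 drops anything outside Latin-1
// (it is replaced by a substitute character, "?" by default).
$latin1 = mb_convert_encoding($val, 'ISO-8859-1', 'UTF-8');

var_dump($isUtf8);
var_dump($latin1);
```

If the check reports valid UTF-8 and the page, form, and database are all UTF-8, a decode call like the one above is more likely to hide a problem than to fix one.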

I think I'm just tying myself in knots now. Any elegant suggestions? I need to get to the bottom of this as opposed to just hack it to get it to work.

Solution 4

Sorted - and the answer is really embarrassing - but you never know, some day someone may need this :)

I noticed that it worked differently (but still fairly rubbish) in Firefox so I had a look at my style sheet and found this:

white-space: nowrap;

Someone (me) must have put that in there to try to get horizontal scrolling working in some browser. Without that, the HTML makes it all the way to the DB and back again.

My only other question was why I needed this, since the whole thing should have been arriving as UTF-8:

$Val = utf8_decode($Val);

Magically - now I don't need it.

Other tips

The connection to the DB and the DB tables themselves should support UTF-8. Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.

The DB connection should be UTF-8 as well:

mysql_set_charset('utf8');

See http://akrabat.com/php/utf8-php-and-mysql/ for more info.

Update: some report that

mysql_query('SET NAMES utf8');

is required sometimes as well!
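The mysql_* functions shown above were removed in PHP 7. As a rough sketch of the same idea with PDO, the connection charset can be set in the DSN. The helper name buildMysqlDsn, the host, the database name, and the credentials are all hypothetical; 'utf8' matches the advice above, and 'utf8mb4' would be the choice if 4-byte characters need to be stored.

```php
<?php
// Build a PDO DSN that declares the connection character set up front.
// Host and database name are hypothetical placeholders.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('localhost', 'cms');
echo $dsn, PHP_EOL; // mysql:host=localhost;dbname=cms;charset=utf8

// With a reachable server and the pdo_mysql extension, the connection
// would be opened like this (credentials are placeholders):
// $pdo = new PDO($dsn, 'user', 'password', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```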

If making your tables and connection UTF-8 is not possible, you could of course save the HTML as BASE64 encoded data, and decode it back when you retrieve it from the DB again.
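A minimal sketch of the Base64 fallback, using a hypothetical sample string. Because the encoded form is plain ASCII, the table and connection charset no longer affect the stored bytes, at the cost of roughly a third more storage and content that cannot be searched in SQL.

```php
<?php
// Hypothetical HTML with space indentation and a non-ASCII character.
$html = "<ul>\n    <li>caf\u{00E9}</li>\n</ul>";

// Encode before saving: the result contains only A-Z, a-z, 0-9, "+", "/" and "=".
$encoded = base64_encode($html);

// Decode after loading; strict mode returns false on invalid input.
$decoded = base64_decode($encoded, true);

var_dump($decoded === $html); // bool(true)
```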

Check your database connection encoding, and check the encoding of the table field where you store the HTML. Maybe their encoding is different from UTF-8.

If this is an issue in and out of MySQL (as you suggested in the title), then you need to make sure the columns and tables use the utf8_bin collation, and put mysql_set_charset('utf8'); after opening the connection to MySQL.

Licensed under: CC-BY-SA with attribution