How to get &nbsp; to behave properly using HTML Purifier?
22-08-2019
Question
I am using HTML Purifier in my PHP project and am having trouble getting it to work properly with user input.
My users enter HTML through a WYSIWYG editor (TinyMCE), but whenever a user enters the HTML entity &nbsp; (non-breaking space), it gets saved into the database as a weird foreign character (Â).
However, when I edit the saved entry in the WYSIWYG editor, it is displayed properly as &nbsp;. It also behaves properly when displayed, except that in the page source it appears as a real space rather than as the non-breaking space character.
Also, in the MySQL database it displays as the weird foreign character.
I read the documentation about Unicode and HTML Purifier and changed my database and web page encoding to UTF-8, but I am still having problems with the non-breaking space character being mangled. The other HTML entities, such as &lt; and &gt;, get saved as &lt; and &gt;, so why not &nbsp;?
Solution
The non-breaking space isn't being saved in your database as one weird foreign character; it's being saved as two characters. The Unicode non-breaking space character (U+00A0) is encoded in UTF-8 as 0xC2 0xA0, which in ISO-8859-1 looks like "Â " (i.e. a weird foreign character followed by a non-breaking space).
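The byte-level explanation above can be reproduced in a few lines of PHP. This is only a sketch: it assumes the mbstring extension is available, and the variable names are made up for illustration.

```php
<?php
// U+00A0 (non-breaking space) written as a UTF-8 string literal.
$nbsp = "\u{00A0}";
echo bin2hex($nbsp), "\n";      // c2a0

// Misread those two bytes as ISO-8859-1 and re-encode each one as UTF-8.
// This mimics what happens when UTF-8 data is treated as ISO-8859-1 on the way in.
$mangled = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
echo bin2hex($mangled), "\n";   // c382c2a0

// The result is U+00C2 (Â) followed by U+00A0 (non-breaking space).
echo $mangled === "\u{00C2}\u{00A0}" ? "Â + NBSP\n" : "unexpected\n";
```

The second hex dump shows why the stored value looks like "Â" with a trailing space: the original two bytes have each become a character of their own.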
You're probably forgetting to run SET NAMES 'utf8' on your database connection, so MySQL treats the data PHP sends as ISO-8859-1 (the default).
Have a look at "UTF-8 all the way through…" to see how to properly set up UTF-8 when using PHP and MySQL.
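As a rough sketch of the connection setup described above, here is one way to do it with PDO. The function names, host, database name, and credentials are all hypothetical placeholders, and the original post does not say which database extension is in use, so treat PDO as an assumption.

```php
<?php
// Build a PDO MySQL DSN that declares the connection charset.
// (The charset DSN option is honoured by PDO MySQL from PHP 5.3.6 onwards.)
function build_dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8";
}

// Open a connection that talks UTF-8 to MySQL.
// All arguments are placeholders; substitute your own settings.
function connect_utf8(string $host, string $db, string $user, string $pass): PDO
{
    $pdo = new PDO(build_dsn($host, $db), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    // The explicit statement from the answer; redundant when the DSN
    // charset is honoured, but harmless.
    $pdo->exec("SET NAMES 'utf8'");
    return $pdo;
}

echo build_dsn('localhost', 'example_db'), "\n";
```

With mysqli, the equivalent step would be calling set_charset('utf8') on the connection object right after connecting.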
OTHER TIPS
It may also help to know that &#160; is an alternative to &nbsp;, which you will likely need if you ever output any human-readable XML ;)
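To illustrate the tip above: &nbsp; is a named entity defined by HTML, not by XML itself, so a plain XML parser is expected to reject it, while the numeric reference &#160; needs no declaration. A small sketch, assuming the SimpleXML extension is enabled:

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Named entity: not declared in plain XML, so parsing should fail.
$named = simplexml_load_string('<p>a&nbsp;b</p>');

// Numeric character reference: always valid, decodes to U+00A0.
$numeric = simplexml_load_string('<p>a&#160;b</p>');

var_dump($named === false);
var_dump((string) $numeric === "a\u{00A0}b");
```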