Question

I have a MySQL query that returns data to be formatted into an XML file. One of the columns is a free-text field that can contain strange characters that "break" the XML with an encoding error. I believe these characters are curly quotes that made it into a record when the user pasted text from Microsoft Word while entering it. I do not have control over that process.

Strange Character example:

“TURN KEY – Totally Furnished†

I am using htmlspecialchars to "clean" this data, but it returns an empty string for these records, so the field ends up blank in the XML. This fixes the encoding issue, but the record is now missing its data for that field. I still want that data; I just want to omit the weird characters or replace them with something like a dash.

$description  = htmlspecialchars($row['PropertyInformation'], ENT_QUOTES, 'UTF-8');

The XML output ends up like this in the records where the weird characters are occurring:

<DESCRIPTIF>
<![CDATA[ ]]>
</DESCRIPTIF>

Solution

The htmlspecialchars function returns an empty string if the input string contains an invalid code unit sequence within the given encoding, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.

The ENT_IGNORE flag silently discards invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.

The ENT_SUBSTITUTE flag replaces invalid code unit sequences with a Unicode Replacement Character, U+FFFD (UTF-8) or &#xFFFD; (otherwise), instead of returning an empty string.

You could try to set one of these flags.

htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE);
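To make the difference between the flags concrete, here is a minimal sketch. It assumes the stored text contains raw Windows-1252 curly-quote bytes (0x93 and 0x94), which is only a guess about the asker's data; the variable names are illustrative.

```php
<?php
// Assumption: the field holds raw Windows-1252 curly quotes (bytes 0x93/0x94),
// which are not valid UTF-8 sequences on their own.
$raw = "\x93TURN KEY\x94";

// Flags passed explicitly without ENT_IGNORE or ENT_SUBSTITUTE:
// the invalid bytes make the whole result an empty string.
$default = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// ENT_IGNORE: the invalid bytes are silently dropped (discouraged).
$ignored = htmlspecialchars($raw, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE: each invalid byte becomes U+FFFD.
$substituted = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Optional follow-up: turn the replacement characters into dashes.
$withDashes = str_replace("\u{FFFD}", '-', $substituted);

var_dump($default, $ignored, $substituted, $withDashes);
```

Replacing U+FFFD afterwards is one way to get the "change it to a dash" behaviour the question asks for, while the rest of the text is preserved.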

Other tips

It looks like you forgot to capitalize utf-8:

$description = htmlspecialchars($row['PropertyInformation'], ENT_QUOTES, 'UTF-8');

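/**
 * Strip control characters (0x00-0x1F) and every byte in the 0x80-0xFF range.
 * Note: this also removes tabs, newlines, and all non-ASCII text, including
 * valid UTF-8 characters.
 *
 * @param string $string
 * @return string
 */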
function str_clean($string)
{
    return preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
}


$string = '“TURN KEY – Totally Furnishedâ€';
echo htmlspecialchars(str_clean($string), ENT_QUOTES, 'UTF-8');
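
Because str_clean above removes every byte from 0x80 to 0xFF, it also drops legitimate accented characters. A gentler variant, sketched here under the assumption that the problem bytes are Windows-1252 "smart" punctuation, maps just those bytes to ASCII equivalents. The function name replace_cp1252_punctuation is hypothetical, and the byte table covers only the common quote and dash characters.

```php
<?php
/**
 * Hypothetical helper: replace Windows-1252 "smart" punctuation bytes with
 * ASCII equivalents, but only when the input is not already valid UTF-8.
 */
function replace_cp1252_punctuation(string $string): string
{
    // preg_match() with the /u modifier only succeeds on valid UTF-8 input.
    if (preg_match('//u', $string) === 1) {
        return $string; // already valid UTF-8, leave it untouched
    }

    return strtr($string, [
        "\x91" => "'", // left single quote
        "\x92" => "'", // right single quote
        "\x93" => '"', // left double quote
        "\x94" => '"', // right double quote
        "\x96" => '-', // en dash
        "\x97" => '-', // em dash
    ]);
}

// Assumed sample input: the question's text as raw Windows-1252 bytes.
$raw = "\x93TURN KEY \x96 Totally Furnished\x94";
$description = htmlspecialchars(replace_cp1252_punctuation($raw), ENT_QUOTES, 'UTF-8');
echo $description, PHP_EOL;
```

Any other invalid bytes would still cause htmlspecialchars to return an empty string, so this could be combined with ENT_SUBSTITUTE as a fallback.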
Licensed under: CC-BY-SA with attribution