Question

I'm generating an XML file that will serve as the spool for a PostScript form.

Whenever someone inserts an EN DASH character (probably copied from MS Word), I get this error: offending command: xmlerror. Stack: unicode not supported yet ....

http://www.fileformat.info/info/unicode/char/2013/index.htm

The relevant part of the code is:

$xml = new SimpleXMLElement('<xml/>');
foreach( $_POST as $key => $value ) {
    $xml->$key = $value;
}
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());
$nombreArchivoTemporal = '/tmp/'.time().rand();
$archivo = fopen ( $nombreArchivoTemporal, "wb" );
fwrite ( $archivo, iconv('UTF-8', 'CP1252//TRANSLIT//IGNORE', "@PBSSFORM DNDA\n" . $dom->saveXML()) );
fclose ( $archivo );

Characters such as ÄËÏÖÜáéíóú are supported, but EN DASH (and probably others) is not. I'm trying to get rid of such characters with the iconv function, but it doesn't seem to work, because the character has already been encoded as a numeric entity once it is inserted into $xml:

<?xml version="1.0"?>
<xml>
<date/>
  <tituloObra>&#xE1;&#xE9;&#xED;&#xF3;&#xFA;&#xC1;&#xC9;&#xCD;&#xD3;&#xDA;&#xE4;&#xEB;&#xEF;&#xF6;&#xFC;&#xC4;&#xCB;&#xCF;&#xD6;&#xDC; &#x2013; &lt;= gui&#xF3;n</tituloObra>

&#x2013; is the problematic character.


Solution 2

The problem turned out to be related to SimpleXML. I tried everything to convert the SimpleXML output to CP1252, but whenever I loaded it with DOMDocument->loadXML() I got Illegal character... errors.

I replaced it by using the DOMDocument class directly, specifying CP1252 in the constructor and inserting the new values as UTF-8.

When I call DOMDocument->save(), it automatically encodes the file as CP1252, which avoids the PostScript error mentioned above.
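
A minimal sketch of that approach, for illustration only. The helper name buildSpoolXml and the sample field value are made up here, and it assumes the local libxml2 build recognizes the encoding name CP1252 (this normally requires iconv support). It uses saveXML() so the result can be prefixed with the @PBSSFORM header from the question before being written to disk.

```php
<?php
// Hypothetical helper: builds the spool XML with DOMDocument only (no SimpleXML).
// Assumes $fields holds UTF-8 strings and every key is a valid XML element name.
function buildSpoolXml(array $fields): string
{
    // Declare the output encoding in the constructor, as described above.
    $dom = new DOMDocument('1.0', 'CP1252');
    $dom->formatOutput = true;

    $root = $dom->appendChild($dom->createElement('xml'));
    foreach ($fields as $key => $value) {
        $node = $root->appendChild($dom->createElement($key));
        // DOM methods expect UTF-8 input; createTextNode also escapes & and <.
        $node->appendChild($dom->createTextNode($value));
    }

    // Serialization converts the text to the declared encoding.
    $xml = $dom->saveXML();
    if ($xml === false) {
        throw new RuntimeException('Could not serialize the document as CP1252');
    }
    return $xml;
}

// Sample data standing in for $_POST (made up for illustration).
$body  = buildSpoolXml(['tituloObra' => "gui\u{F3}n \u{2013} t\u{ED}tulo"]);
$spool = "@PBSSFORM DNDA\n" . $body;
```

Because the document itself carries the target encoding, no iconv call is needed afterwards; $spool can be written with fwrite() as in the question.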

Other tips

EN DASH (U+2013) does exist in CP1252. CP1252 is a common but unofficial name for windows-1252, which is defined in the IANA registry so that byte 0x96 represents U+2013.
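
A quick way to check this is to convert the character with iconv and inspect the resulting byte. The snippet below is only a sketch and assumes the local iconv implementation knows the name CP1252. It also shows why iconv had no effect in the question: a numeric entity such as &#x2013; is plain ASCII text, so the conversion leaves it untouched.

```php
<?php
// U+2013 EN DASH as a UTF-8 string (bytes E2 80 93).
$enDash = "\u{2013}";

// Convert to CP1252; no //TRANSLIT is needed because the character exists there.
$converted = iconv('UTF-8', 'CP1252', $enDash);
echo bin2hex($converted), "\n"; // prints "96"

// An already entity-encoded dash is pure ASCII, so iconv passes it through unchanged.
$entity = '&#x2013;';
$entityConverted = iconv('UTF-8', 'CP1252', $entity);
var_dump($entityConverted === $entity); // bool(true)
```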

Licensed under: CC-BY-SA with attribution