I'm writing an XML file that will serve as a spool file for a PostScript form.

Whenever someone enters an EN DASH character (probably copied from MS Word), I get the error: offending command: xmlerror. Stack: unicode not supported yet ....

http://www.fileformat.info/info/unicode/char/2013/index.htm

The relevant part of the code is:

$xml = new SimpleXMLElement('<xml/>');
foreach( $_POST as $key => $value ) {
    $xml->$key = $value;
}
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());
$nombreArchivoTemporal = '/tmp/'.time().rand();
$archivo = fopen ( $nombreArchivoTemporal, "wb" );
fwrite ( $archivo, iconv('UTF-8', 'CP1252//TRANSLIT//IGNORE', "@PBSSFORM DNDA\n" . $dom->saveXML()) );
fclose ( $archivo );

Characters such as ÄËÏÖÜáéíóú are supported, but EN DASH, and probably others, are not. I'm trying to get rid of them with the iconv function, but it doesn't seem to work, because the character is already HTML-entity encoded when I insert it in $xml:

<?xml version="1.0"?>
<xml>
<date/>
  <tituloObra>&#xE1;&#xE9;&#xED;&#xF3;&#xFA;&#xC1;&#xC9;&#xCD;&#xD3;&#xDA;&#xE4;&#xEB;&#xEF;&#xF6;&#xFC;&#xC4;&#xCB;&#xCF;&#xD6;&#xDC; &#x2013; &lt;= gui&#xF3;n</tituloObra>

&#x2013; is the problematic character.


Solution

Well, the problem was related to SimpleXML. I tried everything to convert the SimpleXML output to CP1252, but whenever I loaded it with DOMDocument->loadXML, I got Illegal character... errors.

I replaced it by using the DOMDocument class directly, specifying CP1252 in the constructor and inserting new records as UTF-8.

When I call DOMDocument->save(), it automatically encodes the file as CP1252, which avoids the PostScript error mentioned above.
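The approach described in this answer could look roughly like the sketch below. It is not the answerer's actual code: the `$fields` array is a hypothetical stand-in for `$_POST`, the temporary file name is chosen with `tempnam()` for illustration, and it assumes the local libxml2 build recognizes the encoding name `CP1252`.

```php
<?php
// Hypothetical stand-in for $_POST; values are UTF-8 strings.
$fields = ['tituloObra' => "áéíóú \u{2013} guión"];

// Declare the output encoding in the constructor.
// DOM methods still expect UTF-8 input regardless of this setting.
$dom = new DOMDocument('1.0', 'CP1252');
$dom->formatOutput = true;

$root = $dom->appendChild($dom->createElement('xml'));
foreach ($fields as $key => $value) {
    $node = $root->appendChild($dom->createElement($key));
    // A text node is escaped on output, so < and & in user input stay safe.
    $node->appendChild($dom->createTextNode($value));
}

// saveXML() serializes in the declared encoding, so no iconv() call is needed.
$spool = "@PBSSFORM DNDA\n" . $dom->saveXML();

// Illustrative temporary file; the original used '/tmp/'.time().rand().
file_put_contents(tempnam(sys_get_temp_dir(), 'spool'), $spool);
```

Because the document itself carries the target encoding, characters that CP1252 can represent are written as raw bytes rather than as numeric character references such as `&#x2013;`.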

Other tips

EN DASH U+2013 exists in CP1252. CP1252 is a common but unofficial name for windows-1252, which is defined in the IANA registry so that byte 0x96 represents U+2013.
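A quick way to check this is to convert the character with `iconv()` and inspect the resulting byte. This is only a small sketch, assuming PHP 7 or later (for the `\u{...}` escape) and an iconv implementation that accepts the name `CP1252`. The second call illustrates why the `iconv()` call in the question had no effect on the dash: by then it was already an ASCII-only character reference.

```php
<?php
// U+2013 EN DASH as a UTF-8 string (three bytes: E2 80 93).
$enDash = "\u{2013}";

// Converting to windows-1252 yields a single byte.
$converted = iconv('UTF-8', 'CP1252', $enDash);
echo bin2hex($converted), "\n"; // 96

// A numeric character reference is plain ASCII, so iconv leaves it unchanged.
$reference = iconv('UTF-8', 'CP1252', '&#x2013;');
echo $reference, "\n"; // &#x2013;
```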

Licensed under: CC-BY-SA with attribution