Question

I have an XML file:

$xml = <<<EOD
<?xml version="1.0" encoding="utf-8"?>
<metaData xmlns="http://www.test.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="test">
<qkc6b1hh0k9>testdata&amp;more</qkc6b1hh0k9>
</metaData>
EOD;

Now I load it into a SimpleXMLElement object, and later on I want to get the inner content of the "qkc6b1hh0k9" node:

$xmlRootElem = simplexml_load_string( $xml );
$xmlRootElem->registerXPathNamespace( 'xmlns', "http://www.test.com/" );

// ...

$xPathElems = $xmlRootElem->xpath( './'."xmlns:qkc6b1hh0k9" );
$var = (string)($xPathElems[0]);
var_dump($var);

I expected to get the string

testdata&amp;more

... but I got

testdata&more
  • Why is the __toString() method of SimpleXMLElement converting my escaped special characters to normal characters? Can I deactivate this behaviour?
  • I came up with a temporary solution, which I consider dirty. What do you say?

    (strip_tags($xPathElems[0]->asXML()))

  • Might DOMDocument be an alternative?

Thanks for any help on my questions!

Edit

Problem solved. The problem was not in the __toString() method of SimpleXML; it came later, when the string was used with addChild.

The behaviour described above is totally fine and is to be expected, as you can see in the answers.

Problems only came up when the value was added to another XML document via addChild. Since addChild doesn't escape the ampersand (http://www.php.net/manual/de/simplexmlelement.addchild.php#103587), one has to do it manually.
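
A minimal sketch of that manual escaping, assuming a target document with a made-up root element `root` and child name `item` (neither appears in the original code), and assuming htmlspecialchars() with ENT_XML1 is an acceptable way to escape the value:

```php
<?php
// Decoded text, as returned by the (string) cast on the SimpleXML node.
$value = 'testdata&more';

// Hypothetical target document; "root" and "item" are placeholder names.
$target = new SimpleXMLElement('<root/>');

// addChild() does not escape the ampersand itself, so escape it first.
$escaped = htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
$target->addChild('item', $escaped);

// The serialized document contains the entity; reading it back decodes it again.
$serialized = $target->asXML();
$readBack = (string) $target->item;

var_dump($serialized);
var_dump($readBack);
```

Reading the child back with a string cast should again give the decoded text, so the escaping is only needed at the moment the value is written.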

Solution 2

If you create an XML tag, by any sane method, and set it to contain the string "testdata&more", this will be escaped as testdata&amp;more. It is therefore only logical that extracting that string content back out reverses the escaping procedure to give you the text you put in.

The question is, why do you want the XML-escaped representation? If you want the content of the element as intended by the author, then __toString() is doing the right thing; there is more than one way of representing that string in XML, but it is the data being represented that you should normally care about.
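
To illustrate the point about multiple representations, here is a small sketch. The element name `a` is made up for the example; the three documents encode the same text in different ways, and the string cast should return the same data for each:

```php
<?php
// Three different XML spellings of the same character data.
$variants = [
    '<a>testdata&amp;more</a>',          // predefined entity
    '<a>testdata&#38;more</a>',          // numeric character reference
    '<a><![CDATA[testdata&more]]></a>',  // CDATA section
];

$values = [];
foreach ($variants as $variant) {
    // The string cast yields the decoded text content of the element.
    $values[] = (string) simplexml_load_string($variant);
}

var_dump($values);
```

Because all three decode to the same string, code that depends on the escaped form would be tied to one particular spelling of the document rather than to its data.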

If for some reason you really need details of how the XML is constructed in that particular instance, you could use a more complex parsing framework such as DOM, which will separate testdata&amp;more into a text node (containing "testdata"), an entity node (with name "amp"), and another text node (containing "more").

If, on the other hand, all you want is to put it back into another XML (or HTML) document, then let SimpleXML do the unescaping properly, and re-escape it at the appropriate time.

Other tips

Why is the __toString() method of SimpleXMLElement converting my escaped special characters to normal characters? Can I deactivate this behaviour?

Because those "special" chars are actually the XML encoding of characters. Taking the string value gives you the characters themselves again. That is what an XML parser is made for.

I came up with a temporary solution, which I consider dirty. What do you say?

Well, it is shaky. Let me suggest the inverse instead: XML-encode the string:

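// Cast to string explicitly, then re-escape using XML rules.
$var = htmlspecialchars((string) $xPathElems[0], ENT_XML1 | ENT_QUOTES, 'UTF-8');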
var_dump($var);

Might DOMDocument be an alternative?

No. Like SimpleXML, DOMDocument is an XML parser, so you get the decoded text there as well. Strictly speaking that is not the whole story: with DOMDocument you can walk all child nodes and pick out entity nodes next to the character data, but that is much more work than the htmlspecialchars() call outlined above.
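
A short sketch of what DOMDocument returns for the document from the question (the xsi attributes are left out here for brevity, which is an assumption of this example): the text content comes back decoded, while serializing the node shows the escaped form.

```php
<?php
$xml = <<<EOD
<?xml version="1.0" encoding="utf-8"?>
<metaData xmlns="http://www.test.com/">
<qkc6b1hh0k9>testdata&amp;more</qkc6b1hh0k9>
</metaData>
EOD;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Look the element up by namespace URI and local name.
$node = $doc->getElementsByTagNameNS('http://www.test.com/', 'qkc6b1hh0k9')->item(0);

// textContent is the decoded character data, just like SimpleXML's string cast.
$decoded = $node->textContent;

// Serializing the node writes the ampersand back out as an entity.
$serialized = $doc->saveXML($node);

var_dump($decoded);
var_dump($serialized);
```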

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow