Question

I am trying to parse a NewsML (http://www.iptc.org/std/NewsML-G2/2.7/examples/LISTING2_NewsML-G2_Complete.xml) document with QueryPath, but I have trouble with the dots in some element names, like <body.head>.

In some Firefox QueryPath plugins I am able to escape the dot with a backslash, but in the PHP PEAR library this does not work.

Any ideas?

(I am looking for a solution within QueryPath, not for workarounds.)


Solution

In the past, I've used the Tidy PHP extension (http://us3.php.net/manual/en/book.tidy.php) to clean up HTML/XML before passing it into QueryPath.

The XML you referenced above is pretty clean, and also pretty small.

If the only issue is dots in element names, preprocessing with a regular expression would probably work too, and it would be the fastest solution. I'm guessing you could do a preg_replace('/<(\/?)body\./', '<${1}body-', $xml) and have it fixed. (That would replace body.content with body-content and so on, in both opening and closing tags. PHP's preg_replace replaces every match by default and does not accept a g modifier.)
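
Here is a minimal sketch of that preprocessing step, in case it helps. The function name dedot_body_tags and the sample XML string are made up for illustration. It assumes the dotted elements are unprefixed (no namespace prefix such as ns:body.head) and that the text "<body." never appears outside of a tag.

```php
<?php
// Hypothetical helper (not part of QueryPath): rewrites dotted "body.*"
// element names such as body.head to body-head, in both opening and
// closing tags, so the names are easier to use in CSS-style selectors.
function dedot_body_tags(string $xml): string
{
    // (\/?) captures the optional slash of a closing tag.
    // No "g" modifier: preg_replace already replaces every match.
    return preg_replace('/<(\/?)body\./', '<${1}body-', $xml);
}

// Made-up sample input; a real NewsML/NITF document would be read from a file.
$xml = '<html><body><body.head><hedline>Title</hedline></body.head>'
     . '<body.content/></body></html>';

$clean = dedot_body_tags($xml);
echo $clean, "\n";
```

The cleaned string can then be handed to QueryPath in place of the original, and the elements selected by their hyphenated names. A plain <body> tag is left alone because the pattern requires a dot right after "body".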

Licensed under: CC-BY-SA with attribution