XML file downloaded from URL <b> / </b> not recognised

https://stackoverflow.com/questions/14213589

14-01-2022

Question

I used Java NIO to download an XML file from Google Directions.
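
The download code itself isn't shown. A common Java NIO approach wraps the URL's input stream in a ReadableByteChannel and lets FileChannel.transferFrom write it to disk. Here is a minimal sketch of that approach. The class and method names (NioDownload, download) are hypothetical, and this is not necessarily the code the question used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: copies whatever the URL returns into a local file via NIO channels.
public class NioDownload {

    static long download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream();
             ReadableByteChannel src = Channels.newChannel(in);
             FileChannel dst = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long total = 0;
            long n;
            // Defensive loop: keep appending at the current end of the file until
            // transferFrom reports that nothing more was copied (end of stream).
            while ((n = dst.transferFrom(src, total, Long.MAX_VALUE - total)) > 0) {
                total += n;
            }
            return total; // number of bytes written
        }
    }
}
```

The bytes are saved unchanged, so escaped markup such as &lt;b&gt; in the response stays escaped on disk. How the file is downloaded does not change how it parses.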

[Screenshot: the XML file as opened in IE]
[Screenshot: how the file should look]

For some reason the <b> tags don't seem to be recognized, so when I use XPath to evaluate and query the XML file, I get output like this:

Continue onto <b>Derwent St</b>

 338
 0.3 km

At the roundabout, take the <b>1st</b> exit onto <b>Corporation St</b>

 102
 0.1 km

Is there a simpler way to fix this, or do I have to use a SAX parser?

Solution

The <b> appears as data, not as a tag. In the raw XML it will either be escaped as &lt;b&gt; or appear inside a CDATA block.

That is, the XML carries a fragment of HTML as character data. It does not embed namespaced XHTML elements.
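
This can be checked with the JDK's own DOM and XPath APIs. The sketch below parses two hypothetical versions of one step: one with escaped markup and one using CDATA. The element names step and html_instructions are assumptions loosely modeled on a Directions response. Both versions produce the same character data, and neither contains a real b element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MarkupAsData {

    // Hypothetical step with the HTML escaped as entities
    static final String ESCAPED =
        "<step><html_instructions>Continue onto &lt;b&gt;Derwent St&lt;/b&gt;</html_instructions></step>";

    // The same hypothetical step with the HTML wrapped in a CDATA section
    static final String CDATA =
        "<step><html_instructions><![CDATA[Continue onto <b>Derwent St</b>]]></html_instructions></step>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** The instruction's string value as the XML parser sees it. */
    static String instruction(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("string(/step/html_instructions)", parse(xml));
    }

    /** Number of real <b> elements in the document tree. */
    static int boldElements(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(//b)", parse(xml), XPathConstants.NUMBER);
        return n.intValue();
    }
}
```

Both instruction(ESCAPED) and instruction(CDATA) should yield the literal string Continue onto <b>Derwent St</b>, and boldElements returns 0 for both. The angle brackets exist only as text, which matches the output in the question.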

The output is what you should expect.

What you do next depends on what you want to achieve. In any case, you need to get the data as a string (rather than a text node) and treat that string as HTML, not as plain text.
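
Text nodes matter here because a non-coalescing DOM parser (the JDK default) can split one element's content into several child nodes when CDATA is mixed with ordinary text. Reading only one text node then returns a fragment. The sketch below assumes a hypothetical html_instructions element and compares the first child with getTextContent(), which joins all of them into one string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InstructionStrings {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Every html_instructions value, each returned as one complete String. */
    static List<String> instructions(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//html_instructions", parse(xml), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates all text and CDATA children of the element
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    /** For comparison: only the first child node of the first html_instructions. */
    static String firstChildOnly(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node element = (Node) xpath.evaluate(
                "//html_instructions", parse(xml), XPathConstants.NODE);
        return element.getFirstChild().getNodeValue();
    }
}
```

For input such as <html_instructions>At the roundabout, take the <![CDATA[<b>1st</b>]]> exit</html_instructions>, firstChildOnly stops at "At the roundabout, take the ". The instructions method returns the whole HTML string, ready to be handed to an HTML parser.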

If you want to query the HTML, then you need to run it through an HTML parser first.
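
For example, to pull out the bold street names, you can feed the instruction string to an HTML parser. This sketch uses the JDK's built-in (HTML 3.2 era) Swing parser, javax.swing.text.html.parser.ParserDelegator, so it needs no extra dependency. A dedicated library such as jsoup is the more usual choice for real pages. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class BoldExtractor {

    /** Parses an HTML fragment and returns the text inside each <b> element. */
    static List<String> boldText(String html) throws IOException {
        List<String> bold = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // greater than 0 while inside a <b> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B) {
                    depth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // When the outermost <b> closes, record the text collected so far
                if (tag == HTML.Tag.B && depth > 0 && --depth == 0) {
                    bold.add(current.toString());
                    current.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    current.append(data);
                }
            }
        };
        // ignoreCharSet = true: the input is already a decoded Java String
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return bold;
    }
}
```

With the second instruction from the question, the result should be the two bold fragments, 1st and Corporation St. This time they are real elements that the parser reports, not characters inside an XML text node.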

Licensed under: CC-BY-SA with attribution
