Question

I am trying to parse a document using Dom4J. This document comes from various providers, and sometimes comes with namespaces and sometimes without.

For example:

<book>
   <author>john</author>
   <publisher>
     <name>John Q</name>
   </publisher>
</book>

or

<book xmlns="http://schemas.xml.com/XMLSchemaInstance">
   <author>john</author>
   <publisher>
     <name>John Q</name>
   </publisher>
</book>

or

<book xmlns:i="http://schemas.xml.com/XMLSchemaInstance">
   <i:author>john</i:author>
   <i:publisher>
     <i:name>John Q</i:name>
   </i:publisher>
</book>

I have a list of XPaths. I parse the document into a Document class, and then search on it using the xpaths.

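        import java.util.ArrayList;
        import java.util.List;
        import org.dom4j.Document;
        import org.dom4j.Node;

        // parseDocument is my own helper that returns a dom4j Document
        Document doc = parseDocument(documentFile);
        // List is an interface, so a concrete ArrayList has to be created
        List<String> xmlPaths = new ArrayList<String>();
        xmlPaths.add("book/author");
        xmlPaths.add("book/publisher/name");

        for (int i = 0; i < xmlPaths.size(); i++)
        {
            String searchPath = xmlPaths.get(i);

            Node currentNode = doc.selectSingleNode(searchPath);
            assert(currentNode != null);
        }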

This code does not work on the last document, the one that is using namespace prefixes.

I tried these techniques, but none of them seem to work.

1) Changing the last element in the XPath to be namespace-neutral:

/book/:author
/book/[local-name()='author']
/[local-name()='book']/[local-name()='author']

All of these throw an exception saying that the XPath format is not correct.

2) Adding namespace URIs to the XPath after creating it with DocumentHelper.createXPath().

Any idea what I am doing wrong?

FYI, I am using dom4j version 1.5.


Solution

The steps in your XPaths do not contain a tag name. The general syntax in your case would be

/TAGNAMEPARENT[CONDITION_PARENT]/TAGNAMECHILD[CONDITION_CHILD]

The important aspect is that the tag names are mandatory, while the conditions are optional. If you do not want to specify a tag name, you have to use * for "any tag". There may be performance implications for large XML files, since you will always have to iterate over a node set instead of using an index lookup. Maybe @MichaelKay can comment on this.
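A minimal sketch of that rule, using the JDK's built-in javax.xml.xpath API instead of dom4j. That swap is an assumption made only so the example runs without extra jars; dom4j delegates XPath parsing to Jaxen, which implements the same XPath 1.0 grammar, so the outcome should be the same. It compiles two of the expressions from the question and the corrected form. The class and method names are hypothetical.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class StepSyntaxDemo {

    // Returns true if the expression parses as XPath 1.0, false if the parser rejects it.
    static boolean compiles(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/book/[local-name()='author']",                    // last step has a predicate but no node test
            "/[local-name()='book']/[local-name()='author']",   // same problem on both steps
            "/*[local-name()='book']/*[local-name()='author']"  // '*' supplies the missing node test
        };
        for (String expression : expressions) {
            System.out.println(compiles(expression) + "  " + expression);
        }
    }
}
```

The first two expressions should be rejected because a step made only of a predicate has nothing for the predicate to filter; adding * gives each step a node test, and the predicate then narrows it down by local name.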

Try this instead:

/*[local-name()='book']/*[local-name()='author']
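As a quick check that this form ignores namespaces, here is a sketch that evaluates it against the three sample documents from the question. It uses the JDK's javax.xml.xpath with a namespace-aware DOM parser rather than dom4j, purely so it needs no extra jars; the class, constant, and method names are illustrative. Passing the same strings to dom4j's doc.selectSingleNode(...) should behave the same way, since local-name() is standard XPath 1.0.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LocalNameDemo {

    // The three sample documents from the question: no namespace, default namespace, prefixed namespace.
    static final String[] DOCUMENTS = {
        "<book><author>john</author><publisher><name>John Q</name></publisher></book>",
        "<book xmlns=\"http://schemas.xml.com/XMLSchemaInstance\"><author>john</author>"
            + "<publisher><name>John Q</name></publisher></book>",
        "<book xmlns:i=\"http://schemas.xml.com/XMLSchemaInstance\"><i:author>john</i:author>"
            + "<i:publisher><i:name>John Q</i:name></i:publisher></book>"
    };

    // Namespace-agnostic paths: every step is '*' filtered by local-name().
    static final String AUTHOR = "/*[local-name()='book']/*[local-name()='author']";
    static final String NAME =
        "/*[local-name()='book']/*[local-name()='publisher']/*[local-name()='name']";

    // Parses xml namespace-aware and returns the text of the first node matching path, or null if none.
    static String select(String xml, String path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + path, e);
        }
    }

    public static void main(String[] args) {
        for (String xml : DOCUMENTS) {
            System.out.println(select(xml, AUTHOR) + " / " + select(xml, NAME));
        }
    }
}
```

Each document should print john / John Q. The trade-off is verbosity: every step of a longer path such as publisher/name needs its own *[local-name()='...'] wrapper.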
Licensed under: CC-BY-SA with attribution