Question

I have the following sample XML file:

<a xmlns="http://www.foo.com">
    <b>
    </b>
</a>

Using the XPath expression /foo:a/foo:b (with 'foo' properly configured in the NamespaceContext) I can correctly count the number of b nodes and the code works both when Saxon-HE-9.4.jar is on the CLASSPATH and when it's not.

When, however, I parse the same file with a namespace-unaware DocumentBuilderFactory, the XPath expression "/a/b" correctly counts the number of b nodes only when Saxon-HE-9.4.jar is not on the CLASSPATH.

Code below:

import java.io.*;
import java.util.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.namespace.NamespaceContext;

public class FooMain {

    public static void main(String args[]) throws Exception {

        String xmlSample = "<a xmlns=\"http://www.foo.com\"><b></b></a>";
        {
            XPath xpath = namespaceUnawareXpath();
            System.out.printf("[NS-unaware] Number of 'b' nodes is: %d\n", 
                              ((NodeList) xpath.compile("/a/b").evaluate(stringToXML(xmlSample, false),
                              XPathConstants.NODESET)).getLength());
        }
        {
            XPath xpath = namespaceAwareXpath("foo", "http://www.foo.com");
            System.out.printf("[NS-aware  ] Number of 'b' nodes is: %d\n", 
                              ((NodeList) xpath.compile("/foo:a/foo:b").evaluate(stringToXML(xmlSample, true),
                               XPathConstants.NODESET)).getLength());
        }

    }


    public static XPath namespaceUnawareXpath() {
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        return xpath;
    }

    public static XPath namespaceAwareXpath(final String prefix, final String nsURI) {
        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        NamespaceContext ctx = new NamespaceContext() {
                @Override
                public String getNamespaceURI(String aPrefix) {
                    if (aPrefix.equals(prefix))
                        return nsURI;
                    else
                        return null;
                }
                @Override
                public Iterator getPrefixes(String val) {
                    throw new UnsupportedOperationException();
                }
                @Override
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
            };
        xpath.setNamespaceContext(ctx);
        return xpath;
    }    

    private static Document stringToXML(String s, boolean nsAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(nsAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
    }


}

Running the above with:

java -classpath dist/foo.jar FooMain

.. produces:

[NS-unaware] Number of 'b' nodes is: 1
[NS-aware  ] Number of 'b' nodes is: 1

Running with:

java -classpath Saxon-HE-9.4.jar:dist/foo.jar FooMain

... produces:

[NS-unaware] Number of 'b' nodes is: 0
[NS-aware  ] Number of 'b' nodes is: 1

Solution 2

The XPath language is only defined on namespace-well-formed XML, so the behaviour of different processors on a non-namespace-aware DOM tree (even one like <a><b/></a> that, had it been parsed in a namespace-aware manner, would not actually use any namespaces) is at best implementation-specific and at worst completely undefined.

Other suggestions

Correct observation. Saxon doesn't work with a namespace-unaware DOM. There's no reason why it should. If you can find an XSLT/XPath processor that works with a namespace-unaware DOM, then go ahead and use it if you want, but its behaviour isn't defined by any standard.

If it were possible for Saxon to detect that the DOM is namespace-unaware, then it would throw an error rather than giving spurious results. Sadly, one of DOM's many design failings is that if you didn't create the DOM yourself, you can't tell whether it's namespace-aware or not.

Your comment "I need to be lenient on namespaces since I have to handle 3rd-party XML instances that are not always XSD valid." is a complete non-sequitur. It's true that a document can't be XSD-valid unless it is namespace-valid, but the converse is not true; loads of documents are namespace-valid without being XSD-valid.
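
If the real goal is to match elements regardless of which namespace (if any) a third-party document puts them in, one option that stays within namespace-aware processing is to test local-name() in the path. The sketch below is only an illustration: the class name LocalNameCount and the helper countB are made up for this example, and it assumes the document is parsed with setNamespaceAware(true).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameCount {

    // Matches a/b by local name only, so the namespace URI (or its absence) is ignored.
    private static final String EXPR = "/*[local-name()='a']/*[local-name()='b']";

    // Hypothetical helper: parses the XML namespace-aware and counts the matching 'b' elements.
    static int countB(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep the DOM namespace-well-formed
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countB("<a xmlns=\"http://www.foo.com\"><b></b></a>"));
        System.out.println(countB("<a><b></b></a>"));
    }
}
```

Because local-name() is part of XPath 1.0, this should not depend on the quirks of a particular processor, at the cost of a more verbose expression.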

Finally, as your experience shows, relying on the JAXP mechanism to load whatever XPath processor happens to be lying around on the classpath is very error-prone. You can't even control whether you get an XPath 1.0 or 2.0 processor by this mechanism (and again, you can't find out easily which you have got). If your code is dependent on the quirks of a particular XPath implementation then you need to load that implementation explicitly rather than relying on the JAXP search.
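
As a minimal sketch of loading an implementation explicitly, assuming Java 9 or later: XPathFactory.newDefaultInstance() returns the JDK's built-in factory without running the JAXP provider search. The class name ExplicitXPathFactory and the helpers builtInFactory and countNodes are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExplicitXPathFactory {

    // Returns the JDK's built-in factory, skipping the JAXP lookup that would
    // otherwise pick up whatever provider is on the classpath (Java 9+).
    static XPathFactory builtInFactory() {
        return XPathFactory.newDefaultInstance();
    }

    // Hypothetical helper: evaluates the expression against a namespace-aware DOM
    // and returns the number of nodes selected.
    static int countNodes(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newDefaultInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            XPath xpath = builtInFactory().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Printing the class name is a cheap way to see which processor is in use.
        System.out.println(builtInFactory().getClass().getName());
        System.out.println(countNodes("<a><b/><b/></a>", "/a/b"));
    }
}
```

To pin a different processor instead, instantiate its factory class directly, or use the three-argument XPathFactory.newInstance(uri, factoryClassName, classLoader) overload with that processor's factory class name.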

UPDATE (Sep 2015): Saxon 9.6 no longer includes the META-INF/services file that advertises it as a JAXP XPath provider. This means you will never pick up Saxon as your XPath processor simply because it is on the classpath: you have to ask for it explicitly.

Saxon 10 now supports XPath expressions with unprefixed element names that match elements in any namespace. You can configure it like this:

XPath xPath = new net.sf.saxon.xpath.XPathFactoryImpl().newXPath();
((XPathEvaluator)xPath).getStaticContext().setUnprefixedElementMatchingPolicy(UnprefixedElementMatchingPolicy.ANY_NAMESPACE);
Licensed under: CC-BY-SA with attribution