Question

I'm trying to validate an XML file against a number of different schemas (apologies for the contrived example):

  • a.xsd
  • b.xsd
  • c.xsd

c.xsd in particular imports b.xsd and b.xsd imports a.xsd, using:

<xs:include schemaLocation="b.xsd"/>

I'm trying to do this via Xerces in the following manner:

XMLSchemaFactory xmlSchemaFactory = new XMLSchemaFactory();
Schema schema = xmlSchemaFactory.newSchema(new StreamSource[] { new StreamSource(this.getClass().getResourceAsStream("a.xsd"), "a.xsd"),
                                                         new StreamSource(this.getClass().getResourceAsStream("b.xsd"), "b.xsd"),
                                                         new StreamSource(this.getClass().getResourceAsStream("c.xsd"), "c.xsd")});     
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new StringReader(xmlContent)));

but this is failing to import all three of the schemas correctly resulting in cannot resolve the name 'blah' to a(n) 'group' component.

I've validated this successfully using Python, but having real problems with Java 6.0 and Xerces 2.8.1. Can anybody suggest what's going wrong here, or an easier approach to validate my XML documents?

Was it helpful?

Solution

So just in case anybody else runs into the same issue here, I needed to load a parent schema (and implicit child schemas) from a unit test - as a resource - to validate an XML String. I used the Xerces XMLSchemFactory to do this along with the Java 6 validator.

In order to load the child schema's correctly via an include I had to write a custom resource resolver. Code can be found here:

https://code.google.com/p/xmlsanity/source/browse/src/com/arc90/xmlsanity/validation/ResourceResolver.java

To use the resolver specify it on the schema factory:

xmlSchemaFactory.setResourceResolver(new ResourceResolver());

and it will use it to resolve your resources via the classpath (in my case from src/main/resources). Any comments are welcome on this...

OTHER TIPS

http://www.kdgregory.com/index.php?page=xml.parsing section 'Multiple schemas for a single document'

My solution based on that document:

URL xsdUrlA = this.getClass().getResource("a.xsd");
URL xsdUrlB = this.getClass().getResource("b.xsd");
URL xsdUrlC = this.getClass().getResource("c.xsd");

SchemaFactory schemaFactory = schemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//---
String W3C_XSD_TOP_ELEMENT =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
   + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">\n"
   + "<xs:include schemaLocation=\"" +xsdUrlA.getPath() +"\"/>\n"
   + "<xs:include schemaLocation=\"" +xsdUrlB.getPath() +"\"/>\n"
   + "<xs:include schemaLocation=\"" +xsdUrlC.getPath() +"\"/>\n"
   +"</xs:schema>";
Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(W3C_XSD_TOP_ELEMENT), "xsdTop"));

The schema stuff in Xerces is (a) very, very pedantic, and (b) gives utterly useless error messages when it doesn't like what it finds. It's a frustrating combination.

The schema stuff in python may be a lot more forgiving, and was letting small errors in the schema go past unreported.

Now if, as you say, c.xsd includes b.xsd, and b.xsd includes a.xsd, then there's no need to load all three into the schema factory. Not only is it unnecessary, it will likely confuse Xerces and result in errors, so this may be your problem. Just pass c.xsd to the factory, and let it resolve b.xsd and a.xsd itself, which it should do relative to c.xsd.

From the xerces documentation : http://xerces.apache.org/xerces2-j/faq-xs.html

import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

...

StreamSource[] schemaDocuments = /* created by your application */;
Source instanceDocument = /* created by your application */;

SchemaFactory sf = SchemaFactory.newInstance(
    "http://www.w3.org/XML/XMLSchema/v1.1");
Schema s = sf.newSchema(schemaDocuments);
Validator v = s.newValidator();
v.validate(instanceDocument);

I faced the same problem and after investigating found this solution. It works for me.

Enum to setup the different XSDs:

public enum XsdFile {
    // @formatter:off
    A("a.xsd"),
    B("b.xsd"),
    C("c.xsd");
    // @formatter:on

    private final String value;

    private XsdFile(String value) {
        this.value = value;
    }

    public String getValue() {
        return this.value;
    }
}

Method to validate:

public static void validateXmlAgainstManyXsds() {
    final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

    String xmlFile;
    xmlFile = "example.xml";

    // Use of Enum class in order to get the different XSDs
    Source[] sources = new Source[XsdFile.class.getEnumConstants().length];
    for (XsdFile xsdFile : XsdFile.class.getEnumConstants()) {
        sources[xsdFile.ordinal()] = new StreamSource(xsdFile.getValue());
    }

    try {
        final Schema schema = schemaFactory.newSchema(sources);
        final Validator validator = schema.newValidator();
        System.out.println("Validating " + xmlFile + " against XSDs " + Arrays.toString(sources));
        validator.validate(new StreamSource(new File(xmlFile)));
    } catch (Exception exception) {
        System.out.println("ERROR: Unable to validate " + xmlFile + " against XSDs " + Arrays.toString(sources)
                + " - " + exception);
    }
    System.out.println("Validation process completed.");
}

I ended up using this:

import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import java.io.IOException;
 .
 .
 .
 try {
        SAXParser parser = new SAXParser();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        parser.setFeature("http://apache.org/xml/features/validation/schema", true);
        parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
        parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "http://your_url_schema_location");

        Validator handler = new Validator();
        parser.setErrorHandler(handler);
        parser.parse("file:///" + "/home/user/myfile.xml");

 } catch (SAXException e) {
    e.printStackTrace();
 } catch (IOException ex) {
    e.printStackTrace();
 }


class Validator extends DefaultHandler {
    public boolean validationError = false;
    public SAXParseException saxParseException = null;

    public void error(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void fatalError(SAXParseException exception)
            throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void warning(SAXParseException exception)
            throws SAXException {
    }
}

Remember to change:

1) The parameter "http://your_url_schema_location" for you xsd file location.

2) The string "/home/user/myfile.xml" for the one pointing to your xml file.

I didn't have to set the variable: -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=org.apache.xerces.jaxp.validation.XMLSchemaFactory

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top