Question

Can I get, for example, the node structure or something similar from the validator? Something like a listener or a handler. The exception alone is not enough: I need to select the node where the error occurred. This is what I've built so far:

import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

def factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
def schema = factory.newSchema(new StreamSource(new FileReader("src/import.xsd")))
def validator = schema.newValidator()
try {
    validator.validate(new StreamSource(new FileReader("src/import.xml")))
    println "everything is fine"
} catch (SAXException e) {
    println e
}

Thank you.


Solution

Validate while you parse. Here's the code in Java; the translation to Groovy should be pretty straightforward:

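import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

InputStream xml = // …
InputStream xsd = // …

SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = xsFact.newSchema(new StreamSource(xsd));

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false); // turns off DTD validation only; the Schema below still validates
dbf.setSchema(schema);    // W3C XML Schema validation now happens inside parse()
DocumentBuilder db = dbf.newDocumentBuilder();

Document dom = db.parse(new InputSource(xml));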
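A runnable sketch of the same idea, with one addition not in the answer above: an ErrorHandler on the DocumentBuilder, so every validation error is recorded with its line and column instead of aborting the parse, and you still get the DOM back. The class and method names are hypothetical, and the XML and XSD are passed in as strings only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ParseWhileValidating {
    /** Parses xml against xsd; appends "line:column: message" to errors for each validation error. */
    static Document parse(String xml, String xsd, List<String> errors) throws Exception {
        SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = xsFact.newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema); // schema validation happens during parse()
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Record errors instead of stopping at the first one
        db.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) {
                errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // malformed XML: no point in continuing
            }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }
}
```

The error strings only carry a position, not a node; the parse still returns a complete Document, so you can go looking for the node near that position afterwards.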

OTHER TIPS

Catch SAXParseException to get more details about the error, such as its line and column number, or use the SAX Locator if you're implementing ContentHandler, possibly together with a lexer.

try {
  ...
}
catch (SAXParseException e) {
  int lineNumber = e.getLineNumber();
  int columnNumber = e.getColumnNumber();
  String message = e.getMessage();
  // do something
}
catch (SAXException e) {
  // what should we do?
  // if we're implementing ContentHandler
  // we can use the org.xml.sax.Locator to get more info
}
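A minimal, runnable version of that catch, assuming you validate with a plain Validator and no ErrorHandler (in that case the first error is thrown as a SAXParseException). FirstErrorPosition and firstError are hypothetical names; the position is typically where the parser had just finished reading when the error fired, not the exact start of the node.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXParseException;

class FirstErrorPosition {
    /** Returns {line, column} of the first validation error, or null if the document is valid. */
    static int[] firstError(String xml, String xsd) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator = factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        try {
            // Without an ErrorHandler, the first error aborts validation with an exception
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Location information is only available on SAXParseException, not plain SAXException
            return new int[] { e.getLineNumber(), e.getColumnNumber() };
        }
    }
}
```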

The column number reported through the Locator is often -1 (not available). For offset-level precision, you'll need either an extended ContentHandler or a lexer:

  • Get the line number of the error.
  • Estimate the position of the node from that line and its markup (start tag, end tag, attributes), using a lexer, regular expressions, or something similar.
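The regular-expression route from the list above could look roughly like this sketch: given the raw XML text and the line/column from a SAXParseException, it returns the name of the last start tag on that line that begins before the column (or anywhere on the line when the column is -1). NodeLocator and lastStartTag are hypothetical; the regex deliberately ignores comments, CDATA and tags that span several lines, so treat the result as an estimate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeLocator {
    // Matches an opening tag name such as "<item" in "<item id='1'>"; "</", "<?" and "<!" never match
    private static final Pattern START_TAG = Pattern.compile("<([A-Za-z_][\\w.:-]*)");

    /** Name of the last start tag on the 1-based line that begins before the column, or null. */
    static String lastStartTag(String xml, int line, int column) {
        String[] lines = xml.split("\r?\n", -1);
        if (line < 1 || line > lines.length) {
            return null;
        }
        String text = lines[line - 1];
        // A column of -1 (unknown) or past the end of the line means: search the whole line
        int limit = (column > 0 && column - 1 <= text.length()) ? column - 1 : text.length();
        Matcher m = START_TAG.matcher(text.substring(0, limit));
        String name = null;
        while (m.find()) {
            name = m.group(1); // keep the last match
        }
        return name;
    }
}
```

For example, on the line `  <b><c x='1'/>` this returns `c` when the column is unknown and `b` for column 6.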

Depending on how much control you have over the environment, there is a somewhat clunky way to do this. The Xerces 2 XML parser, which is a drop-in replacement for the default parser, exposes a property on the Validator that returns the current node. If you keep a reference to the Validator (for example as a field of an ErrorHandler that you set on the Validator), you can read that property whenever an error is reported and get the node structure. Here's how I did it in Java:

...
  Validator validator = schema.newValidator();
  validator.setErrorHandler(new MyErrorHandler(validator));
...



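import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.validation.Validator;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler {
  private static final Logger LOG = Logger.getLogger(MyErrorHandler.class.getName());
  private static final String CURRENT_ELEMENT_NODE =
      "http://apache.org/xml/properties/dom/current-element-node";

  private final Validator validator;

  public MyErrorHandler(Validator v) {
    validator = v;
  }

  @Override
  public void error(SAXParseException e) throws SAXException {
    Element element = null;
    try {
      // Ask the Xerces validator which DOM element it is currently validating
      element = (Element) validator.getProperty(CURRENT_ELEMENT_NODE);
    } catch (SAXNotRecognizedException ex) {
      LOG.log(Level.SEVERE, "Xerces 2 XML parser is required", ex);
    } catch (SAXNotSupportedException ex) {
      // shouldn't happen in this context
    }
    // ... do stuff with element (null if the property was not available) and e
  }

  @Override
  public void warning(SAXParseException e) throws SAXException { }

  @Override
  public void fatalError(SAXParseException e) throws SAXException {
    throw e;
  }
}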
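An end-to-end sketch of the same technique, assuming the Xerces-derived validator in the JDK (or Xerces 2 on the classpath) honors `http://apache.org/xml/properties/dom/current-element-node`. As far as I know the property only points at something when you validate a DOMSource, since the validator needs a real DOM node to return, which is why the XML is parsed into a Document first. CurrentNodeDemo and invalidElements are hypothetical names. Which element counts as "current" depends on when the error fires: datatype errors are usually raised at the element's end tag, so the element itself is returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CurrentNodeDemo {
    static final String CURRENT_ELEMENT_NODE =
        "http://apache.org/xml/properties/dom/current-element-node";

    /** Validates an in-memory DOM and returns the element that was current for each error. */
    static List<Element> invalidElements(String xml, String xsd) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the validator relies on local names of DOM nodes
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        List<Element> found = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException {
                // Implementation-specific: other validators may throw SAXNotRecognizedException here
                Object node = validator.getProperty(CURRENT_ELEMENT_NODE);
                if (node instanceof Element) {
                    found.add((Element) node);
                }
            }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        validator.validate(new DOMSource(doc)); // DOM input, so there is a node to point at
        return found;
    }
}
```

Because the returned objects are live nodes of `doc`, you can select, highlight or annotate them directly, which is what the question asks for.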

I came across the same issue and resolved it as follows:

  1. Used a SAXSource when calling the validate method.
  2. Used an ErrorHandler implementation to capture each SAXParseException.
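
A sketch of those two steps, assuming a SAXSource built from a plain InputSource (the Validator then supplies its own XMLReader). CollectAllErrors and validate are hypothetical names; each collected SAXParseException still exposes getLineNumber() and getColumnNumber().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CollectAllErrors {
    /** Validates xml and returns every warning/error instead of stopping at the first one. */
    static List<SAXParseException> validate(String xml, String xsd) throws Exception {
        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        List<SAXParseException> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(e); }
            @Override public void error(SAXParseException e) { problems.add(e); } // keep going
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        // Step 1: feed the document to validate() as a SAXSource
        validator.validate(new SAXSource(new InputSource(new StringReader(xml))));
        return problems;
    }
}
```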
Licensed under: CC-BY-SA with attribution