Getting more information from the SAX validator
Question
Can I get, for example, the node structure or something similar from the validator? Something like a listener or a handler. The exception alone is not enough: I have to select the node where the error occurred. This is what I have built so far:
def factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
def schema = factory.newSchema(new StreamSource(new FileReader("src/import.xsd")))
def validator = schema.newValidator()
try {
    validator.validate(new StreamSource(new FileReader("src/import.xml")))
    println "everything is fine"
} catch (SAXException e) {
    println e
}
Thank you.
Solution
Validate when you parse. Here's the code in Java; the translation to Groovy should be pretty straightforward:
InputStream xml = // …
InputStream xsd = // …
SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = xsFact.newSchema(new StreamSource(xsd));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false);
dbf.setSchema(schema);
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(new InputSource(xml));
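To see where validation fails with this approach, one option is to register an ErrorHandler on the DocumentBuilder before parsing. The following is only a sketch under assumptions, not part of the original answer: the class name ParseTimeValidation, the tiny inline schema, and the sample documents are all made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ParseTimeValidation {
    // Made-up sample schema and documents (not the asker's import.xsd / import.xml).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='items'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' type='xs:int' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String VALID_XML = "<items>\n  <item>1</item>\n  <item>2</item>\n</items>";
    static final String INVALID_XML = "<items>\n  <item>1</item>\n  <item>abc</item>\n</items>";

    /** Builds a DOM while validating; returns one "line:column message" entry per problem. */
    static List<String> collectErrors(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = xsFact.newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(schema);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Collect recoverable problems instead of letting the default handler print them.
            db.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { errors.add(describe(e)); }
                @Override public void error(SAXParseException e) { errors.add(describe(e)); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("could not parse or validate", ex);
        }
        return errors;
    }

    private static String describe(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
    }
}
```

Calling ParseTimeValidation.collectErrors(XSD, INVALID_XML) should return at least one entry pointing at the line of the offending item, while the valid sample should return an empty list. The exact messages depend on the parser implementation.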
OTHER TIPS
Catch SAXParseException to get more detail about the error, or use the SAX Locator if you're implementing ContentHandler, possibly combined with a lexer. The exception gives you details about the error, such as the line and column number.
try {
    ...
} catch (SAXParseException e) {
    int lineNumber = e.getLineNumber();
    int columnNumber = e.getColumnNumber();
    String message = e.getMessage();
    // do something
} catch (SAXException e) {
    // what should we do?
    // if we're implementing ContentHandler
    // we can use the org.xml.sax.Locator to get more info
}
Usually the Locator returns -1 for the column. For precise offsets, you'll have to use either an extended ContentHandler or a lexer:
- Get the line number of the error.
- Estimate the position of the node from the line information and the attributes (start tag, end tag), using a lexer, regular expressions, or something else.
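The two steps above can be roughly sketched with a SAX handler that remembers which elements are open and where their start tags were reported. This is an illustration under assumptions rather than a definitive approach: ErrorPathHandler, the inline schema, and the sample documents are hypothetical, and it uses a schema-aware SAXParser instead of the Validator from the question. Depending on the parser, an error may be reported before the offending element's start tag reaches the handler, so the recorded path can end at the parent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks open elements so each error can be tied to a rough path.
final class ErrorPathHandler extends DefaultHandler {
    // Made-up sample schema and documents for illustration only.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='items'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' type='xs:int' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String VALID_XML = "<items>\n  <item>1</item>\n  <item>2</item>\n</items>";
    static final String INVALID_XML = "<items>\n  <item>1</item>\n  <item>abc</item>\n</items>";

    final List<String> errors = new ArrayList<>();
    private final Deque<String> openElements = new ArrayDeque<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser, if it supports locators
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Step 1: remember the line the parser has reached when the start tag is reported.
        int line = (locator != null) ? locator.getLineNumber() : -1;
        openElements.addLast(qName + "[line " + line + "]");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
    }

    @Override
    public void error(SAXParseException e) {
        // Step 2: combine the open-element path with the position carried by the exception.
        errors.add("/" + String.join("/", openElements)
            + " -> " + e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
    }

    /** Parses xml with schema validation switched on and returns the recorded errors. */
    static List<String> validate(String xsd, String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd))));
            ErrorPathHandler handler = new ErrorPathHandler();
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("could not parse or validate", ex);
        }
    }
}
```

Each entry returned by ErrorPathHandler.validate(XSD, INVALID_XML) starts with the path of open elements, followed by the line and column taken from the exception. From there a lexer or a regular expression over that line could narrow down the exact offset.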
Depending on how much control you have over the environment, there is a somewhat clunky way to do this. The Xerces2 XML parser, which is a drop-in replacement for the default parser, has a property on the Validator that returns the current node. If you keep a reference to the Validator (for example, as a field of an ErrorHandler that you set on the Validator), you can get the node structure. Here's how I did it in Java:
...
Validator validator = schema.newValidator();
validator.setErrorHandler(new MyErrorHandler(validator));
...
public class MyErrorHandler implements ErrorHandler {
    private final Validator validator;

    public MyErrorHandler(Validator v) {
        validator = v;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        Element element = null;
        try {
            // Xerces-specific property: the element node currently being validated
            element = (Element) validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
        } catch (SAXNotRecognizedException saxnre) {
            log(Level.SEVERE, "Xerces 2 XML parser is required", saxnre);
        } catch (SAXNotSupportedException saxnse) {
            // shouldn't happen in this context
        }
        ... // do stuff
    }
    ...
}
I came across the same issue and resolved it as follows:
- Used a SAXSource when calling the validate method.
- Used an ErrorHandler implementation to capture the SAXParseException.
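A minimal sketch of those two points, assuming a made-up inline schema and sample documents; the class name SaxSourceValidation is hypothetical and not from the answer above. The Validator is fed a SAXSource, and an ErrorHandler keeps every SAXParseException so its line and column can be inspected afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SaxSourceValidation {
    // Made-up sample schema and documents for illustration only.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='items'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' type='xs:int' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String VALID_XML = "<items>\n  <item>1</item>\n  <item>2</item>\n</items>";
    static final String INVALID_XML = "<items>\n  <item>1</item>\n  <item>abc</item>\n</items>";

    /** Validates xml through a SAXSource and returns every captured SAXParseException. */
    static List<SAXParseException> validate(String xsd, String xml) {
        List<SAXParseException> problems = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Keep going after recoverable errors so all of them are reported, not just the first.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { problems.add(e); }
                @Override public void error(SAXParseException e) { problems.add(e); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new SAXSource(new InputSource(new StringReader(xml))));
        } catch (SAXException | IOException ex) {
            throw new IllegalStateException("could not validate", ex);
        }
        return problems;
    }
}
```

Each returned exception exposes getLineNumber(), getColumnNumber(), and getMessage(), so the caller can decide how to map the position back to a node.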