Question

So in my current project I use the JAXB RI with the default Java parser from Sun's JRE (which I believe is Xerces) to unmarshal arbitrary XML.

First I use XJC to compile an XSD of the following form:

<?xml version="1.0" encoding="utf-8" ?> 
<xs:schema attributeFormDefault="unqualified" 
elementFormDefault="qualified" 
xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
<xs:element name="foobar">
...
</xs:element> 
</xs:schema>

In the "good case" everything works as designed. That is to say if I'm passed XML that conforms to this schema then JAXB correctly unmarshals it into an object tree.

The problem comes when I'm passed XML with an external DTD reference, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foobar SYSTEM "http://blahblahblah/foobar.dtd">
<foobar></foobar>

Upon unmarshalling something like this, the SAX parser attempts to load the remote entity ("http://blahblahblah/foobar.dtd") despite the fact that this snippet clearly does not conform to the schema I compiled earlier with XJC.

In order to circumvent this behavior, since I know that any conformant XML (according to the XSD I compiled) will never require the loading of a remote entity, I have to define a custom EntityResolver that short circuits the loading of all remote entities. So instead of doing something like:

MyClass foo = (MyClass) myJAXBContext.createUnmarshaller().unmarshal(myReader);

I'm forced to do this:

XMLReader myXMLReader = mySAXParser.getXMLReader();
myXMLReader.setEntityResolver(myCustomEntityResolver);
SAXSource mySAXSource = new SAXSource(myXMLReader, new InputSource(myReader));
MyClass foo = (MyClass) myJAXBContext.createUnmarshaller().unmarshal(mySAXSource);
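
For illustration, here is a minimal sketch of what such a short-circuiting resolver could look like, using only the SAX classes that ship with the JDK. The class name NoRemoteEntities, the REQUESTED list and the rootName helper are hypothetical names introduced for this example; they are not part of the project described above. The resolver answers every request with an empty document, so the parser never opens the remote URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoRemoteEntities {
    /** System IDs the parser asked for; kept only so the effect is visible (hypothetical helper). */
    static final List<String> REQUESTED = new ArrayList<>();

    /** Answers every external entity request with an empty document instead of fetching it. */
    static final EntityResolver SHORT_CIRCUIT = (publicId, systemId) -> {
        REQUESTED.add(systemId);
        return new InputSource(new StringReader(""));
    };

    /** Builds a namespace-aware XMLReader with the short-circuiting resolver installed. */
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXB binds by namespace-qualified names
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(SHORT_CIRCUIT);
        return reader;
    }

    /** Parses the XML with the reader above and returns the root element's local name. */
    static String rootName(String xml) throws Exception {
        final String[] root = new String[1];
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName; // remember only the first element seen
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```

With JAXB, the reader returned by newReader() would be wrapped exactly as above: new SAXSource(reader, new InputSource(myReader)).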

So my ultimate question is:

When unmarshalling with JAXB, should the loading of remote entities by the underlying SAX parser be automatically short circuited when the XML in question can be recognized as invalid without the loading of those remote entities?

Also, doesn't this seem like a security issue? Given that JAX-WS relies on JAXB under the hood, it seems like I could pass specially crafted XML to any JAX-WS based web service and cause the WS host to load any arbitrary URL.

I'm a relative newbie to this, so there's probably something I'm missing. Please let me know if so!


Solution

A well-crafted question, it deserves an answer :)

Some things to note:

  1. The JAXB runtime is not dependent on XML Schema. It uses a SAX parser to generate a stream of SAX events, which it binds onto the object model. This object model can be hand-written or generated from a schema using XJC, but the binding and the runtime are distinct from each other. So you may know that good XML input conforms to the schema, but at runtime JAXB does not.
  2. Forcing the runtime to load a remote DTD reference does not constitute a security hole. If there's a real DTD at the end of it, the worst case is that it won't validate. If it's not a real DTD, then it'll be ignored.
  3. DTD is considered obsolete, so there's no direct support for it in the high-level JAXB API. If you need an EntityResolver, you need to dig into the SAX API, which you have already done.
  4. If your class model was generated from an XML Schema, then you should consider validating against it at runtime, using SchemaFactory and Unmarshaller.setSchema(). This instructs Xerces to validate the SAX events against the schema before they are passed to JAXB. It won't stop the DTD from being fetched, but it adds a layer of safety: you know the data is good.
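
The schema validation suggested in point 4 can be sketched roughly as follows. This uses only the javax.xml.validation classes from the JDK so that it runs without a JAXB dependency. The class name SchemaCheck and the isValid helper are hypothetical, and since the body of the foobar element is elided in the question, the xs:string type used here is an assumed stand-in rather than the real schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Assumed stand-in for the question's schema; the real element definition is not shown there.
    static final String XSD =
          "<xs:schema attributeFormDefault=\"unqualified\" elementFormDefault=\"qualified\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"foobar\" type=\"xs:string\"/>"
        + "</xs:schema>";

    /** Compiles the XSD into a reusable, thread-safe Schema object. */
    static Schema loadSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** Returns true if the document validates against the schema, false on a validation error. */
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on the first validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a JAXB setup, the same Schema object would be handed to the unmarshaller with unmarshaller.setSchema(SchemaCheck.loadSchema()) before calling unmarshal, so that invalid input is rejected during unmarshalling rather than in a separate pass.
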
Licensed under: CC-BY-SA with attribution