Question

I am writing an application that needs to unmarshal a huge XML file using Castor. For this reason, I need to use a streaming XML parser such as StAX to parse the file. According to Castor's documentation, its default parser is Xerces. I visited the Xerces home page, but I could not find any information on whether Xerces is a streaming parser.

Does anyone know whether Xerces is a streaming parser? Thank you.


Solution 2

From http://en.wikipedia.org/wiki/Xerces:

Xerces is Apache's collection of software libraries for parsing, validating, serializing and manipulating XML. The library implements a number of standard APIs for XML parsing, including DOM, SAX and SAX2.

So it supports both streaming (SAX, event-based) and non-streaming (DOM, tree-building) APIs. See http://xerces.apache.org/#xerces2-j for all supported APIs.
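To illustrate what the streaming (SAX) side looks like in practice, here is a minimal sketch. It goes through the standard JAXP entry point (SAXParserFactory), which picks up whichever SAX implementation is configured; the parser bundled with the JDK is derived from Xerces. The class name, handler, and sample XML are made up for the example, and it does not show how to wire the parser into Castor.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingSketch {

    // Hypothetical handler: counts elements with a given name as events
    // arrive, without ever building a tree of the whole document.
    static final class CountingHandler extends DefaultHandler {
        private final String target;
        int count = 0;

        CountingHandler(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            // The factory is not namespace-aware by default, so compare qName.
            if (target.equals(qName)) {
                count++;
            }
        }
    }

    static int countElements(InputStream in, String elementName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler(elementName);
        parser.parse(in, handler); // callbacks fire while the input is being read
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note/></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "order"));
    }
}
```

Because the handler only keeps a counter, memory use does not grow with the number of elements in the file, which is the property you want for a huge input.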

OTHER TIPS

The FAQ has some advice on how to handle this situation. Quoting the docs:

How do I read data from a stream as it arrives?

There are 3 problems you have to deal with:

  • The Apache parsers read the entire data stream into a buffer before they start parsing; you need to change this behaviour, so that they analyse "on the fly"
  • The Apache parsers terminate when they reach end-of-file; with a data stream, unless the sender drops the socket, you have no end-of-file, so you need to terminate in some other way
  • The Apache parsers close the input stream on termination, and this closes the socket; you normally don't want this, because you'll want to send an ack to the data stream source, and you may want to have further exchanges on the socket anyway.
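The third problem (the parser closing the stream, and with it the socket) is commonly worked around by handing the parser a wrapper whose close() does nothing. The sketch below is not taken from the FAQ: the class names are hypothetical, a byte array stands in for the socket stream, and the JDK's built-in JAXP parser stands in for a standalone Xerces install. It does not address the buffering or end-of-file problems.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class NonClosingStreamSketch {

    // Hypothetical wrapper: forwards reads, but swallows close() so the
    // underlying stream (for example a socket's input stream) stays open.
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the caller owns the underlying stream.
        }
    }

    // Stand-in for a socket stream that records whether it was closed.
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Parses the XML through the wrapper and reports whether the
    // underlying stream ended up closed.
    static boolean parseAndCheckClosed(String xml) throws Exception {
        TrackingStream underlying =
                new TrackingStream(xml.getBytes(StandardCharsets.UTF_8));
        InputStream guarded = new NonClosingInputStream(underlying);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(guarded, new DefaultHandler());
        guarded.close(); // no-op, even if the parser did not call it itself
        return underlying.closed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndCheckClosed("<msg>hello</msg>"));
    }
}
```

With the wrapper in place, the code that opened the socket decides when to close it, so it can still send an ack or continue the exchange after the parse finishes.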
Licensed under: CC-BY-SA with attribution