Question

Update There is no ready XML parser in Java community which can do NIO and XML parsing. This is the closest I found, and it's incomplete: http://wiki.fasterxml.com/AaltoHome

I have the following code:

InputStream input = ...;
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

XMLStreamReader streamReader = xmlInputFactory.createXMLStreamReader(input, "UTF-8");

Question is, why does the method #createXMLStreamReader() expects to have an entire XML document in the input stream? Why is it called a "stream reader", if it can't seem to process a portion of XML data? For example, if I feed:

<root>
    <child>

to it, it would tell me I'm missing the closing tags. Even before I begin iterating the stream reader itself. I suspect that I just don't know how to use a XMLStreamReader properly. I should be able to supply it with data by pieces, right? I need it because I'm processing a XML stream coming in from network socket, and don't want to load the whole source text into memory.

Thank you for help, Yuri.

Was it helpful?

Solution

If you absolutely need NIO with content "push", there are developers interested in completing API for Aalto. Parser itself is complete Stax implementation as well as alternative "push input" (feeding input instead of using InputStream). So you might instead want to check out mailing lists if you are interested. Not everyone reads StackOverflow questions. :-)

OTHER TIPS

You can get what you want - a partial parse, but you must not close the stream when you reach the end of the current available data. Keep the stream open, and the parser will simply block when it gets to the end of the stream. When you have more data, then add it to the stream, and the parser will continue.

This arrangement requires two threads - one thread running the parser, and another fetching data. To bridge the two threads, you use a pipe - a PipeInputStream and PipeOutputStream pair that push data from the reader thread into the input stream used by the parser. (The parser is reading data from the PipeInputStream.)

The stream must contain the content for an entire XML document, just not all in memory at the same time (this is what streams do). You might be able to keep the stream and the reader open to continue feeding in content; however, it would have to be part of a well-formed XML document.

Suggestion: You might want to read a bit more about how sockets and streams work before going much farther.

Hope this helps.

Which Java version are you using? With JDK 1.6.0_19, I get the behaviour you seem to be expecting. Iterating over your example XML fragment gives me three events:

  • START_ELEMENT (root)
  • CHARACTERS (whitespace between and )
  • START_ELEMENT (child)

The fourth invokation of next() throws an XMLStreamException: ParseError at [row,col]:[2,12] Message: XML document structures must start and end within the same entity.

With the XMLEventReader using stax parser it works for me without any issues.

  final XMLEventReader xmlEventReader= XMLInputFactory
                    .newInstance().createXMLEventReader(new FileInputStream(file));

file is obviously your input.

 while(xmlEventReader.hasNext()){

        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        logger.debug("LOG XML EVENT "+xmlEvent.toString());
        if (xmlEvent.isStartElement()){ 
         //continue implementation

Look at this link to understand more about how streaming parsers work and how does it keep you r memory foot print smaller. For incoming XML, you would need to first serialize the incoming XML and create a well formed XML, then giving it to streaming parser.

http://www.devx.com/xml/Article/34037/1954

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top