Question

I need to write an application that fetches element name value (time-series data) pair from any xml source, be it file, web server, any other server. the application would consume the XML and take out values of interest, it has to be very very fast (lets say 50000 events/seconds or more) also the XML document size would be huge and frequency of these document could be high as well (for ex. 2500 files/min - more than 500MB of XML data/file).

I just want to see how you experienced people think I should approach this. I am a novice who just got started although I can do any solution you suggest me, no matter how tough/easy.

Thank you very much.

Was it helpful?

Solution

If you use SAX parsing, your bottleneck is the I/O involved, not the XML string processing. And given your 500 MB number, I'd say you'd have to do SAX parsing instead of DOM parsing. So, anything with a SAX type interface should be just fine.

OTHER TIPS

I'm a fan of Xerces, I think you are going to have to try them out to see what has the best performance for your application. Like Warren said you will want to use SAX processing. Realistically if you truly need the performance you should use a specialized XML appliance to do the processing.

I use libxml2 in our projects. It supports both SAX and DOM. As Warren Young said, you should use SAX. You could give Expat a try.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top