Thanks for the data on Saxon. I'm not surprised by the 20% overhead; I wouldn't have been surprised if it were 60%. Much of this has to do with the maturity of the implementation: it's hard enough to get streaming working at all before you start thinking about making it fast. But I would be surprised if it ever became significantly faster than conventional processing for documents that are small enough to handle in memory. That's partly because the performance of the kind of transformations you can do using streaming is likely to be dominated by parsing and serialization cost, which is the same in either model.
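To make the contrast concrete, here is a minimal sketch (not Saxon, just Python's standard library) of the two models: both pay the same parsing cost, but the streaming model handles each element as the parser emits it instead of building the whole tree first, which is what bounds memory use.

```python
# Illustrative sketch only: in-memory vs. streaming XML processing
# using Python's stdlib. The document names and element structure
# here are invented for the example.
import io
import xml.etree.ElementTree as ET

doc = b"<log>" + b"".join(
    b"<entry level='info'>msg %d</entry>" % i for i in range(1000)
) + b"</log>"

# Conventional model: build the whole tree in memory, then query it.
root = ET.fromstring(doc)
in_memory_count = sum(1 for e in root.iter("entry"))

# Streaming model: react to each element as its end tag is parsed,
# then clear its contents so the retained tree stays small.
streamed_count = 0
for event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag == "entry":
        streamed_count += 1
        elem.clear()  # discard the entry's contents once handled

print(in_memory_count, streamed_count)  # both models see 1000 entries
```

Both approaches parse every byte of the document, so for inputs that fit comfortably in memory the streaming version mostly saves space, not time; that is the point about parsing and serialization dominating the cost in either model.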
I'm aware of a number of areas where there's scope for optimization (or at least for detailed measurement to discover whether there's scope for optimization), but the priority is on getting it all working and getting a sufficient body of test cases in place so that optimization can be attempted without risking the introduction of new bugs.