Question

I'm investigating use cases for using streaming in XSL. I know of two clear cases:

A. You need to transform a very large document, the entirety of which cannot be held in memory.

B. You only need a small part of the document, and often that "small part" is near the top. You can then save time via early exit.

I'm writing to ask if, in practice, there is a third real use case:

C. You have a simple transformation and want to forgo the CPU time required to build the XML tree.

To give an example, imagine a store's shipments are stored in an XML structure with the following format:

Top-level = Year

2nd level = Month

3rd level = Day of shipment

4th level = Shipment ID

5th level = Individual items in shipment

Just for the sake of example, consider a transformation whose purpose is to pull information at the "month" level, needing only the data stored in attributes of the month elements and nothing about the descendants of those nodes.

Is it possible that such a transformation could benefit from streaming, even though the entire document must be read? I was hoping that some time might be gained because there is no need to build trees, but in my limited testing it appears this is not the case.
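To make the two processing models concrete, here is a minimal sketch in Python rather than XSLT. It is only an analogy for the scenario above and says nothing about how Saxon works internally. The element names (`year`, `month`, `day`, `shipment`, `item`) and the attributes (`name`, `total`) are assumptions made up for illustration. One function reads the month attributes from SAX events without building a tree; the other builds the whole tree first and then walks it.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample data; all element and attribute names are assumptions.
SAMPLE = b"""<year value="2013">
  <month name="Jan" total="2">
    <day date="01"><shipment id="s1"><item sku="a"/><item sku="b"/></shipment></day>
  </month>
  <month name="Feb" total="1">
    <day date="03"><shipment id="s2"><item sku="c"/></shipment></day>
  </month>
</year>"""


class MonthHandler(xml.sax.ContentHandler):
    """Keeps only the attributes of <month> elements; nothing else is retained."""

    def __init__(self):
        super().__init__()
        self.months = []

    def startElement(self, name, attrs):
        # Descendants of <month> still produce events, but they are ignored.
        if name == "month":
            self.months.append(dict(attrs.items()))


def months_streaming(data: bytes):
    """Event-based pass: the whole input is parsed, but no tree is built."""
    handler = MonthHandler()
    xml.sax.parseString(data, handler)
    return handler.months


def months_tree(data: bytes):
    """Conventional pass: build the full tree, then select the month elements."""
    root = ET.fromstring(data)
    return [dict(m.attrib) for m in root.iter("month")]
```

Both functions have to parse every byte of the input, so any difference between them comes only from tree construction versus event dispatch. That is the trade-off the question is asking about.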

I tried such an example in Saxon 9.5.1.3, and the streaming version was about 20% slower than the non-streaming one. Perhaps the overhead involved in executing streaming will almost always outweigh the time gained by not building trees? (At least in Saxon, where tree building is very fast.)

Or am I making an error in my testing, and there are clear examples where streaming is more efficient, even when the entire document has to be read?


Solution

Thanks for the data on Saxon. I'm not surprised by the 20% overhead; I wouldn't have been surprised if it was 60%. Much of this has to do with maturity of the implementation; it's hard enough to get streaming working at all, before you start thinking about making it fast. But I would be surprised if it ever becomes significantly faster than conventional processing in the case of documents that are small enough to handle in memory. That's partly because the performance of the kind of transformations you can do using streaming is likely to be dominated by parsing and serialization cost, which is the same in either model.

I'm aware of a number of areas where there's scope for optimization (or at least for detailed measurement to discover whether there's scope for optimization), but the priority is on getting it all working and getting a sufficient body of test cases into place that optimization can be attempted without risking introducing more bugs.

Other tips

Besides large documents, the other possible advantage of streaming -- depending on the exact characteristics of the stylesheet and input document and how you're using the output -- may be reduced latency. That is, it may be possible to start delivering the start of the document to the next stage of processing (or to the user) sooner than in the more traditional processing model. If you're generating HTML, for example, the browser might be able to start getting the page onto the screen a bit faster.

That could be an advantage in some cases even if throughput is somewhat reduced, that is, even if the total time to finish processing the document goes up.
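The latency point can be sketched with a generator, again in Python as an analogy rather than as XSLT. The element and attribute names (`month`, `name`, `total`) and the `<li>` output format are assumptions for illustration. Each output fragment is handed to the consumer as soon as the corresponding start tag has been parsed, instead of after the whole input has been processed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are assumptions.
SAMPLE = b"""<year value="2013">
  <month name="Jan" total="2"><day date="01"/></month>
  <month name="Feb" total="1"><day date="03"/></month>
</year>"""


def month_rows(source):
    """Yield one HTML list item per <month> as soon as its start tag is seen."""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "month":
            # Attributes are already available on the "start" event.
            yield f"<li>{elem.get('name')}: {elem.get('total')}</li>"
        elif event == "end":
            # Drop the element's content once it is complete to limit memory;
            # the parent still holds the emptied child elements.
            elem.clear()


rows = month_rows(io.BytesIO(SAMPLE))
first = next(rows)   # available before the rest of the input is consumed
rest = list(rows)    # remaining fragments, produced on demand
```

A downstream stage, such as a browser receiving HTML, can start working on `first` while the rest of the document is still being parsed, even if the total running time is no better than the tree-based approach.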

I'm not sure about Saxon's internals, but Xalan has long offered an "incremental parsing" mode which was intended to permit the same kind of tradeoff; it could reduce latency in some cases, but added some overhead for tracking how much of the input had been parsed so far so throughput might be reduced.

Pick the mode that makes sense for your application. Tools for tasks...

(I'd still like to see someone pick up on the streaming-optimization-by-projection concept that IBM patented. It's the most general approach I've yet seen to recognizing streaming optimization opportunities in unrestricted XSLT. Alas, higher-priority work drew off the resources needed to bring it from prototype to production-quality, and I haven't found personal time to attempt a skunkworks version.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow