Important: the image shown includes OrderBy: note that this breaks Batchify here, because OrderBy is a buffered operator. The Batchify method I showed is intended for non-buffered spooling streams.
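For concreteness, here is a minimal sketch of what such a non-buffered batching extension might look like (illustrative only; not necessarily identical to the method shown earlier):

```csharp
using System.Collections.Generic;

static class BatchifyExtensions
{
    // Buffer "count" items (paying the CPU cost of creating them up front),
    // then spool the whole batch with essentially no work between yields.
    public static IEnumerable<T> Batchify<T>(this IEnumerable<T> source, int count)
    {
        var buffer = new List<T>(count);
        foreach (var item in source)
        {
            buffer.Add(item);             // per-item CPU work happens here
            if (buffer.Count == count)
            {
                foreach (var x in buffer) yield return x;   // cheap back-to-back yields
                buffer.Clear();
            }
        }
        foreach (var x in buffer) yield return x;           // trailing partial batch
    }
}
```

With an OrderBy upstream, the entire sequence is already materialized before the first item is yielded, so this staging adds nothing.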
In the context in which I used it, the origin (before the Batchify) was an iterator block that did lots of work involving object creation and pseudo-random number generators on each iteration. Because the code in question was timing-sensitive, I did not want to introduce a predictable pause (for the CPU work of creating each item) between each call to the store. This was partly to emulate the original code, which created all the objects up front, and partly because I know how SE.Redis handles socket work.
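As a rough illustration (hypothetical names, not the actual code), the origin was an iterator block of roughly this shape:

```csharp
using System;
using System.Collections.Generic;

static class Producer
{
    // Hypothetical stand-in for the origin: each iteration does non-trivial
    // CPU work (object creation, pseudo-random numbers) before the next
    // item becomes available to send.
    public static IEnumerable<KeyValuePair<string, string>> CreateItems(int count)
    {
        var rand = new Random(12345);
        for (int i = 0; i < count; i++)
        {
            string key = "item:" + i;
            string value = rand.NextDouble().ToString("R"); // simulated per-item cost
            yield return new KeyValuePair<string, string>(key, value);
        }
    }
}
```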
Let's consider the behaviour without Batchify:
- create an item (CPU work) and yield it
- send it to the store (network IO)
- create an item (CPU work) and yield it
- send it to the store (network IO)
- create an item (CPU work) and yield it
- send it to the store (network IO)
- ...
In particular, this means that there is a predictable pause between store requests. SE.Redis handles socket IO on a dedicated worker thread, and the above could quite easily result in high packet fragmentation, especially since I was using the "fire and forget" flag. The writer thread needs to flush periodically, which it does either when the buffer hits a critical size or when there is no more work in the outbound message queue.
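In code, the unbatched pattern is essentially the following (a sketch using the hypothetical CreateItems above; StringSet and CommandFlags.FireAndForget are the actual SE.Redis API, the connection string is illustrative):

```csharp
using StackExchange.Redis;

// Unbatched: the CPU work for the next item happens between every send,
// so the writer thread regularly runs out of queued work and flushes
// small, fragmented packets.
var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
foreach (var pair in Producer.CreateItems(1000000))
{
    db.StringSet(pair.Key, pair.Value, flags: CommandFlags.FireAndForget);
}
```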
Now consider what Batchify does:
- create an item (CPU work) and buffer it
- create an item (CPU work) and buffer it
- ...
- create an item (CPU work) and buffer it
- yield an item
- send it to the store (network IO)
- yield an item
- send it to the store (network IO)
- ...
- yield an item
- send it to the store (network IO)
- create an item (CPU work) and buffer it
- ...
Here you can hopefully see that the CPU effort between store requests is significantly reduced. This more closely mimics the original code, where a list of millions of items was created up front and then iterated. Additionally, it means there is a very good chance that the thread creating outbound messages can go at least as fast as the writer thread, which means the outbound queue is unlikely to become empty for any appreciable time. This allows much lower packet fragmentation, because instead of having a packet per request, there's a good chance that multiple messages end up in each packet. Fewer packets generally means higher bandwidth due to reduced overheads.
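Putting the pieces together (again with the illustrative helpers from above, and the same db connection), the batched version is simply:

```csharp
// Batched: items are created in chunks of 1000, so while a chunk is being
// spooled there is almost no CPU work between sends; the outbound queue
// rarely drains, and multiple messages can share a packet.
foreach (var pair in Producer.CreateItems(1000000).Batchify(1000))
{
    db.StringSet(pair.Key, pair.Value, flags: CommandFlags.FireAndForget);
}
```

The batch size of 1000 is arbitrary here and would need tuning for a real workload.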