Important: the image shown includes OrderBy: note that this breaks Batchify here, because OrderBy is a buffered operator. The Batchify method I showed is intended for non-buffered spooling streams.
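For concreteness, here is a minimal sketch of what such a non-buffered batching extension might look like (illustrative only; not necessarily identical to the method shown earlier):

```csharp
using System.Collections.Generic;

static class BatchifyExtensions
{
    // Buffer "count" items (paying the CPU cost of creating them up front),
    // then spool the whole batch with essentially no work between yields.
    public static IEnumerable<T> Batchify<T>(this IEnumerable<T> source, int count)
    {
        var buffer = new List<T>(count);
        foreach (var item in source)
        {
            buffer.Add(item);             // per-item CPU work happens here
            if (buffer.Count == count)
            {
                foreach (var x in buffer) yield return x;   // cheap back-to-back yields
                buffer.Clear();
            }
        }
        foreach (var x in buffer) yield return x;           // trailing partial batch
    }
}
```

With an OrderBy upstream, the entire sequence is already materialized before the first item is yielded, so this staging adds nothing.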
In the context in which I used it, the origin (before the Batchify) was an iterator block that did lots of work involving object creation and pseudo-random number generators on each iteration. Because the code in question was timing-sensitive, I did not want to introduce a predictable pause (for the CPU work of creating each item) between each call to the store. This was partly to emulate the original code, which created all the objects up front, and partly because I know how SE.Redis handles socket work.
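As a rough illustration (hypothetical names, not the actual code), the origin was an iterator block of roughly this shape:

```csharp
using System;
using System.Collections.Generic;

static class Producer
{
    // Hypothetical stand-in for the origin: each iteration does non-trivial
    // CPU work (object creation, pseudo-random numbers) before the next
    // item becomes available to send.
    public static IEnumerable<KeyValuePair<string, string>> CreateItems(int count)
    {
        var rand = new Random(12345);
        for (int i = 0; i < count; i++)
        {
            string key = "item:" + i;
            string value = rand.NextDouble().ToString("R"); // simulated per-item cost
            yield return new KeyValuePair<string, string>(key, value);
        }
    }
}
```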
Let's consider the behaviour without Batchify:
- create an item (CPU work) and yield it
- send it to the store (network IO)
- create an item (CPU work) and yield it
- send it to the store (network IO)
- create an item (CPU work) and yield it
- send it to the store (network IO)
- ...
In particular, this means that there is a predictable pause between store requests. SE.Redis handles socket IO on a dedicated worker thread, and the above could quite easily result in high packet fragmentation, especially since I was using the "fire and forget" flag. The writer thread needs to flush periodically, which it does either when the buffer hits a critical size or when there is no more work in the outbound message queue.
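In code, the unbatched pattern is essentially the following (a sketch using the hypothetical CreateItems above; StringSet and CommandFlags.FireAndForget are the actual SE.Redis API, the connection string is illustrative):

```csharp
using StackExchange.Redis;

// Unbatched: the CPU work for the next item happens between every send,
// so the writer thread regularly runs out of queued work and flushes
// small, fragmented packets.
var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
foreach (var pair in Producer.CreateItems(1000000))
{
    db.StringSet(pair.Key, pair.Value, flags: CommandFlags.FireAndForget);
}
```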
Now consider what Batchify does:
- create an item (CPU work) and buffer it
- create an item (CPU work) and buffer it
- ...
- create an item (CPU work) and buffer it
- yield an item
- send it to the store (network IO)
- yield an item
- send it to the store (network IO)
- ...
- yield an item
- send it to the store (network IO)
- create an item (CPU work) and buffer it
- ...
Here you can hopefully see that the CPU effort between store requests is significantly reduced. This more closely mimics the original code, where a list of millions of items was created up front and then iterated. Additionally, it means there is a very good chance that the thread creating outbound messages can go at least as fast as the writer thread, which means the outbound queue is unlikely to become empty for any appreciable time. This allows much lower packet fragmentation, because instead of having a packet per request, there's a good chance that multiple messages end up in each packet. Fewer packets generally means higher bandwidth due to reduced overheads.
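Putting the pieces together (again with the illustrative helpers from above, and the same db connection), the batched version is simply:

```csharp
// Batched: items are created in chunks of 1000, so while a chunk is being
// spooled there is almost no CPU work between sends; the outbound queue
// rarely drains, and multiple messages can share a packet.
foreach (var pair in Producer.CreateItems(1000000).Batchify(1000))
{
    db.StringSet(pair.Key, pair.Value, flags: CommandFlags.FireAndForget);
}
```

The batch size of 1000 is arbitrary here and would need tuning for a real workload.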