Question

I asked a question here and got an answer about performance issues I had with a large collection of data (created with LINQ).

OK, let's leave that aside.

But one of the interesting (and ingenious) optimizations, suggested by Marc, was to Batchify the LINQ query:

/*1*/ static IEnumerable<T> Batchify<T>(this IEnumerable<T> source, int count)
/*2*/ {
/*3*/     var list = new List<T>(count);           // reusable buffer of up to `count` items
/*4*/     foreach (var item in source)
/*5*/     {
/*6*/         list.Add(item);
/*7*/         if (list.Count == count)
/*8*/         {                                     // buffer is full: flush the whole batch
/*9*/             foreach (var x in list) yield return x;
/*10*/            list.Clear();
/*11*/        }
/*12*/    }
/*13*/    foreach (var item in list) yield return item; // flush the final partial batch
/*14*/ }

Here, the purpose of Batchify is to ensure that we aren't "helping" the server by taking appreciable time between each operation: the data is produced in batches of 1000, and each batch is made available to the consumer very quickly.
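For reference, here is a minimal sketch of how Batchify slots into a pipeline; the source data and the SendToStore call are invented stand-ins (not the actual code from the original question), and it assumes the Batchify extension above is declared in a static class in scope:

using System;
using System.Collections.Generic;
using System.Linq;

static class BatchifyUsage
{
    // Invented stand-in for the real store call (e.g. a fire-and-forget write).
    static void SendToStore(int value) => Console.WriteLine(value);

    static void Main()
    {
        var query = Enumerable.Range(0, 10_000)   // invented source data
                              .Where(x => x % 2 == 0)
                              .Select(x => x * 10);

        // Stream the results in batches of 1000, one item at a time:
        foreach (var item in query.Batchify(1000))
            SendToStore(item);
    }
}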

Now, I understand what it's doing, but I can't tell what difference it makes, so I might be missing how it actually works. (Sometimes you think you know something... till...)

OK, back to basics:

AFAIK, LINQ works like this chain:

So, we can't start enumerating until the result of Select in:

Where-->OrderBy-->Select

has been produced.

So basically, I'm waiting for Select to have all the correct data (after Where, after OrderBy), and only then can my code touch those values (yielded from the Select).

But according to my understanding of Marc's answer, it seems there is a gap between those yields which allows other resources to do something... (?)

If so, then between each iteration of line /*4*/, after line /*9*/, is there time for the CPU to do something else?
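To make the question concrete, here is a small standalone experiment (my own construction, not from the original post) that logs each stage. With streaming operators like Where/Select, the output interleaves, showing that each element flows through the whole chain before the next one is even produced:

using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    static IEnumerable<int> Produce()
    {
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine($"produce {i}");
            yield return i;
        }
    }

    static void Main()
    {
        var query = Produce()
            .Where(x => { Console.WriteLine($"where   {x}"); return true; })
            .Select(x => { Console.WriteLine($"select  {x}"); return x; });

        foreach (var x in query)
            Console.WriteLine($"consume {x}");

        // Prints: produce 0, where 0, select 0, consume 0, produce 1, ...
        // i.e. there IS a gap between yields in which other work can happen.
    }
}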

Question

  • Can someone shed some light, please? How does this work?

NB

I already know that (for example) Select is nothing but:

public static IEnumerable<TResult> Select<TSource, TResult>
    (this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    foreach (TSource element in source)
        yield return selector(element);
}

But if so, my code can't touch the values until all of them (after Where, after OrderBy) have been calculated...
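That assumption is easy to test. A short sketch (again my own, not from the post): Select streams, so the consumer can touch the first value before the rest are ever computed, whereas OrderBy must buffer its entire source first:

using System;
using System.Linq;

class StreamingVsBuffered
{
    static void Main()
    {
        // Select is streaming: First() pulls exactly one element through
        // the chain and stops; the remaining ~billion are never computed.
        var first = Enumerable.Range(0, 1_000_000_000)
                              .Select(x => (long)x * x)
                              .First();
        Console.WriteLine(first); // prints 0 immediately

        // OrderBy is buffered: even First() forces the whole source to be
        // consumed and sorted before the first element can be yielded.
        var top = Enumerable.Range(0, 10_000_000)
                            .OrderByDescending(x => x)
                            .First(); // noticeably slower: full buffering
        Console.WriteLine(top);
    }
}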

Edit:

For those asking whether there's a difference: https://i.stack.imgur.com/19Ojw.jpg

2 seconds for 1M items. 9 seconds for 5M items.

(Ignore the second time line; it's from an extra Console.Write line.)

Here it is for the 5M list: https://i.stack.imgur.com/DflGR.jpg (the first run is with Batchify, the other is without).
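In case the screenshots rot, this is the shape of harness (my reconstruction, not the original benchmark code) that produces that kind of with/without comparison, assuming the Batchify extension above is in scope:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class BatchifyTiming
{
    static void Main()
    {
        const int n = 5_000_000;
        IEnumerable<int> Query() => Enumerable.Range(0, n).Select(x => x * 2);

        var sw = Stopwatch.StartNew();
        long sum = 0;
        foreach (var x in Query().Batchify(1000)) sum += x;   // with Batchify
        Console.WriteLine($"with Batchify:    {sw.Elapsed}  (sum={sum})");

        sw.Restart();
        sum = 0;
        foreach (var x in Query()) sum += x;                  // without Batchify
        Console.WriteLine($"without Batchify: {sw.Elapsed}  (sum={sum})");
    }
}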


Solution

Important: the code in the image includes an OrderBy; note that this breaks Batchify here, because OrderBy is a buffered operator. The Batchify method I showed is intended for non-buffered spooling streams.

In the context in which I used it, the origin (before the Batchify) was an iterator block that did lots of things involving object creation and pseudo-random number generators on each iteration. Because the code in question was timing-sensitive, what I did not want to do was introduce a reliable pause (for the CPU work of creating each item) between each call to the store. This was partly to emulate the original code, which created all the objects up front, and partly because I understand how SE.Redis handles socket work.

Let's consider the behaviour without Batchify:

  • create an item (CPU work) and yield it
  • send it to the store (network IO)
  • create an item (CPU work) and yield it
  • send it to the store (network IO)
  • create an item (CPU work) and yield it
  • send it to the store (network IO)
  • ...

In particular, this means that there is a predictable pause between store requests. SE.Redis handles socket IO on a dedicated worker thread, and the above could quite easily result in high packet fragmentation, especially since I was using the "fire and forget" flag. The writer thread needs to flush periodically, which it does when either the buffer hits a critical size, or there is no more work in the outbound message queue.

Now consider what batchify does:

  • create an item (CPU work) and buffer it
  • create an item (CPU work) and buffer it
  • ...
  • create an item (CPU work) and buffer it
  • yield an item
  • send it to the store (network IO)
  • yield an item
  • send it to the store (network IO)
  • ...
  • yield an item
  • send it to the store (network IO)
  • create an item (CPU work) and buffer it
  • ...

Here you can hopefully see that the CPU effort between store requests is significantly reduced. This more correctly mimics the original code, where a list of millions of items was created up front and then iterated. But additionally, it means that the thread creating outbound messages has a very good chance of going at least as fast as the writer thread, which means the outbound queue is unlikely to be empty for any appreciable time. This allows much lower packet fragmentation, because instead of having one packet per request, there's a good chance that multiple messages end up in each packet. Fewer packets generally means higher bandwidth due to reduced overheads.
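To ground the "send it to the store" step: a hedged sketch of what the pumping loop can look like with StackExchange.Redis and the fire-and-forget flag. The key/value shapes here are invented for illustration, and the Batchify extension above is assumed to be in scope:

using System.Linq;
using StackExchange.Redis;

class FireAndForgetSketch
{
    static void Main()
    {
        var muxer = ConnectionMultiplexer.Connect("localhost"); // connection string assumed
        var db = muxer.GetDatabase();

        // Invented sample data; the real origin was an iterator block doing
        // object creation and pseudo-random work per item.
        var source = Enumerable.Range(0, 10_000)
                               .Select(i => (Key: (RedisKey)("item:" + i), Value: (RedisValue)i));

        foreach (var (key, value) in source.Batchify(1000))
        {
            // FireAndForget returns as soon as the message is queued for the
            // dedicated writer thread; multiple queued messages can then be
            // coalesced into fewer packets.
            db.StringSet(key, value, flags: CommandFlags.FireAndForget);
        }
    }
}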

OTHER TIPS

I know this solution was posted by a user probably more knowledgeable than me, but frankly, in your example it does not do anything. The real killer in your last post was the fact that you used a List<> to actually create 10M entries in memory before starting the foreach loop over that materialized collection. Now you are using an IEnumerable<>, which does not create the 10M items in memory at once, but one after another (maybe more if run in parallel). The Batchify method is nice... but if you skip it, it should work just the same. At best, it's a micro-optimization.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow