Question

I have a List<Order> and I'm trying to filter this using LINQ:

var grouped = from o in orders
  group o by o.OrderNumber into g
  select new { Id = g.Key, Orders = g };

var GroupedList = grouped.ToList();

int max = GroupedList.Count();
int count = 0;
var filtered =
    from g in GroupedList
    where IncrementProgress(max, ref count)
    select g.Id;

var filteredOrders = orders.Where(o => filtered.Contains(o.OrderNumber));

Inside IncrementProgress I print count and max to the debug output. In my test max is 3500, yet the printed count has already reached 150000 and keeps climbing.

Does someone have any idea why?

PS: In my production code there is filter logic instead of IncrementProgress.

Update:

Here is the IncrementProgress method:

private bool IncrementProgress(int max, ref int count)
{
    Debug.WriteLine("Filtering {0} of {1}", ++count, max);
    return true;
}

Solution

That's because LINQ queries use deferred execution. filtered is not a collection; it is a query object that stores how to compute the result, not the result itself. Every time you enumerate filtered, the query runs again: it iterates over GroupedList and re-evaluates the where condition.

Since filtered.Contains(...) runs once per order, the where condition can be evaluated up to orders.Count() * GroupedList.Count() times.
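To make the re-evaluation visible, here is a minimal sketch that counts how often a where predicate runs. The class name DeferredDemo, the method CountPredicateCalls, and the sizes (100 ids, 100 lookups) are made up for illustration and are not from the question. Because Contains stops at the first match and this predicate always returns true, the deferred count stays below the 100 * 100 worst case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the where predicate ran.
    // materialize = false keeps the query deferred; true calls ToList() first.
    public static int CountPredicateCalls(bool materialize)
    {
        var ids = Enumerable.Range(0, 100).ToList();      // stands in for the group keys
        var lookups = Enumerable.Range(0, 100).ToList();  // stands in for the orders

        int calls = 0;
        IEnumerable<int> filtered = ids.Where(id => { calls++; return true; });
        if (materialize)
        {
            filtered = filtered.ToList();                 // runs the predicate once per id
        }

        // Each Contains call enumerates filtered; a deferred query re-runs the predicate.
        int matches = lookups.Count(x => filtered.Contains(x));
        Console.WriteLine("matches: " + matches);
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountPredicateCalls(false));    // 5050 (1 + 2 + ... + 100)
        Console.WriteLine(CountPredicateCalls(true));     // 100
    }
}
```

With a predicate that rejects most items, Contains has to scan further before it finds a match (or never finds one), so the deferred count moves toward the full product.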

Add a ToList() call to filtered to evaluate it eagerly, once:

var filtered =
    (from g in GroupedList
     where IncrementProgress(max, ref count)
     select g.Id).ToList();

But because you only call Contains on filtered afterwards, a HashSet<int> is a better container for the results (assuming OrderNumber is an int). Contains then runs in O(1) on average instead of O(n), which should improve performance considerably.

var filtered =
    new HashSet<int>(from g in GroupedList
                     where IncrementProgress(max, ref count)
                     select g.Id);
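As a rough end-to-end check of the HashSet approach, the sketch below uses a hypothetical Order class with an int OrderNumber and a placeholder filter named Keep that accepts even order numbers; neither comes from the question. It suggests that the filter runs once per group rather than once per group per order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Order type; OrderNumber is assumed to be an int.
public class Order
{
    public int OrderNumber { get; set; }
}

public static class HashSetFilterDemo
{
    public static (int PredicateCalls, int Kept) Run()
    {
        // 1000 orders spread evenly over 100 order numbers (10 orders each).
        var orders = Enumerable.Range(0, 1000)
            .Select(i => new Order { OrderNumber = i % 100 })
            .ToList();

        var groupedList = orders.GroupBy(o => o.OrderNumber).ToList();

        int calls = 0;
        // The HashSet constructor enumerates the query immediately, exactly once.
        var filtered = new HashSet<int>(
            from g in groupedList
            where Keep(g.Key, ref calls)
            select g.Key);

        // Contains is now a hash lookup and never re-runs the filter.
        var filteredOrders = orders.Where(o => filtered.Contains(o.OrderNumber)).ToList();
        return (calls, filteredOrders.Count);
    }

    // Placeholder filter logic: keeps even order numbers and counts its invocations.
    private static bool Keep(int orderNumber, ref int calls)
    {
        calls++;
        return orderNumber % 2 == 0;
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.PredicateCalls);  // 100, one call per group
        Console.WriteLine(result.Kept);            // 500, 50 even numbers * 10 orders each
    }
}
```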

OTHER TIPS

Your LINQ query is executed every time you enumerate filtered, which in your case means each time you call the Contains method.

Try declaring your filtered variable as (<LINQ Query>).ToArray(). This enumerates the query just once.

Sorry for poor formatting (mobile phone). Hope it helps.

Licensed under: CC-BY-SA with attribution