IEnumerable yield return combined with .AsParallel()
02-10-2019
Question
I've written some code to try and describe my concern:
    static void Main(string[] args)
    {
        IEnumerable<decimal> marks = GetClassMarks();
        IEnumerable<Person> students = GetStudents();
        students.AsParallel().ForAll(p => GenerateClassReport(p, marks));
        Console.ReadKey();
    }
GetClassMarks uses yield return to produce values from my weird data source. Assume that GenerateClassReport basically computes marks.Sum()/marks.Count() to get the class average.
From what I understand, students.AsParallel().ForAll is a parallel foreach.
My worry is what is going to happen inside the GetClassMarks method.
- Is it going to be enumerated once or many times?
- What order is the enumeration going to happen in?
- Do I need to do a .ToList() on marks to make sure it is only hit once?
Solution
Is it going to be enumerated once or many times?
Assuming that GenerateClassReport() enumerates marks once, then marks will be enumerated once for each element in students.
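A minimal sketch of that behaviour, assuming a stand-in GetClassMarks() that yields three fixed values and a stand-in GenerateClassReport() that makes a single pass over marks. The class name, the iteratorRuns counter and the student names are hypothetical and exist only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class EnumerationCountDemo
{
    // Hypothetical counter: how many times the iterator body has started.
    static int iteratorRuns;

    // Stand-in for the question's GetClassMarks(); the real data source is unknown.
    static IEnumerable<decimal> GetClassMarks()
    {
        Interlocked.Increment(ref iteratorRuns); // runs on the first MoveNext of each enumeration
        yield return 55m;
        yield return 70m;
        yield return 85m;
    }

    // Stand-in for GenerateClassReport(): Average() makes exactly one pass over marks.
    static decimal GenerateClassReport(string student, IEnumerable<decimal> marks)
    {
        return marks.Average();
    }

    public static int Run()
    {
        iteratorRuns = 0;
        IEnumerable<decimal> marks = GetClassMarks(); // deferred: the body has not run yet
        string[] students = { "Ann", "Bob", "Cy", "Dee" };

        students.AsParallel().ForAll(s => GenerateClassReport(s, marks));

        return iteratorRuns; // one run of the iterator body per student
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 4
    }
}
```

With four students and one pass per report, the iterator body starts four times, regardless of how many worker threads PLINQ happens to use.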
What order is the enumeration going to happen in?
Each thread will enumerate the collection in its default order, but several threads will do so concurrently. The concurrent enumeration order is generally unpredictable. Also, you should note that the number of threads is limited and variable, so most likely not all of the enumerations will occur concurrently.
Do I need to do a .ToList() on marks to make sure it is only hit once?
If GetClassMarks() is an iterator (i.e. it uses the yield construct), then its execution will be deferred and it will be called once for each time marks is enumerated (i.e. once for each element in students). If you use IEnumerable<decimal> marks = GetClassMarks().ToList() or if GetClassMarks() internally returns a concrete list or array, then GetClassMarks() will be executed immediately and the results will be stored and enumerated in each of the parallel threads without calling GetClassMarks() again.
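The effect of ToList() can be checked with the same kind of counter. This is a sketch under the same assumptions as before: GetClassMarks() here is a hypothetical stand-in with fixed values, and the report is reduced to the Sum/Count computation described in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class MaterializeOnceDemo
{
    // Hypothetical counter: how many times the iterator body has started.
    static int iteratorRuns;

    // Stand-in for the question's GetClassMarks().
    static IEnumerable<decimal> GetClassMarks()
    {
        Interlocked.Increment(ref iteratorRuns);
        yield return 55m;
        yield return 70m;
        yield return 85m;
    }

    public static int Run()
    {
        iteratorRuns = 0;

        // ToList() drains the iterator here, once, on the calling thread.
        List<decimal> marks = GetClassMarks().ToList();
        string[] students = { "Ann", "Bob", "Cy", "Dee" };

        students.AsParallel().ForAll(s =>
        {
            // Both reads hit the in-memory list; the iterator is not involved.
            // Concurrent reads of a List<T> are fine as long as nothing writes to it.
            _ = marks.Sum() / marks.Count;
        });

        return iteratorRuns;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 1
    }
}
```

The trade-off is memory: the whole sequence is held in the list, which is usually acceptable for something like a set of class marks.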
OTHER TIPS
1. If GetClassMarks is an iterator (that is, if it uses yield internally) then it is effectively a query that will be re-executed whenever you call marks.Sum(), marks.Count() etc.
2. It's almost impossible to predict the order of execution in a parallel query.
3. Yes. The following will ensure that GetClassMarks is only executed once. Subsequent calls to marks.Sum(), marks.Count() etc. will use the concrete list rather than re-executing the GetClassMarks query.

    List<decimal> marks = GetClassMarks().ToList();
Note that points 1 and 3 apply whether or not you're using AsParallel. The GetClassMarks query will be executed exactly the same number of times in either case (assuming that the rest of the code, except for the parallel aspects, is the same).
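To illustrate that the count does not depend on AsParallel, the sketch below runs the same report over the same deferred marks sequence with a plain foreach and then with ForAll. The names and values are hypothetical; the report makes two passes (Sum() and Count()), as the question describes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class SequentialVsParallelDemo
{
    // Hypothetical counter: how many times the iterator body has started.
    static int iteratorRuns;
    static readonly string[] Students = { "Ann", "Bob", "Cy" };

    // Stand-in for the question's GetClassMarks().
    static IEnumerable<decimal> GetClassMarks()
    {
        Interlocked.Increment(ref iteratorRuns);
        yield return 55m;
        yield return 70m;
        yield return 85m;
    }

    // Two separate passes over marks: one for Sum(), one for Count().
    static void GenerateClassReport(string student, IEnumerable<decimal> marks)
    {
        _ = marks.Sum() / marks.Count();
    }

    public static int RunSequential()
    {
        iteratorRuns = 0;
        IEnumerable<decimal> marks = GetClassMarks();
        foreach (string s in Students)
        {
            GenerateClassReport(s, marks);
        }
        return iteratorRuns;
    }

    public static int RunParallel()
    {
        iteratorRuns = 0;
        IEnumerable<decimal> marks = GetClassMarks();
        Students.AsParallel().ForAll(s => GenerateClassReport(s, marks));
        return iteratorRuns;
    }

    static void Main()
    {
        // Three students, two passes each: both lines print 6.
        Console.WriteLine(RunSequential());
        Console.WriteLine(RunParallel());
    }
}
```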
Is it going to be enumerated once or many times?
Just once.
What order is the enumeration going to happen in?
The iterator (function using yield) determines the order.
Do I need to do a .ToList() on marks to make sure it is only hit once?
No.
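AsParallel only iterates through its input once, partitioning the input into blocks which are dispatched to worker threads. Note that in the question's code that input is students; marks is not an input to the parallel query, so how often it is enumerated depends on what GenerateClassReport does with it.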
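The distinction between the sequence AsParallel() is called on and a sequence merely captured by the lambda can be made visible with two counters. This is a sketch with hypothetical stand-ins for GetStudents() and GetClassMarks(), both written as iterators:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ParallelInputDemo
{
    // Hypothetical counters for the two iterator bodies.
    static int studentRuns;
    static int markRuns;

    // Stand-in for GetStudents(): this is the input to AsParallel().
    static IEnumerable<string> GetStudents()
    {
        Interlocked.Increment(ref studentRuns);
        yield return "Ann";
        yield return "Bob";
        yield return "Cy";
    }

    // Stand-in for GetClassMarks(): only captured by the lambda below.
    static IEnumerable<decimal> GetClassMarks()
    {
        Interlocked.Increment(ref markRuns);
        yield return 55m;
        yield return 70m;
        yield return 85m;
    }

    public static int[] Run()
    {
        studentRuns = 0;
        markRuns = 0;
        IEnumerable<decimal> marks = GetClassMarks();
        IEnumerable<string> students = GetStudents();

        // PLINQ partitions students; each element's report enumerates marks once.
        students.AsParallel().ForAll(s => { _ = marks.Average(); });

        return new[] { studentRuns, markRuns };
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // prints 1: students is enumerated once
        Console.WriteLine(counts[1]); // prints 3: marks is enumerated once per student
    }
}
```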