What is the performance difference between using Parallel.ForEach and a Task inside a foreach loop?

StackOverflow https://stackoverflow.com/questions/18163888

Question

I would like to know whether there are any documents or articles that can help me understand the differences between using Parallel.ForEach and using a Task within a normal foreach loop, like the following:

Case 1 - Parallel.ForEach:

Parallel.ForEach(items, item =>   // items: placeholder name for the collection
{
  // Do something thread-safe: parse an XML document and then save it
  // to a DB server through a repository approach
});

Case 2 - Task within foreach:

var tasks = new List<Task>();     // items and tasks are placeholder names
foreach (var item in items)
{
  tasks.Add(Task.Factory.StartNew(() =>
  {
     // Do the same thread-safe work as in case 1
  }));
}
Task.WaitAll(tasks.ToArray());

I ran my own tests, and the results show that case 1 performs much better than case 2. The ratio is roughly sequential : case 1 : case 2 = 5s : 1s : 4s.

So there is almost a 1:4 ratio between case 1 and case 2. Does this mean we should always use Parallel.ForEach or Parallel.For when we want to run a loop in parallel?


Solution

First, the best documentation on the subject is Part V of CLR via C#.

http://www.amazon.com/CLR-via-C-Developer-Reference/dp/0735667454/ref=sr_1_1?ie=UTF8&qid=1376239791&sr=8-1&keywords=clr+via+c%23

Secondly, I would expect Parallel.ForEach to perform better because it will not only create Tasks, but also group them. In Jeffrey Richter's book, he explains that tasks that are started individually are put on the thread pool's global queue. There is some overhead in locking that queue. To combat this, each thread pool worker thread has its own local queue for the Tasks created by the Task it is running. Work in such a local queue can be picked up by its owning thread without taking that lock!

I would have to read that chapter again (Chapter 27), so I am not sure that Parallel.ForEach works this way, but this is what I would expect it to do.

Locking, he explains, is expensive because it requires accessing a kernel-level construct.

In either case, do not expect the items to be processed in order. Parallel.ForEach is even less likely to process them in order than Tasks started from a foreach loop, due to the aforementioned internals.

Other tips

What Parallel.ForEach() does is create a small number of Tasks to process the iterations of your loop. Tasks are relatively cheap, but they aren't free, so this tends to improve performance. And if the body of your loop executes quickly, the improvement can be really big. This is the most likely explanation for the behavior you're observing.
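One way to check this claim on your own machine is to record which task runs each iteration. The following is a minimal sketch, not a description of how Parallel.ForEach is implemented: the names WorkerTaskCount and CountWorkerTasks are made up for illustration, and the -1 fallback is an assumption to cover the case where Task.CurrentId is null. The exact count varies by machine and runtime, so no specific output is claimed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class WorkerTaskCount
{
    // Returns how many distinct tasks executed the loop body.
    // (Hypothetical helper, introduced only for this illustration.)
    public static int CountWorkerTasks(int itemCount)
    {
        var items = Enumerable.Range(0, itemCount).ToArray();
        var taskIds = new ConcurrentDictionary<int, bool>();

        Parallel.ForEach(items, item =>
        {
            // Task.CurrentId identifies the task running this iteration;
            // -1 is a fallback in case no task is current.
            taskIds.TryAdd(Task.CurrentId ?? -1, true);
        });

        return taskIds.Count;
    }

    static void Main()
    {
        const int itemCount = 100000;
        int workerTasks = CountWorkerTasks(itemCount);
        Console.WriteLine("{0} iterations ran on {1} distinct task(s)",
                          itemCount, workerTasks);
    }
}
```

If the number of distinct tasks is far smaller than the number of iterations, the per-task cost is being paid only a handful of times instead of once per item.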

How many tasks are you running? Just creating a new task can take a significant amount of time if you loop enough times. For example, in the following code the first block runs in 15 ms, while the second block takes over 1 second, even though it doesn't run the tasks at all. Uncomment the Start call and the time goes up to nearly 3 seconds. The WaitAll adds only a small amount.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class Program
{
    static void Main()
    {
        const int max = 3000000;
        var range = Enumerable.Range(0, max).ToArray();
        {
            // Block 1: Parallel.ForEach over all items with an empty body.
            var sw = new Stopwatch();
            sw.Start();
            Parallel.ForEach(range, i => { });
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
        {
            // Block 2: create one Task per item without starting it.
            var tasks = new Task[max];
            var sw = new Stopwatch();
            sw.Start();
            foreach (var i in range)
            {
                tasks[i] = new Task(()=> { });
                //tasks[i].Start();
            }
            //Task.WaitAll(tasks);
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}
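If you want to keep explicit tasks, one way to avoid paying the creation cost per item is to give each task a chunk of the range. This is only a rough sketch of the idea. It is not how Parallel.ForEach partitions work internally, the names ChunkedTasks and RunInChunks are invented for this example, and it assumes .NET 4.5 or later for Task.Run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ChunkedTasks
{
    // Processes indices [0, count) with one task per chunk rather than one
    // task per item. chunkCount must be positive.
    // Returns the number of tasks that were actually created.
    public static int RunInChunks(int count, int chunkCount, Action<int> body)
    {
        var tasks = new List<Task>();
        int chunkSize = (count + chunkCount - 1) / chunkCount; // ceiling division

        for (int start = 0; start < count; start += chunkSize)
        {
            int from = start;                            // copies for the closure
            int to = Math.Min(start + chunkSize, count);
            tasks.Add(Task.Run(() =>
            {
                for (int i = from; i < to; i++) body(i);
            }));
        }

        Task.WaitAll(tasks.ToArray());
        return tasks.Count;
    }

    static void Main()
    {
        // Same item count as the benchmark above, one chunk per processor.
        int created = RunInChunks(3000000, Environment.ProcessorCount, i => { });
        Console.WriteLine("Tasks created: " + created);
    }
}
```

With fixed-size chunks a slow chunk can leave other cores idle, so for uneven workloads Parallel.ForEach is usually the simpler choice.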
Licensed under: CC-BY-SA with attribution