Question

On page 88 of Stephen Toub's book

http://www.microsoft.com/download/en/details.aspx?id=19222

there is the following code:

    private BlockingCollection<T> _streamingData = new BlockingCollection<T>();

    // Parallel.ForEach
    Parallel.ForEach(_streamingData.GetConsumingEnumerable(),
        item => Process(item));

    // PLINQ
    var q = from item in _streamingData.GetConsumingEnumerable().AsParallel()
            ...
            select item;

Stephen then mentions:

"when passing the result of calling GetConsumingEnumerable as the data source to Parallel.ForEach, the threads used by the loop have the potential to block when the collection becomes empty. And a blocked thread may not be released by Parallel.ForEach back to the ThreadPool for retirement or other uses. As such, with the code as shown above, if there are any periods of time where the collection is empty, the thread count in the process may steadily grow;"

I do not understand why the thread count would grow.

If the collection is empty, wouldn't the BlockingCollection simply not request any further threads?

If so, you would not need WithDegreeOfParallelism to limit the number of threads used on the BlockingCollection.


Solution

The thread pool uses a hill-climbing algorithm to estimate the appropriate number of threads. As long as adding threads increases throughput, the thread pool creates more of them. It assumes that some of the work is blocking or waiting on I/O, and it tries to saturate the CPU by going beyond the number of processors in the system.

That is why doing I/O and other blocking work on thread-pool threads can be dangerous.
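The effect of blocked pool threads can be observed in isolation, without Parallel.ForEach. The following is only a rough sketch: the class name BlockedPoolDemo and the method Measure are made up for illustration, and the numbers it reports depend on the machine, the runtime version, and how long it observes.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class BlockedPoolDemo
{
    // Hypothetical helper: queues `blockedItems` work items that all block on
    // one event, waits for `observe`, and returns the process thread count
    // sampled before queuing and again while the items are still blocked.
    public static (int Before, int After) Measure(int blockedItems, TimeSpan observe)
    {
        int before = Process.GetCurrentProcess().Threads.Count;

        using (var gate = new ManualResetEventSlim(false))
        {
            var tasks = new Task[blockedItems];
            for (int i = 0; i < blockedItems; i++)
            {
                // Each work item occupies a pool thread without using the CPU.
                tasks[i] = Task.Run(() => gate.Wait());
            }

            // Give the pool time to react to the blocked work items.
            Thread.Sleep(observe);
            int after = Process.GetCurrentProcess().Threads.Count;

            // Release the blocked work items and wait for them to finish.
            gate.Set();
            Task.WaitAll(tasks);
            return (before, after);
        }
    }

    public static void Main()
    {
        var result = Measure(64, TimeSpan.FromSeconds(10));
        Console.WriteLine("Threads before: " + result.Before);
        Console.WriteLine("Threads while blocked: " + result.After);
    }
}
```

If the second number is noticeably larger than the first, the pool has added threads to compensate for the ones that are blocked.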

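Here is a fully working example of this behavior:

        // Namespaces the example needs.
        using System;
        using System.Collections.Concurrent;
        using System.Diagnostics;
        using System.Threading;
        using System.Threading.Tasks;

        BlockingCollection<string> _streamingData = new BlockingCollection<string>();

        // Producer: adds one item every 100 ms, so the collection is empty most of the time.
        Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 100; i++)
                {
                    _streamingData.Add(i.ToString());
                    Thread.Sleep(100);
                }
            });

        // Monitor: prints the number of threads in the process once per second.
        new Thread(() =>
            {
                while (true)
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("Thread count: " + Process.GetCurrentProcess().Threads.Count);
                }
            }).Start();

        // Consumer: the loop body does nothing, so the workers spend their time blocked
        // waiting for the next item. CompleteAdding is never called, so the loop never ends.
        Parallel.ForEach(_streamingData.GetConsumingEnumerable(), item =>
            {
            });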

I do not know why the thread count keeps climbing even though the extra threads do not increase throughput. According to the model I described above, it should not grow, but I am not sure that my model is actually correct.

Maybe the thread pool has an additional heuristic that makes it spawn threads when it sees no progress at all (measured in tasks completed per second). That would make sense, because it would likely prevent a lot of deadlocks in applications. Such deadlocks can happen when important tasks cannot run because they are waiting for existing tasks to exit and free up threads. This is a well-known problem with the thread pool.
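Whatever the exact cause, the growth can be capped by limiting the loop's degree of parallelism, which is what the question alludes to with WithDegreeOfParallelism. The following is a minimal sketch, not the book's code: the class BoundedConsumer and the method Consume are names invented for this example, and using Environment.ProcessorCount as the limit is just one plausible choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedConsumer
{
    // Hypothetical helper: drains the collection with at most `maxWorkers`
    // concurrent loop bodies and returns the number of items processed.
    // It returns only after the producer has called CompleteAdding.
    public static int Consume(BlockingCollection<string> source, int maxWorkers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        int processed = 0;

        Parallel.ForEach(source.GetConsumingEnumerable(), options, item =>
        {
            // Real work on `item` would go here.
            Interlocked.Increment(ref processed);
        });

        return processed;
    }

    public static void Main()
    {
        var data = new BlockingCollection<string>();

        // Producer: adds items slowly, then signals that no more will come.
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 20; i++)
            {
                data.Add(i.ToString());
                Thread.Sleep(10);
            }
            data.CompleteAdding();
        });

        int count = Consume(data, Environment.ProcessorCount);
        producer.Wait();
        Console.WriteLine("Processed: " + count);
    }
}
```

For the PLINQ version, the corresponding call is WithDegreeOfParallelism on the query returned by AsParallel(). Workers can still block while the collection is empty, but the loop will not use more than the configured number of them at once.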

Licensed under: CC-BY-SA with attribution