Pergunta

In particular, I'm looking at using TPL to start (and wait for) external processes. Does the TPL look at total machine load (both CPU and I/O) before deciding to start another task (hence -- in my case -- another external process)?

For example:

I've got about 100 media files that need to be encoded or transcoded (e.g. from WAV to FLAC or from FLAC to MP3). The encoding is done by launching an external process (e.g. FLAC.EXE or LAME.EXE). Each file takes about 30 seconds. Each process is mostly CPU-bound, but there's some I/O in there. I've got 4 cores, so the worst case (transcoding by piping the decoder into the encoder) still only uses 2 cores. I'd like to do something like:

Parallel.ForEach(sourceFiles,
    sourceFile =>
        TranscodeUsingPipedExternalProcesses(sourceFile));

Will this kick off 100 tasks (and hence 200 external processes competing for the CPU)? Or will it see that the CPU's busy and only do 2-3 at a time?

Foi útil?

Solução

You're going to run into a couple of issues here. The starvation avoidance mechanism of the scheduler will see your tasks as blocked as they wait on processes. It will find it hard to distinguish between a deadlocked thread and one simply waiting for a process to complete. As a result it may schedule new tasks if your tasks run or a long time (see below). The hillclimbing heuristic should take into account the overall load on the system, both from your application and others. It simply tries to maximize work done, so it will add more work until the overall throughput of the system stops increasing and then it will back off. I don't think this will effect your application but the stavation avoidance issue probably will.

You can find more detail as to how this all works in Parallel Programming with Microsoft®.NET, Colin Campbell, Ralph Johnson, Ade Miller, Stephen Toub (an earlier draft is online).

"The .NET thread pool automatically manages the number of worker threads in the pool. It adds and removes threads according to built-in heuristics. The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items and a hillclimbing heuristic that tries to maximize throughput while using as few threads as possible.

The goal of starvation avoidance is to prevent deadlock. This kind of deadlock can occur when a worker thread waits for a synchronization event that can only be satisfied by a work item that is still pending in the thread pool’s global or local queues. If there were a fixed number of worker threads, and all of those threads were similarly blocked, the system would be unable to ever make further progress. Adding a new worker thread resolves the problem.

A goal of the hill-climbing heuristic is to improve the utilization of cores when threads are blocked by I/O or other wait conditions that stall the processor. By default, the managed thread pool has one worker thread per core. If one of these worker threads becomes blocked, there’s a chance that a core might be underutilized, depending on the computer’s overall workload. The thread injection logic doesn’t distinguish between a thread that’s blocked and a thread that’s performing a lengthy, processor-intensive operation. Therefore, whenever the thread pool’s global or local queues contain pending work items, active work items that take a long time to run (more than a half second) can trigger the creation of new thread pool worker threads.

The .NET thread pool has an opportunity to inject threads every time a work item completes or at 500 millisecond intervals, whichever is shorter. The thread pool uses this opportunity to try adding threads (or taking them away), guided by feedback from previous changes in the thread count. If adding threads seems to be helping throughput, the thread pool adds more; otherwise, it reduces the number of worker threads. This technique is called the hill-climbing heuristic. Therefore, one reason to keep individual tasks short is to avoid “starvation detection,” but another reason to keep them short is to give the thread pool more opportunities to improve throughput by adjusting the thread count. The shorter the duration of individual tasks, the more often the thread pool can measure throughput and adjust the thread count accordingly.

To make this concrete, consider an extreme example. Suppose that you have a complex financial simulation with 500 processor-intensive operations, each one of which takes ten minutes on average to complete. If you create top-level tasks in the global queue for each of these operations, you will find that after about five minutes the thread pool will grow to 500 worker threads. The reason is that the thread pool sees all of the tasks as blocked and begins to add new threads at the rate of approximately two threads per second.

What’s wrong with 500 worker threads? In principle, nothing, if you have 500 cores for them to use and vast amounts of system memory. In fact, this is the long-term vision of parallel computing. However, if you don’t have that many cores on your computer, you are in a situation where many threads are competing for time slices. This situation is known as processor oversubscription. Allowing many processor-intensive threads to compete for time on a single core adds context switching overhead that can severely reduce overall system throughput. Even if you don’t run out of memory, performance in this situation can be much, much worse than in sequential computation. (Each context switch takes between 6,000 and 8,000 processor cycles.) The cost of context switching is not the only source of overhead. A managed thread in .NET consumes roughly a megabyte of stack space, whether or not that space is used for currently executing functions. It takes about 200,000 CPU cycles to create a new thread, and about 100,000 cycles to retire a thread. These are expensive operations.

As long as your tasks don’t each take minutes, the thread pool’s hill-climbing algorithm will eventually realize it has too many threads and cut back on its own accord. However, if you do have tasks that occupy a worker thread for many seconds or minutes or hours, that will throw off the thread pool’s heuristics, and at that point you should consider an alternative.

The first option is to decompose your application into shorter tasks that complete fast enough for the thread pool to successfully control the number of threads for optimal throughput. A second possibility is to implement your own task scheduler object that does not perform thread injection. If your tasks are of long duration, you don’t need a highly optimized task scheduler because the cost of scheduling will be negligible compared to the execution time of the task. MSDN® developer program has an example of a simple task scheduler implementation that limits the maximum degree of concurrency. For more information, see the section, “Further Reading,” at the end of this chapter.

As a last resort, you can use the SetMaxThreads method to configure the ThreadPool class with an upper limit for the number of worker threads, usually equal to the number of cores (this is the Environment.ProcessorCount property). This upper limit applies for the entire process, including all AppDomains."

Outras dicas

The short answer is: no.

Internally, the TPL uses the standard ThreadPool to schedule its tasks. So you're actually asking whether the ThreadPool takes machine load into account and it doesn't. The only thing that limits the number of tasks simultaneously running is the number of threads in the thread pool, nothing else.

Is it possible to have the external processes report back to your application once they are ready? In that case you do not have to wait for them (keeping threads occupied).

Ran a test using TPL/ThreadPool to schedule a great number of tasks doing looped spins. Using an external app I've loaded one of the cores to 100% using proc affinity. The number of active tasks never decreased.

Even better, I ran multiple instances of the same CPU intensive .NET TPL enabled app. The number of threads for all the apps was the same, and never went below the number of cores, even though my machine was barely usable.

So theory aside, TPL uses the number of cores available, but never checks on their actual load. A very poor implementation in my opinion.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top