Question

I’m coding a Hash-Life implementation in C++14 and working on a multi-threaded version; it has obvious potential for multi-threading. The task can be broken down into essentially 13 sub-tasks: 9 of them are largely independent, and the other 4 depend on 4 of the 9 but not on each other. One task makes 9 (parallel), which come back together to make 4 (parallel).

The process is actually recursive, so those 13 tasks will break down into 13 of their own, down to a ‘trivial’ tier that is just calculated by brute force. At each tier of recursion the tasks get smaller, and there is likely a point where parallelism stops paying its way and the algorithm should go sequential. But that’s a detail; just imagine multiple tiers of parallelism breaking down.

This whole thing is just begging to be pushed through a worker thread pool.

What I’m inviting is ideas about smart ways of handling this.

A basic thread pool won’t do because there’s a risk (in practice an inevitability) of an interesting deadlock. If you chop a task into 13 sub-tasks and then wait for them to complete (or twelve, doing one in series, or whatever), you invite deadlock: you end up with all the threads waiting on un-started tasks in the queue, which will never be started because all the threads are waiting on un-started tasks in the queue that will… and so on. You roughly need more threads than outstanding tasks, and while it’s possible to create threads willy-nilly, it isn’t an efficient model and, compared with the approach below, results in excessive thread swapping.
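To illustrate, here is a minimal, hypothetical sketch of that naive pattern (the `FixedPool` and `sumRange` names are invented for this post, not my real code): each recursive call submits its children to a fixed-size pool and then blocks on their futures, so once every worker is blocked in `get()` the queued children can never start.

```cpp
// Hypothetical sketch of the naive pattern: a fixed-size pool where each
// recursive call submits its children and then blocks on their futures.
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class FixedPool {
public:
    explicit FixedPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~FixedPool() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    // Queue a job and hand back a future for its result.
    std::future<long> submit(std::function<long()> job) {
        auto task = std::make_shared<std::packaged_task<long()>>(std::move(job));
        auto fut = task->get_future();
        { std::lock_guard<std::mutex> lock(m_); queue_.push([task] { (*task)(); }); }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Recursive sum of [lo, hi]: submit both halves, then wait.  Every worker
// that runs sumRange blocks in get(); once all workers are blocked, the
// children still sitting in the queue can never start -- deadlock.
long sumRange(FixedPool& pool, long lo, long hi) {
    if (hi - lo < 1000) {                       // trivial tier: brute force
        long s = 0;
        for (long i = lo; i <= hi; ++i) s += i;
        return s;
    }
    long mid = lo + (hi - lo) / 2;
    auto left  = pool.submit([&pool, lo, mid] { return sumRange(pool, lo, mid); });
    auto right = pool.submit([&pool, mid, hi] { return sumRange(pool, mid + 1, hi); });
    return left.get() + right.get();            // blocks a pool thread
}

int main() {
    FixedPool pool(4);
    std::cout << sumRange(pool, 1, 50000) << '\n';  // hangs: all four workers end up in get()
}
```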

This seems to me to be a potentially generic problem. A basic thread-pool model at best ends with n producers (tasks) and m consumers (workers), but this is a producer-consumer situation that seems like a natural way to parallelise recursion and isn’t covered by that n-to-m case.

My main idea for avoiding deadlock: when a worker realises it needs a task completed in order to proceed, and that task is un-started, it takes the task back and completes it in series itself (deadlock avoided).

There’s an attractive feature here: rather than sleeping and allowing another thread to be swapped in, threads will tend to just remain active and pull work in. There is even scope to estimate the biggest un-started task and start that first, thereby shortening the critical path and minimising overall time by maximising utilisation.

The downside is the overhead of putting a task in a queue only for it to be pulled out and done by the thread that submitted it. There are ways of mitigating that.
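For concreteness, a minimal sketch of how that might look, extending the hypothetical `FixedPool` above with a `tryRunOne()` public member (again invented names, not my actual code): instead of blocking in `get()`, a waiting worker keeps pulling queued jobs and running them on its own thread until its dependency is ready.

```cpp
// Added as a public member of the hypothetical FixedPool above: pop one
// queued job, if any, and run it on the calling thread.  Note this runs
// *some* un-started task, not necessarily the exact dependency; pulling
// back the specific child needs a handle into the queue, which is one of
// the mitigations alluded to above.
#include <chrono>   // for the zero-timeout readiness poll below

bool FixedPool::tryRunOne() {
    std::function<void()> job;
    {
        std::lock_guard<std::mutex> lock(m_);
        if (queue_.empty()) return false;
        job = std::move(queue_.front());
        queue_.pop();
    }
    job();
    return true;
}

// Wait for a result by helping to drain the queue rather than sleeping.
// No thread ever blocks on an un-started task, so the deadlock above
// cannot occur.
long waitAndHelp(FixedPool& pool, std::future<long>& fut) {
    while (fut.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
        if (!pool.tryRunOne())
            std::this_thread::yield();   // dependency is already running elsewhere
    }
    return fut.get();
}

// Same recursion as before, but waiting with help instead of blocking.
long sumRange(FixedPool& pool, long lo, long hi) {
    if (hi - lo < 1000) {
        long s = 0;
        for (long i = lo; i <= hi; ++i) s += i;
        return s;
    }
    long mid = lo + (hi - lo) / 2;
    auto left  = pool.submit([&pool, lo, mid] { return sumRange(pool, lo, mid); });
    auto right = pool.submit([&pool, mid, hi] { return sumRange(pool, mid + 1, hi); });
    return waitAndHelp(pool, left) + waitAndHelp(pool, right);
}
```

The obvious caveat is stack depth: a helping thread can accumulate half-finished tasks nested on its stack.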

My problem is that all the searching I can do doesn’t turn up any analysis of this kind of problem.

People endlessly want to rake over the separate producer vs. consumer problem, and I can’t find anything covering this recursive model, which seems so natural.

Considering a 'consumers that are themselves producers' model I'm looking for:

  1. Resources (patterns, papers, blogs, code, etc.) covering this problem.
  2. Proposals, insights or thinking.
  3. Comments or ideas.

That is particularly with a view to avoiding the deadlock that ‘naive’ use of a ‘basic’ thread pool invites.

C++14 isn’t important here, except that it means I have a decent threading library to hand.

I have a toy C++ solution to the problem of summing the numbers 1 to n by recursively dividing the range and adding the parts, and I’m prepared to share it, but it’s just a bit too long to put in this post.


Solution

You are looking at units of work run in parallel. Many programming environments have something like .NET's Task: a unit of work that is not tied to a specific thread. The abstraction you need here might be called a task. Tasks are typically queued and run on worker threads by a subsystem (a library), and there can be many more tasks than worker threads.
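In C++14 terms, the closest standard building blocks are std::packaged_task and std::future (or std::async). A minimal illustration of a task as work not tied to any particular thread:

```cpp
// Minimal illustration: a "task" is a unit of work not tied to a thread.
// std::packaged_task wraps the callable, std::future carries the result;
// which thread eventually runs it is the scheduler's (here, ours) choice.
#include <cstddef>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::packaged_task<long()>> tasks;
    std::vector<std::future<long>> results;

    // Queue up more tasks than we have worker threads.
    for (long i = 0; i < 8; ++i) {
        tasks.emplace_back([i] { return i * i; });
        results.push_back(tasks.back().get_future());
    }

    // Hand them to a couple of worker threads (a stand-in for a pool).
    std::thread w1([&] { for (std::size_t i = 0; i < tasks.size(); i += 2) tasks[i](); });
    std::thread w2([&] { for (std::size_t i = 1; i < tasks.size(); i += 2) tasks[i](); });
    w1.join();
    w2.join();

    for (auto& r : results) std::cout << r.get() << '\n';
}
```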

The dependencies are the issue. Avoid parallelizing the higher-level loops or recursion, and only parallelize the leaf-node work, where presumably each unit needs enough CPU cycles to be worth running in parallel. The algorithm is then to do all your recursion, loops, etc. in one thread, and to build a queue of tasks that have no other dependencies and can be performed in parallel.
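As a sketch of that shape, using the toy range-sum as a stand-in for real leaf work (invented names): the recursion runs on one thread and only collects leaf jobs, which are then run in parallel.

```cpp
// Sketch: the recursion itself runs on one thread and only *collects*
// leaf-level jobs; the leaves are then run in parallel (std::async here
// is a stand-in for a proper pool).  No task ever waits on another task,
// so the thread-pool deadlock cannot arise.
#include <future>
#include <iostream>
#include <utility>
#include <vector>

// Sequential recursion: split the range until it is "trivial", then
// record the leaf as a job instead of computing it here.
void collectLeaves(long lo, long hi, std::vector<std::pair<long, long>>& leaves) {
    if (hi - lo < 1000) {
        leaves.emplace_back(lo, hi);
        return;
    }
    long mid = lo + (hi - lo) / 2;
    collectLeaves(lo, mid, leaves);
    collectLeaves(mid + 1, hi, leaves);
}

int main() {
    std::vector<std::pair<long, long>> leaves;
    collectLeaves(1, 10000, leaves);

    // Run only the leaves in parallel; each leaf owns its own state.
    std::vector<std::future<long>> parts;
    for (auto const& leaf : leaves) {
        parts.push_back(std::async(std::launch::async, [leaf] {
            long s = 0;
            for (long i = leaf.first; i <= leaf.second; ++i) s += i;
            return s;
        }));
    }

    // Combine the results back on the one "recursion" thread.
    long total = 0;
    for (auto& p : parts) total += p.get();
    std::cout << total << '\n';   // 50005000
}
```

Every queued task is a leaf with no dependency on any other task, so no worker can ever end up blocked on the queue.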

If you are having trouble distinguishing between the work needed by the composite nodes and the leaf nodes, the design needs to be revisited. A unit of work (task) that can be run independently on a worker thread should use mutable state that is owned only by that task -- if you need to access shared mutable state, you may gain nothing by parallelization, since you must serialize access to the shared state. In other words, you should not access the producer-consumer queues within your units of work.
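For example (hypothetical names), contrast a leaf task that owns its partial result and returns it with one that funnels every update through shared state:

```cpp
// Contrast: each task owning its state and returning a value vs. every
// task serialising on shared state.
#include <future>
#include <mutex>
#include <thread>

// Good: the task's mutable state (its partial sum) is owned by the task;
// the only synchronisation point is collecting the returned futures.
long sumOwned(long lo, long hi) {
    long s = 0;                       // owned by this task alone
    for (long i = lo; i <= hi; ++i) s += i;
    return s;
}

// Poor: every task contends for the same mutex and shared accumulator,
// so the work is largely serialised and parallelism buys little.
std::mutex totalMutex;
long sharedTotal = 0;
void sumShared(long lo, long hi) {
    for (long i = lo; i <= hi; ++i) {
        std::lock_guard<std::mutex> lock(totalMutex);  // per-step contention
        sharedTotal += i;
    }
}

int main() {
    // Owned-state version: combine via returned futures, no locking needed.
    auto a = std::async(std::launch::async, sumOwned, 1L, 5000L);
    auto b = std::async(std::launch::async, sumOwned, 5001L, 10000L);
    long viaFutures = a.get() + b.get();

    // Shared-state version: same answer, but largely serialised on the mutex.
    std::thread t1(sumShared, 1L, 5000L), t2(sumShared, 5001L, 10000L);
    t1.join();
    t2.join();

    return viaFutures == sharedTotal ? 0 : 1;   // both are 50005000
}
```

The second form works, but the per-step locking serialises the leaves, which is exactly the situation described above.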

Licensed under: CC-BY-SA with attribution