Question

This is a highly theoretical question about my parallelization approach.

First of all, I want to make clear that I do not claim to be the 'inventor' of this approach, but I couldn't find any information about anything like it.

I will explain my idea through an example.

[Diagram: the tasks grate the cheese, slice the olives and make the dough each feed into the final dress-up-pizza task.]

I assume that every task that is a dependency of the dress-up-pizza task is CPU-bound, so it makes sense to execute it on another thread.

Usual approach

  1. Compute dependencies of dress-up pizza on other threads
  2. Join every thread
  3. Execute Dress up pizza task
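
For example, in Rust this usual approach could look roughly like the sketch below (the task bodies are made-up stand-ins; only the spawn/join structure matters):

use std::thread;

// Made-up stand-ins for the CPU-bound dependency tasks.
fn grate_the_cheese() -> String { "cheese".into() }
fn slice_the_olives() -> String { "olives".into() }
fn make_the_dough() -> String { "dough".into() }

fn dress_up_pizza(cheese: String, olives: String, dough: String) {
    println!("dressing pizza with {cheese}, {olives} and {dough}");
}

fn main() {
    // 1. Compute the dependencies on other threads.
    let cheese = thread::spawn(grate_the_cheese);
    let olives = thread::spawn(slice_the_olives);
    let dough = thread::spawn(make_the_dough);

    // 2. Join every thread, collecting the results.
    let (cheese, olives, dough) =
        (cheese.join().unwrap(), olives.join().unwrap(), dough.join().unwrap());

    // 3. Execute the dress-up-pizza task.
    dress_up_pizza(cheese, olives, dough);
}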

I got the idea that, if we don't care about which thread executes the dress-up-pizza task, we can avoid joining threads entirely by executing that task on whichever thread finishes last.

We could store the result of every task in a struct with an additional field for an atomic counter.

Pseudo-code example

struct pizza_dependencies {
    cheese,
    olives,
    dough,
    counter  // atomic, starts at 0
}

fn grate_the_cheese(dependencies){
    dependencies.cheese = cheese

    // Atomically increment and read the old value, so that the
    // check and the increment cannot race with the other tasks.
    if atomic_fetch_add(dependencies.counter, 1) == 2 {
        // We are the last of the three tasks to finish, so run
        // the dependent task right here on this thread.
        dress_up_pizza(dependencies)
    }
}

The rest of the tasks would be implemented the same way; the counter starts at 0. A fuller sketch follows below.
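
To make the idea concrete, here is a rough runnable Rust sketch of what I mean; the Mutex-wrapped result slots and the finish_task helper are just one way to make this memory-safe, the rest mirrors the pseudo-code:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Mirrors the pseudo-code struct; the Mutex<Option<..>> slots let
// each task write its own result safely.
struct PizzaDependencies {
    cheese: Mutex<Option<String>>,
    olives: Mutex<Option<String>>,
    dough: Mutex<Option<String>>,
    counter: AtomicUsize, // starts at 0
}

// Whichever task bumps the counter from 2 to 3 finished last and
// runs the dependent task on its own thread -- no join needed.
fn finish_task(deps: &PizzaDependencies) {
    if deps.counter.fetch_add(1, Ordering::AcqRel) == 2 {
        dress_up_pizza(deps);
    }
}

fn grate_the_cheese(deps: &PizzaDependencies) {
    *deps.cheese.lock().unwrap() = Some("cheese".into());
    finish_task(deps);
}

fn slice_the_olives(deps: &PizzaDependencies) {
    *deps.olives.lock().unwrap() = Some("olives".into());
    finish_task(deps);
}

fn make_the_dough(deps: &PizzaDependencies) {
    *deps.dough.lock().unwrap() = Some("dough".into());
    finish_task(deps);
}

fn dress_up_pizza(deps: &PizzaDependencies) {
    println!(
        "dressing pizza with {:?}, {:?} and {:?}",
        deps.cheese.lock().unwrap(),
        deps.olives.lock().unwrap(),
        deps.dough.lock().unwrap()
    );
}

fn main() {
    let deps = Arc::new(PizzaDependencies {
        cheese: Mutex::new(None),
        olives: Mutex::new(None),
        dough: Mutex::new(None),
        counter: AtomicUsize::new(0),
    });

    let tasks: [fn(&PizzaDependencies); 3] =
        [grate_the_cheese, slice_the_olives, make_the_dough];
    let handles: Vec<_> = tasks
        .into_iter()
        .map(|task| {
            let deps = Arc::clone(&deps);
            thread::spawn(move || task(&deps))
        })
        .collect();

    // Joining here only keeps the process alive long enough;
    // dress_up_pizza itself runs on whichever worker finishes last.
    for handle in handles {
        handle.join().unwrap();
    }
}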

  • What would be the advantages/disadvantages of such an approach?
  • Is there any better/faster way of doing this?
  • Is some research/project using a similar approach?

I really want to know what you think about my idea. Have a great day :)

Solution

Yes, that's not entirely unreasonable. In practice you would likely replace the (atomic) counter with a semaphore, but all of that is more or less equivalent. A synchronization point that every task must reach before any of them may proceed is called a barrier; it works quite similarly to your counter, except that it doesn't concern itself with what comes next.
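
For illustration, Rust's standard library ships such a barrier, and its wait() even designates exactly one "leader" thread, which could run the follow-up task much like your counter does. A minimal sketch, reusing the task names from your example:

use std::sync::{Arc, Barrier};
use std::thread;

fn main() {
    let barrier = Arc::new(Barrier::new(3));
    let handles: Vec<_> = ["grate cheese", "slice olives", "make dough"]
        .into_iter()
        .map(|task| {
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                println!("{task} done");
                // All tasks block here until the last one arrives;
                // exactly one waiter is designated the "leader" and
                // could run the follow-up task, much like the counter.
                if barrier.wait().is_leader() {
                    println!("dress up pizza");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}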

The issue with your approach is that each task needs to know what the next task is. Hardcoding the tasks like you did here prevents composability (you can't recombine tasks, or add and remove them), and doing this by hand is error-prone.

There is also the question of what happens if one of the tasks fails without incrementing the shared counter. As designed, your system would hang forever. If you have a supervisor process, it could e.g. check results for error codes or set a timeout; this is more robust in practice.
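
Such a supervisor could be sketched with a channel and a receive timeout; the simulated failure, the error strings and the five-second timeout below are all arbitrary assumptions:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<Result<String, String>>();

    for task in ["cheese", "olives", "dough"] {
        let tx = tx.clone();
        thread::spawn(move || {
            // A real task would do CPU-bound work here and may fail.
            let outcome = if task == "olives" {
                Err(format!("{task} task crashed"))
            } else {
                Ok(task.to_string())
            };
            let _ = tx.send(outcome);
        });
    }
    drop(tx); // the supervisor keeps no sender of its own

    // The supervisor waits for all three results, but gives up
    // (instead of hanging forever) if a task never reports back.
    for _ in 0..3 {
        match rx.recv_timeout(Duration::from_secs(5)) {
            Ok(Ok(ingredient)) => println!("{ingredient} ready"),
            Ok(Err(e)) => { eprintln!("task failed: {e}"); return; }
            Err(_) => { eprintln!("timed out waiting for a task"); return; }
        }
    }
    println!("dress up pizza");
}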

To some degree these aspects can be fixed, e.g. by running each task under its own error handler, or by using explicit continuations to ensure composability.

But why use a complicated solution when a simple one is just fine? Tasks are usually reasonably cheap, so the difference between three tasks in your solution and three tasks with an additional supervisor or main thread shouldn't be that dramatic.

OTHER TIPS

Is some research/project using a similar approach?

I believe Intel's Threading Building Blocks library has a parallel flow graph mechanism that implements your diagram more or less directly. It's obviously more general (and probably not lockless), but as a benefit you don't end up with concurrency details embedded in your pizza.

Is there any better/faster way of doing this?

You can both generalize it and hide the implementation away from your cooking ingredients by implementing it as a finite-state machine.

Currently you have an implied state, namely the number of ingredients already completely processed, but you can make that explicit. You don't need to enumerate every distinct combination, since we don't care to differentiate between (cheese+olives) and (olives+dough): they're both the same intermediate, incomplete state.
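
As a sketch, with a hypothetical PizzaState enum and a hypothetical transition function, ignoring the locking or atomics a concurrent implementation would need around the state:

// Explicit states: we don't distinguish which ingredients are done,
// only how many, so (cheese+olives) and (olives+dough) collapse
// into the same Incomplete state.
enum PizzaState {
    Incomplete { done: usize }, // 0, 1 or 2 ingredients finished
    ReadyToDress,
}

// Hypothetical transition function, called once per finished task.
fn ingredient_done(state: PizzaState) -> PizzaState {
    match state {
        PizzaState::Incomplete { done } if done < 2 => {
            PizzaState::Incomplete { done: done + 1 }
        }
        _ => PizzaState::ReadyToDress,
    }
}

fn main() {
    let mut state = PizzaState::Incomplete { done: 0 };
    for _ in 0..3 {
        state = ingredient_done(state);
    }
    assert!(matches!(state, PizzaState::ReadyToDress));
}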

Usual approach ... join every thread

I have literally never seen this in production code. Perhaps it's more common in teaching examples or in toy code?

Licensed under: CC-BY-SA with attribution