Question

I'm performing an operation, let's call it CalculateSomeData. CalculateSomeData operates in successive "generations", numbered 1..x. The number of generations in the entire run is fixed by the input parameters to CalculateSomeData and is known a priori. A single generation takes anywhere from 30 minutes to 2 hours to complete. Some of that variability is due to the input parameters and cannot be controlled. However, a portion of that variability is due to things like hardware capacities, CPU load from other processes, network bandwidth load, etc. One parameter that can be controlled per-generation is the number of threads that CalculateSomeData uses. Right now that's fixed and likely non-optimal. I'd like to track the time each generation takes and then have some algorithm by which I tweak the number of threads so that each successive generation improves upon the prior generation's calculation time (minimizing time). What approach should I use? How applicable are genetic algorithms? Intuition tells me that the range is going to be fairly tight - maybe 1 to 16 threads on a dual quad-core processor machine.

Any pointers, pseudocode, etc. are much appreciated.


Solution

How about an evolutionary algorithm?

Start with a guess: one thread per CPU core seems good, but it depends on the task at hand.

Measure the average time for each task in the generation. Compare it to the time taken by the previous generation. (Assume effectively infinite time and 0 threads for generation 0).

If the most recent generation's tasks averaged a better time than the one before, continue to change the number of threads in the same direction as you did last step: if the last generation used more threads than the one before it, add a thread for the new generation; if it used fewer, remove one (with an obvious lower limit of 1 thread).

If the most recent generation's tasks took longer, on average, than the previous generation's, change the number of threads in the opposite direction (so if increasing the number of threads resulted in a worse time, use one fewer thread next time).

As long as the optimal number of threads isn't too close to 1, you'll probably end up oscillating between three values that are all reasonably close to optimal. You may want to explicitly detect this case and lock yourself into the central value if you have a large number of generations to deal with.
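A minimal sketch of that hill-climbing loop in Python, assuming a hypothetical run_generation(threads) function that runs one generation with the given thread count and returns its elapsed wall-clock time:

```python
import os
import time

def run_generation(threads):
    """Hypothetical stand-in for one generation of CalculateSomeData.
    Returns the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    # ... perform the generation's work using `threads` worker threads ...
    return time.monotonic() - start

def tune_threads(total_generations, min_threads=1, max_threads=16):
    threads = os.cpu_count() or 1      # initial guess: one thread per core
    direction = 1                      # +1 = add a thread, -1 = remove one
    prev_time = float("inf")           # "generation 0" took effectively forever

    for generation in range(1, total_generations + 1):
        elapsed = run_generation(threads)

        if elapsed > prev_time:
            direction = -direction     # got worse: reverse direction
        # got better: keep moving in the same direction

        prev_time = elapsed
        threads = max(min_threads, min(max_threads, threads + direction))
```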

OTHER TIPS

If the calculations are completely CPU-bound, the number of threads should equal the number of cores on the machine. That way you minimize the number of context switches.

If your calculations involve I/O, network, synchronization or something else that blocks execution, you must find the limiting resource and measure its utilization. Monitor the utilization and slowly add more threads until it gets close to 100%. You should use as few threads as possible to saturate the limiting resource.
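A rough sketch of that feedback loop, assuming hypothetical helpers measure_utilization() (returning the fraction of the limiting resource in use, e.g. disk or network) and the run_generation(threads) function from above:

```python
def scale_until_saturated(threads=1, max_threads=16, target=0.95):
    """Add worker threads one at a time until the limiting resource
    (disk, network, a lock, ...) is nearly saturated."""
    while threads < max_threads:
        run_generation(threads)               # hypothetical: one generation's work
        utilization = measure_utilization()   # hypothetical: 0.0 .. 1.0 for the bottleneck
        if utilization >= target:
            break                             # resource is ~saturated; stop adding threads
        threads += 1
    return threads
```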

You should divide up your generations into lots of small tasks and put them in a queue. Spawn one thread per core and have each thread grab a task to do, run it to completion, and repeat.

You want lots more tasks than cores to make sure that you don't end up with just one task running at the end of the generation and all other threads idle. This is what is likely to happen if you set #tasks = #threads = #cores as Albin suggests (unless you can ensure that all tasks take precisely the same amount of time).

You also probably don't want more threads than cores. Context switching isn't terribly expensive, but the larger cache footprint that comes with having more than #cores tasks simultaneously active could hurt you (unless your tasks use very little memory).
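One way to set up that task-queue pattern in Python is with the standard-library concurrent.futures pool; the task list and do_task below are placeholders, and note that for pure-Python CPU-bound work under CPython's GIL you would typically swap in ProcessPoolExecutor:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def do_task(task):
    """Placeholder for one small, independent chunk of a generation's work."""
    ...

def run_generation_tasks(tasks):
    # One worker per core; many more tasks than workers, so no worker
    # sits idle near the end of the generation.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(do_task, tasks))
    return results
```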

Licensed under: CC-BY-SA with attribution