Question

Suppose that a certain parallel application uses a master-slave design to process a large number of workloads. Each workload takes some number of cycles to complete; the number of cycles any given workload will take is given by a known random variable $X$. Assume that there are $n$ such workloads and $m$ equivalent slaves (processing nodes). Naturally, a more general version of this question addresses the case of slaves of differing capabilities, but we ignore this for now.

The master cannot process workloads, but it can distribute workloads to slave nodes and monitor their progress. Specifically, the master can perform the following actions (a minimal code sketch follows below):

  1. Instantaneously begin processing of any $k$ workloads on any free node.
  2. Instantaneously receive confirmation from a node that a previously initiated batch of $k$ workloads has completed.
  3. At any point in time, and instantaneously, determine the state of all nodes (free or busy) as well as the number of workloads completed and the number of workloads remaining.

For simplicity's sake, assume $k$ divides $n$.
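
To make the three actions concrete, here is a minimal sketch of the master's bookkeeping; all names (`Master`, `Node`, `assign_batch`, and so on) are illustrative and not part of the question.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Node:
    """A slave node as seen by the master: either free or busy."""
    busy: bool = False

@dataclass
class Master:
    nodes: List[Node]
    pending_batches: List[List[int]]  # batches of k workload ids not yet assigned
    completed_workloads: int = 0

    def assign_batch(self, node: Node) -> List[int]:
        """Action 1: instantaneously start the next batch of k workloads on a free node."""
        assert not node.busy and self.pending_batches
        node.busy = True
        return self.pending_batches.pop(0)

    def on_completion(self, node: Node, batch: List[int]) -> None:
        """Action 2: instantaneously receive confirmation that a node finished its batch."""
        node.busy = False
        self.completed_workloads += len(batch)

    def poll(self) -> Dict[str, int]:
        """Action 3: inspect, at any instant, which nodes are free and how much work remains."""
        return {
            "free_nodes": sum(not n.busy for n in self.nodes),
            "completed_workloads": self.completed_workloads,
            "remaining_batches": len(self.pending_batches),
        }
```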

There are at least two categories of load balancing strategies for minimizing the total execution time of all workloads using all slaves (to clarify, I'm talking about the makespan or wall-clock time, not the aggregate process time, which is independent of the load-balancing strategy being used under the simplifying assumptions being made in this question): static and dynamic. In a static scheme, all placement decisions are made at time $t = 0$. In a dynamic scheme, the master can make placement decisions using information about the progress being made by some slaves, and as such, better utilization can be attained (in practice, there are overheads associated with dynamic scheduling as compared to static scheduling, but we ignore these). Now for some questions:

  1. Is there a better way to statically schedule workloads than to divide batches of $k$ workloads among the $m$ slaves as evenly as possible (we can also assume, for simplicity's sake, that $m$ divides $n/k$, so batches could be statically scheduled completely evenly)? If so, how?
  2. Using the best static scheduling policy, what should the mean and standard deviation be for the total execution time, in terms of the mean $\mu$ and standard deviation $\sigma$ of $X$?

A simple dynamic load balancer might schedule $i$ batches of $k$ workloads to each slave initially, and then, as nodes complete their initial $i$ batches, schedule an additional batch of $k$ workloads to whichever slave becomes free, on a first-come, first-served basis. So if two slave nodes are each initially scheduled 2 batches of 2 workloads, and the first slave finishes its two batches, an additional batch is scheduled to the first slave while the second slave continues working. If the first slave finishes the new batch before the second slave finishes its initial work, the master will continue scheduling to the first slave. Only when the second slave completes its work will it be issued a new batch of workloads. Example:

         DYNAMIC           STATIC
         POLICY            POLICY

     slave1  slave2    slave1  slave2
     ------  ------    ------  ------

t<0    --      --        --      --

t<1  batch1  batch3    batch1  batch3
     batch2  batch4    batch2  batch4
                       batch5  batch7
                       batch6  batch8

t=1    --    batch3    batch5  batch3
             batch4    batch6  batch4
                               batch7
                               batch8

t<2  batch5  batch3    batch5  batch3
             batch4    batch6  batch4
                               batch7
                               batch8

t=2    --    batch4    batch6  batch4
                               batch7
                               batch8

t<3  batch6  batch4    batch6  batch4
                               batch7
                               batch8

t=3    --      --        --    batch7
                               batch8

t<4  batch7  batch8      --    batch7
                               batch8

t=4    --      --        --    batch8

t<5      -DONE-          --    batch8

t=5                      --      --

t<6                        -DONE-

For clarification, batches 1 and 2 take 1/2 second each to be processed, batch 3 takes 2 seconds to be processed, and batches 4-8 take 1 second each to be processed. This information is not known a priori; in the static scheme, all jobs are distributed at $t = 0$, whereas in the dynamic scheme, distribution can take into account what the actual runtimes of the jobs "turned out" to be. We notice that the static scheme takes one second longer than the dynamic scheme, with slave1 working 3 seconds and slave2 working 5 seconds. In the dynamic scheme, both slaves work for the full 4 seconds.
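
The example is small enough to check mechanically. Here is a minimal simulation sketch of both policies on these eight batches; the durations are the ones given above, while the function names (`static_makespan`, `dynamic_makespan`) and the use of Python are my own choices, not part of the question.

```python
import heapq

# Batch durations from the example: batches 1-2 take 0.5 s, batch 3 takes 2 s,
# batches 4-8 take 1 s (0-based indices below).
durations = [0.5, 0.5, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def static_makespan(assignment):
    """Static policy: every batch is placed at t = 0, so the makespan is the
    total work of the busiest slave."""
    return max(sum(durations[b] for b in batches) for batches in assignment)

def dynamic_makespan(initial, remaining):
    """Dynamic policy (first-come, first-served): each remaining batch goes to
    whichever slave frees up first."""
    # Heap of (time this slave becomes free, slave id), seeded with the initial batches.
    free_at = [(sum(durations[b] for b in batches), s) for s, batches in enumerate(initial)]
    heapq.heapify(free_at)
    for b in remaining:
        t, s = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + durations[b], s))
    return max(t for t, _ in free_at)

# Assignments taken from the tables above.
print(static_makespan([[0, 1, 4, 5], [2, 3, 6, 7]]))     # slave1: 3 s, slave2: 5 s -> 5.0
print(dynamic_makespan([[0, 1], [2, 3]], [4, 5, 6, 7]))  # both slaves busy until t = 4 -> 4.0
```

The static call uses the same assignment as the table (batches 1, 2, 5, 6 on slave1 and 3, 4, 7, 8 on slave2) and reproduces the 5-second and 4-second makespans discussed above.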

Now for the question that motivated writing this:

  1. Using the dynamic load balancing policy described above, what should the mean and standard deviation be for the total execution time, in terms of the mean $\mu$ and standard deviation $\sigma$ of $X$?

Interested readers have my assurances that this isn't homework, although it probably isn't much harder than what one might expect to get as homework in certain courses. Given that, if anyone objects to this being asked and demands that I show some work, I will be happy to oblige (although I don't know when I'll have time in the near future). This question is actually based on some work that I never got around to doing a semester or two ago, and empirical results were where we left it. Thanks for help and/or effort, I'll be interested to see what you guys put together.

Solution

Update:

For the new version where you try to minimize the makespan, your static schedule still has the optimal expected value.

Let $M$ be the random variable for the makespan, and let $X_i$ be the time at which slave $i$ finishes its work. We then have $M = \max_i(X_i)$. Let $c_i$ be the number of jobs allocated to slave $i$. Then $X_i = \sum_{j=1}^{c_i} X = c_i X$.

If $F(x)$ is the cumulative distribution function of $X$, then
$$P(M < m) = P\left(\max_i(X_i) < m\right) = \prod_i P(X_i < m) = \prod_i P(c_i X < m) = \prod_i P\left(X < \frac{m}{c_i}\right) = \prod_i F\left(\frac{m}{c_i}\right)$$
is the cumulative distribution function of $M$. This means that $EM = \int_{-\infty}^{\infty} x \left(\prod_i F\left(\frac{x}{c_i}\right)\right)' dx$ and $\operatorname{stddev}(M) = \sqrt{\int_{-\infty}^{\infty} (x - EM)^2 \left(\prod_i F\left(\frac{x}{c_i}\right)\right)' dx}$, as usual.
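
These integrals are usually easiest to evaluate numerically. The sketch below assumes non-negative workloads, so the equivalent survival-function form $EM = \int_0^\infty \left(1 - \prod_i F\left(\frac{x}{c_i}\right)\right) dx$ (obtained from the density form above by integration by parts) can be used; the exponential distribution for $X$, the function name `makespan_stats`, and the use of SciPy are my own illustrative choices.

```python
import numpy as np
from scipy import integrate
from scipy.stats import expon

def makespan_stats(F, c):
    """Mean and standard deviation of M when P(M < x) = prod_i F(x / c_i),
    assuming the workload distribution F is supported on [0, inf)."""
    F_M = lambda x: np.prod([F(x / ci) for ci in c])
    # E[M] and E[M^2] via the survival function 1 - F_M.
    EM, _ = integrate.quad(lambda x: 1.0 - F_M(x), 0.0, np.inf)
    EM2, _ = integrate.quad(lambda x: 2.0 * x * (1.0 - F_M(x)), 0.0, np.inf)
    return EM, np.sqrt(EM2 - EM ** 2)

# Example: X ~ Exponential with mean 1, eight batches split across two slaves.
print(makespan_stats(expon.cdf, [4, 4]))  # even split
print(makespan_stats(expon.cdf, [2, 6]))  # uneven split: larger expected makespan
```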

Minimizing $EM$ amounts to maximizing $\prod_i F\left(\frac{x}{c_i}\right)$ pointwise, since for non-negative workloads $EM = \int_0^\infty \left(1 - \prod_i F\left(\frac{x}{c_i}\right)\right) dx$. As $F$ is monotonically increasing and takes values between 0 and 1, this means we want to keep every $c_i$ as low as possible; since the $c_i$ must sum to the fixed total number of batches, we should distribute the tasks equally among the slaves, which is exactly what your static schedule achieves.
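
As a quick sanity check of this conclusion, assume (purely for illustration) that $X$ is exponential with mean $\mu$, so $F(x) = 1 - e^{-x/\mu}$, and that there are two slaves with $c_1 + c_2 = C$ batches in total. Under the model above,

$$EM = \int_0^\infty \left(1 - \left(1 - e^{-x/(c_1\mu)}\right)\left(1 - e^{-x/(c_2\mu)}\right)\right) dx = \mu\left(c_1 + c_2 - \frac{c_1 c_2}{c_1 + c_2}\right),$$

which, for fixed $C$, is minimized exactly when $c_1 = c_2 = C/2$ (or as evenly as the integers allow).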

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange