Question

I am trying to get my head around the Evidence Based Scheduling (EBS) approach used in FogBugz, and I have read the Evidence Based Scheduling article several times.

What I do understand is the general idea, why Monte Carlo simulation is used, and so on.

And I can also extrapolate an estimate by using the factors from past stories. So far, so good.

Question 1

The question I have is: How do I calculate the probability distribution for more than one story? I.e., I want to know when five stories will be finished.

May I just add up the 10% values, the 20% values, ..., and finally the 100% values?

To give an example:

  • I know that story 1 is estimated as 4 hours, and its probability distribution tells me that 0% is 3 hours, 25% is 4 hours, 50% is 4 hours, 75% is 5 hours, and 100% is 9 hours.
  • I know that story 2 is estimated as 6 hours, and its probability distribution tells me that 0% is 4 hours, 25% is 6 hours, 50% is 6 hours, 75% is 7 hours, and 100% is 13 hours.

If I now want to know the probability distribution of story 1 and 2, may I just add them, so I get:

  • 0%: 7 hours
  • 25%: 10 hours
  • 50%: 10 hours
  • 75%: 12 hours
  • 100%: 22 hours

Is that all I need to do? Or is it something more complicated?

Question 2

My other question is how to calculate the end time for multiple tasks when there is more than one user involved, but I do not know in advance which user will work on which story. As long as I know that assignment, it's quite easy: calculate the sum of stories for each user, and then take the latest one as the overall time (if one finishes after 3 weeks and the other after 5 weeks, the total project will take 5 weeks).

But what if I don't know the assignment in advance, and not every user is able to work on every story? E.g., I have put competencies on stories, such as front-end and back-end, and I have assigned competencies to my users, so there may be developers for front-end, for back-end, and so on.

Of course there may be stories which require multiple competencies, which in turn requires work from multiple users. But they will be working on different things and need different amounts of time to finish their tasks. And this again depends on the probability distribution: if one developer is on a roll, he might finish earlier than expected, which may influence what he works on next, whom he may assist, and so on.

Any idea of how I could calculate this?


Solution

1.

No, you may not simply add up the values at corresponding percentiles of the probability distributions; doing so implicitly assumes perfect correlation between the task completion times. Here is something you may do instead.

In the worst case, the time to complete two tasks is the sum of the worst-case times of the individual tasks, so for the 100% values at least, simple addition is fine. Worst-case estimation in software development is probably just fine; I don't think that totaling up the worst-case times will generally cause a problem.

Now we need to consider whether the two tasks' times are governed by probability distributions that are independent of one another. That is, if we know something about how soon one task is completed, does that tell us anything about how soon the other task is completed? If we don't know, can we make a suitably safe assumption?

Of course it depends on the cost of incorrect estimation, but it may be safe enough to assume that the distributions are indeed independent; that way, at least the completion of one task doesn't give us false hope about the other. So the answer is: if one task is analyzed into M outcomes, each with its own probability, and the other task is analyzed into N outcomes, each with its own probability, we can form the M*N combined outcomes and assign to the (i,j) outcome the product of the probability (density) of the i-th outcome of the first task and the probability (density) of the j-th outcome of the second task.

I'm going to modify your example because, sorry, I don't understand it. Let's say that the first task has this distribution instead, where X is a uniformly distributed continuous random variable between 0% and 100%:

3 hours, if       X <= 20% (with probability density 20%);
4 hours, if 20% < X <= 60% (with probability density 40%);
5 hours, if 60% < X <= 80% (with probability density 20%);
9 hours, if 80% < X        (with probability density 20%).

The second task has this distribution, where Y is a uniformly distributed continuous random variable between 0% and 100%, independent of X:

 4 hours, if       Y <= 20% (with probability density 20%);
 6 hours, if 20% < Y <= 60% (with probability density 40%);
 7 hours, if 60% < Y <= 80% (with probability density 20%);
13 hours, if 80% < Y        (with probability density 20%).

Now we calculate as follows:

               4@20%   6@ 40%   7@20%   13@20%
              ------ -------- ------- --------
     3@20% | 3+4@ 4% 3+6@  8% 3+7@ 4% 3+13@ 4%
     4@40% | 4+4@ 8% 4+6@ 16% 4+7@ 8% 4+13@ 8%
     5@20% | 5+4@ 4% 5+6@  8% 5+7@ 4% 5+13@ 4%
     9@20% | 9+4@ 4% 9+6@  8% 9+7@ 4% 9+13@ 4%

So here's the probability distribution and density for the sum of the two tasks' times, where Z is a uniformly distributed continuous random variable from 0% to 100%:

      7 hours, if       Z <=  4% (with probability density  4%);
      8 hours, if  4% < Z <= 12% (with probability density  8%);
      9 hours, if 12% < Z <= 24% (with probability density 12%);
     10 hours, if 24% < Z <= 44% (with probability density 20%);
     11 hours, if 44% < Z <= 60% (with probability density 16%);
     12 hours, if 60% < Z <= 64% (with probability density  4%);
     13 hours, if 64% < Z <= 68% (with probability density  4%);
     15 hours, if 68% < Z <= 76% (with probability density  8%);
     16 hours, if 76% < Z <= 84% (with probability density  8%);
     17 hours, if 84% < Z <= 92% (with probability density  8%);
     18 hours, if 92% < Z <= 96% (with probability density  4%);
     22 hours, if 96% < Z        (with probability density  4%).

All of this may be tedious, but it's logical and not hard to automate.
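
For instance, here is a minimal Python sketch of that automation (the function and variable names are mine, not anything from FogBugz). It forms the M*N product of two independent discrete distributions, each represented as a map from hours to probability, and reproduces the table above:

    # A minimal sketch, assuming each task's estimate is a discrete
    # distribution represented as {hours: probability}.
    from itertools import product

    def convolve(dist_a, dist_b):
        """Distribution of the sum of two independent task times."""
        result = {}
        for (h_a, p_a), (h_b, p_b) in product(dist_a.items(), dist_b.items()):
            # The (i,j) outcome gets the product of the two probabilities.
            result[h_a + h_b] = result.get(h_a + h_b, 0.0) + p_a * p_b
        return dict(sorted(result.items()))

    task1 = {3: 0.20, 4: 0.40, 5: 0.20, 9: 0.20}
    task2 = {4: 0.20, 6: 0.40, 7: 0.20, 13: 0.20}

    cumulative = 0.0
    for hours, p in convolve(task1, task2).items():
        cumulative += p
        print(f"{hours:2d} hours: density {p:4.0%}, cumulative {cumulative:4.0%}")

For your original five-story question, fold convolve over all five distributions; the result is the distribution of the total time, from which you can read off any percentile you like.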

2.

You are correct: there is a fanning out of scenarios. Roughly, it starts from the one certain state, that at the very beginning nobody has done anything yet. After that, once you have the automation for question 1, you can employ various strategies in your analysis. Your imagination is probably as good as mine for this purpose, but anyway, here's what I can suggest.

You could explore what-if scenarios interactively.

You could attempt to compute and total up everything that could possibly happen. As we have seen, this kind of analysis is possible for small cases. As we can imagine, it will become intractable in large cases, such as presumably building a flight navigation system.

You could analyze the most likely scenario and perhaps a limited degree of variation around that.

Very likely, you will be interested in controlling your risks. So you could consider analyzing one or more of the following, according to your needs and convenience, each being a bit different from the rest:

  • the chance of an unacceptable outcome;
  • the chance that an unacceptable degree of uncertainty exists;
  • an estimate of how much uncertainty exists;
  • an estimate of the expected outcome (that is, the average outcome if one were to face the same situation endlessly repeated).
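
To make this concrete, here is a rough Monte Carlo sketch for question 2, under assumptions I'm inventing for illustration (one competency per story, and a greedy policy that hands each story to the earliest-free developer qualified for it); neither the data nor the policy comes from EBS itself:

    import random

    # Each story: the competency it requires and its duration distribution.
    stories = [
        ("front-end", {3: 0.20, 4: 0.40, 5: 0.20, 9: 0.20}),
        ("back-end",  {4: 0.20, 6: 0.40, 7: 0.20, 13: 0.20}),
        ("front-end", {2: 0.50, 3: 0.30, 6: 0.20}),
    ]

    # Each developer: the set of competencies he or she has.
    developers = {
        "alice": {"front-end"},
        "bob": {"back-end", "front-end"},
    }

    def sample_hours(dist):
        """Draw one duration from a {hours: probability} distribution."""
        return random.choices(list(dist), weights=list(dist.values()))[0]

    def simulate_once():
        """One scenario: each story goes to the earliest-free qualified developer."""
        free_at = {name: 0 for name in developers}
        for competency, dist in stories:
            qualified = [d for d in developers if competency in developers[d]]
            dev = min(qualified, key=free_at.get)
            free_at[dev] += sample_hours(dist)
        return max(free_at.values())   # project ends when the last developer finishes

    end_times = sorted(simulate_once() for _ in range(10_000))
    for pct in (50, 75, 90):
        print(f"{pct}% chance of finishing within {end_times[len(end_times) * pct // 100 - 1]} hours")

Stories that need several competencies, or developers who assist each other mid-story, would complicate the policy inside simulate_once, but the overall skeleton of sampling many scenarios and reading percentiles off the results stays the same.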

OTHER TIPS

Without Googling the problem, I would guess that the "unknown developer capabilities" constraint probably pushes the problem into the NP-hard optimization bin. A couple of algorithms to look at are simulated annealing and genetic algorithms; simulated annealing was used in Wintek's electronic CAD autoplacement program (I was on the development team).
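
For a flavor of the simulated annealing approach, here is a bare-bones sketch that searches for an assignment of stories to qualified developers minimizing the overall finish time; the data and cooling schedule are invented for illustration, and durations are taken as fixed expected values:

    # A tiny simulated-annealing sketch for the assignment problem hinted at
    # above. All data and parameters are made up for illustration.
    import math
    import random

    durations = [4, 6, 3, 8, 5]            # expected hours per story
    qualified = [                           # developers allowed per story
        {"alice", "bob"}, {"bob"}, {"alice"}, {"alice", "bob"}, {"bob"},
    ]

    def makespan(assignment):
        """Finish time of the busiest developer under a given assignment."""
        load = {}
        for story, dev in enumerate(assignment):
            load[dev] = load.get(dev, 0) + durations[story]
        return max(load.values())

    # Start from a random feasible assignment.
    assignment = [random.choice(sorted(q)) for q in qualified]
    cost = makespan(assignment)
    temperature = 10.0
    while temperature > 0.01:
        # Random neighbor: reassign one story to another qualified developer.
        story = random.randrange(len(durations))
        candidate = assignment.copy()
        candidate[story] = random.choice(sorted(qualified[story]))
        new_cost = makespan(candidate)
        # Accept improvements always, regressions with a temperature-dependent chance.
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temperature):
            assignment, cost = candidate, new_cost
        temperature *= 0.999

    print(assignment, cost)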

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow