Question

I am trying to get my head around the Evidence Based Scheduling (EBS) approach used in FogBugz, and I have read the Evidence Based Scheduling article several times.

What I do understand is the general idea, why Monte Carlo simulation is used, and so on.

And I can also extrapolate an estimate by using the factors from past stories. So far, so good.

Question 1

The question I have is: How do I calculate the probability distribution for more than one story? I.e., I want to know when five stories will be finished.

May I just add up the 10% values, the 20% values, ..., and finally the 100% values?

To give an example:

  • I know that story 1 is estimated as 4 hours, and its probability distribution tells me that 0% is 3 hours, 25% is 4 hours, 50% is 4 hours, 75% is 5 hours, and 100% is 9 hours.
  • I know that story 2 is estimated as 6 hours, and its probability distribution tells me that 0% is 4 hours, 25% is 6 hours, 50% is 6 hours, 75% is 7 hours, and 100% is 13 hours.

If I now want to know the probability distribution of story 1 and 2, may I just add them, so I get:

  • 0%: 7 hours
  • 25%: 10 hours
  • 50%: 10 hours
  • 75%: 12 hours
  • 100%: 22 hours

Is that all I need to do? Or is it something more complicated?

Question 2

My other question is how to calculate the end time for multiple tasks when there is more than one user involved, but I do not know in advance which user will work on which story. As long as I know that assignment, it's quite easy: calculate the sum of stories for each user, and then take the latest one as the overall time (if one finishes after 3 weeks and the other after 5 weeks, the total project will take 5 weeks).

But what if I don't know the assignment in advance, and not every user is able to work on every story? E.g., I have put competencies on stories, such as front-end and back-end, and I have assigned competencies to my users, so there may be developers for front-end, for back-end, and so on.

Of course there may be stories which require multiple competencies, which in turn requires work from multiple users. But they will be working on different things and need different amounts of time to finish their tasks. And this again depends on the probability distribution: if one developer is on a roll, he might finish earlier than expected, which may influence what he works on next, whom he may assist, and so on.

Any idea of how I could calculate this?


Solution

1.

No, you may not simply add up the values at corresponding percentiles of the probability distributions; doing so implicitly assumes perfect correlation between the task completion times. Here is something you may do instead.

In the worst case, the time to complete two tasks is the sum of the worst-case times of the individual tasks, so for the 100% values at least, simple addition is fine. Worst-case estimation in software development is probably just fine; I don't think that totaling up the worst-case times will generally cause a problem.

Now we need to consider whether the two tasks' times are governed by probability distributions that are independent of one another. That is, if we know something about how soon one task is completed, does that tell us anything about how soon the other task is completed? If we don't know, can we make a suitably safe assumption?

Of course it depends on the cost of incorrect estimation, but it may be safe enough to assume that the distributions are indeed independent; that way, at least the completion of one task doesn't give us false hope about the other. So the answer is: if one task is analyzed into M outcomes, each with its own probability, and the other task is analyzed into N outcomes, each with its own probability, we can form the M*N combined outcomes and assign to the (i,j) outcome the product of the probability (density) of the i-th outcome of the first task and the probability (density) of the j-th outcome of the second task.

I'm going to modify your example because, sorry, I don't understand it. Let's say that the first task has this distribution instead, where X is a uniformly distributed continuous random variable between 0% and 100%:

3 hours, if       X <= 20% (with probability density 20%);
4 hours, if 20% < X <= 60% (with probability density 40%);
5 hours, if 60% < X <= 80% (with probability density 20%);
9 hours, if 80% < X        (with probability density 20%).

The second task has this distribution, where Y is a uniformly distributed continuous random variable between 0% and 100%, independent of X:

 4 hours, if       Y <= 20% (with probability density 20%);
 6 hours, if 20% < Y <= 60% (with probability density 40%);
 7 hours, if 60% < Y <= 80% (with probability density 20%);
13 hours, if 80% < Y        (with probability density 20%).

Now we calculate as follows:

               4@20%   6@ 40%   7@20%   13@20%
              ------ -------- ------- --------
     3@20% | 3+4@ 4% 3+6@  8% 3+7@ 4% 3+13@ 4%
     4@40% | 4+4@ 8% 4+6@ 16% 4+7@ 8% 4+13@ 8%
     5@20% | 5+4@ 4% 5+6@  8% 5+7@ 4% 5+13@ 4%
     9@20% | 9+4@ 4% 9+6@  8% 9+7@ 4% 9+13@ 4%

So here's the probability distribution and density for the sum of the two tasks' times, where Z is a uniformly distributed continuous random variable from 0% to 100%:

      7 hours, if       Z <=  4% (with probability density  4%);
      8 hours, if  4% < Z <= 12% (with probability density  8%);
      9 hours, if 12% < Z <= 24% (with probability density 12%);
     10 hours, if 24% < Z <= 44% (with probability density 20%);
     11 hours, if 44% < Z <= 60% (with probability density 16%);
     12 hours, if 60% < Z <= 64% (with probability density  4%);
     13 hours, if 64% < Z <= 68% (with probability density  4%);
     15 hours, if 68% < Z <= 76% (with probability density  8%);
     16 hours, if 76% < Z <= 84% (with probability density  8%);
     17 hours, if 84% < Z <= 92% (with probability density  8%);
     18 hours, if 92% < Z <= 96% (with probability density  4%);
     22 hours, if 96% < Z        (with probability density  4%).

All of this may be tedious, but it's logical and not hard to automate.
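
For instance, here is a minimal Python sketch of that automation (the function and variable names are mine, not anything from FogBugz). It forms the M*N product of two independent discrete distributions, each represented as a map from hours to probability, and reproduces the table above:

    # A minimal sketch, assuming each task's estimate is a discrete
    # distribution represented as {hours: probability}.
    from itertools import product

    def convolve(dist_a, dist_b):
        """Distribution of the sum of two independent task times."""
        result = {}
        for (h_a, p_a), (h_b, p_b) in product(dist_a.items(), dist_b.items()):
            # The (i,j) outcome gets the product of the two probabilities.
            result[h_a + h_b] = result.get(h_a + h_b, 0.0) + p_a * p_b
        return dict(sorted(result.items()))

    task1 = {3: 0.20, 4: 0.40, 5: 0.20, 9: 0.20}
    task2 = {4: 0.20, 6: 0.40, 7: 0.20, 13: 0.20}

    cumulative = 0.0
    for hours, p in convolve(task1, task2).items():
        cumulative += p
        print(f"{hours:2d} hours: density {p:4.0%}, cumulative {cumulative:4.0%}")

For your original five-story question, fold convolve over all five distributions; the result is the distribution of the total time, from which you can read off any percentile you like.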

2.

You are correct: there is a fanning out of scenarios. Roughly, it starts from the one certain state, that at the very beginning nobody has done anything yet. After that, once you have the automation for question 1, you can employ various strategies in your analysis. Your imagination is probably as good as mine for this purpose, but anyway, here's what I can suggest.

You could explore what-if scenarios interactively.

You could attempt to compute and total up everything that could possibly happen. As we have seen, this kind of analysis is possible for small cases. As we can imagine, it will become intractable in large cases, such as presumably building a flight navigation system.

You could analyze the most likely scenario and perhaps a limited degree of variation around that.

Very likely, you will be interested in controlling your risks. So you could consider analyzing one or more of the following, according to your needs and convenience, each being a bit different from the rest:

  • the chance of an unacceptable outcome;
  • the chance that an unacceptable degree of uncertainty exists;
  • an estimate of how much uncertainty exists;
  • an estimate of the expected outcome (that is, the average outcome if one were to face the same situation endlessly repeated).
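
To make this concrete, here is a rough Monte Carlo sketch for question 2, under assumptions I'm inventing for illustration (one competency per story, and a greedy policy that hands each story to the earliest-free developer qualified for it); neither the data nor the policy comes from EBS itself:

    import random

    # Each story: the competency it requires and its duration distribution.
    stories = [
        ("front-end", {3: 0.20, 4: 0.40, 5: 0.20, 9: 0.20}),
        ("back-end",  {4: 0.20, 6: 0.40, 7: 0.20, 13: 0.20}),
        ("front-end", {2: 0.50, 3: 0.30, 6: 0.20}),
    ]

    # Each developer: the set of competencies he or she has.
    developers = {
        "alice": {"front-end"},
        "bob": {"back-end", "front-end"},
    }

    def sample_hours(dist):
        """Draw one duration from a {hours: probability} distribution."""
        return random.choices(list(dist), weights=list(dist.values()))[0]

    def simulate_once():
        """One scenario: each story goes to the earliest-free qualified developer."""
        free_at = {name: 0 for name in developers}
        for competency, dist in stories:
            qualified = [d for d in developers if competency in developers[d]]
            dev = min(qualified, key=free_at.get)
            free_at[dev] += sample_hours(dist)
        return max(free_at.values())   # project ends when the last developer finishes

    end_times = sorted(simulate_once() for _ in range(10_000))
    for pct in (50, 75, 90):
        print(f"{pct}% chance of finishing within {end_times[len(end_times) * pct // 100 - 1]} hours")

Stories that need several competencies, or developers who assist each other mid-story, would complicate the policy inside simulate_once, but the overall skeleton of sampling many scenarios and reading percentiles off the results stays the same.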

OTHER TIPS

Without Googling the problem, I would guess that the "unknown developer capabilities" constraint probably pushes the problem into the NP-hard optimization bin. A couple of algorithms to look at are simulated annealing and genetic algorithms; simulated annealing was used in Wintek's electronic CAD autoplacement program (I was on the development team).
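
For a flavor of the simulated annealing approach, here is a bare-bones sketch that searches for an assignment of stories to qualified developers minimizing the overall finish time; the data and cooling schedule are invented for illustration, and durations are taken as fixed expected values:

    # A tiny simulated-annealing sketch for the assignment problem hinted at
    # above. All data and parameters are made up for illustration.
    import math
    import random

    durations = [4, 6, 3, 8, 5]            # expected hours per story
    qualified = [                           # developers allowed per story
        {"alice", "bob"}, {"bob"}, {"alice"}, {"alice", "bob"}, {"bob"},
    ]

    def makespan(assignment):
        """Finish time of the busiest developer under a given assignment."""
        load = {}
        for story, dev in enumerate(assignment):
            load[dev] = load.get(dev, 0) + durations[story]
        return max(load.values())

    # Start from a random feasible assignment.
    assignment = [random.choice(sorted(q)) for q in qualified]
    cost = makespan(assignment)
    temperature = 10.0
    while temperature > 0.01:
        # Random neighbor: reassign one story to another qualified developer.
        story = random.randrange(len(durations))
        candidate = assignment.copy()
        candidate[story] = random.choice(sorted(qualified[story]))
        new_cost = makespan(candidate)
        # Accept improvements always, regressions with a temperature-dependent chance.
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temperature):
            assignment, cost = candidate, new_cost
        temperature *= 0.999

    print(assignment, cost)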

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow