Question

When should the Monte-Carlo method be used?

For example, why did Joel decide to use the Monte-Carlo method for Evidence Based Scheduling instead of methodically processing all user data for the past year?


Solution

Suppose that you want to estimate some quantity of interest. In Joel's example, the 'ship date' is what you want to estimate. In most such situations, random factors affect the estimate.

When you have a random quantity, you typically want to know its mean and standard deviation so that you can take appropriate action. In simple situations, you can model the quantity with a standard distribution (e.g., the normal distribution) for which analytical formulas for the mean and standard deviation exist. However, in many situations no such formulas exist. In those cases, instead of deriving the mean and standard deviation analytically, we resort to simulation. The idea is:

Step 1: Generate random values for the factors that affect the quantity of interest, drawing each from an appropriate distribution.

Step 2: Compute the quantity of interest from those values.

Repeat steps 1 and 2 many times, then compute the empirical mean and standard deviation of the results.
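The two steps above can be sketched in Python. This is a minimal illustration, not Joel's actual scheduling code: the task estimates (`TASK_ESTIMATES_HOURS`) are made up, and it assumes each task's actual effort is its estimate multiplied by a log-normally distributed error factor with a hypothetical spread `ERROR_SIGMA`.

```python
import random
import statistics

# Hypothetical inputs: estimated hours per task, and an ASSUMED log-normal
# multiplicative error on each estimate (illustration only).
TASK_ESTIMATES_HOURS = [4, 8, 2, 16, 6, 3, 12, 5]
ERROR_SIGMA = 0.4  # assumed spread of log(actual / estimate)


def simulate_total_hours(rng: random.Random) -> float:
    """One trial: draw a random error factor per task (step 1), then sum (step 2)."""
    return sum(est * rng.lognormvariate(0.0, ERROR_SIGMA) for est in TASK_ESTIMATES_HOURS)


def monte_carlo(trials: int = 100_000, seed: int = 42) -> tuple[float, float]:
    """Repeat the trial many times and return the empirical mean and standard deviation."""
    rng = random.Random(seed)
    totals = [simulate_total_hours(rng) for _ in range(trials)]
    return statistics.fmean(totals), statistics.stdev(totals)


if __name__ == "__main__":
    mean, sd = monte_carlo()
    print(f"total effort: mean {mean:.1f} h, std dev {sd:.1f} h")
```

In this toy model the mean happens to have a closed form (each factor has mean exp(σ²/2), giving roughly 60.7 hours), which makes a handy sanity check. The distribution of a sum of log-normal variables has no simple closed form, which is the situation where simulation pays off.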

The above is by far the most typical application of Monte Carlo methods. See the Wikipedia link provided by Jarrod for several such applications, as well as some interesting applications where there is no inherent randomness (e.g., estimating pi).
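Estimating pi is the classic example of using randomness on a problem that has none. A minimal sketch: sample points uniformly in the unit square and count the fraction that fall inside the quarter circle x² + y² ≤ 1; that fraction tends to pi/4. The function name and default sample count are illustrative choices.

```python
import random


def estimate_pi(samples: int = 200_000, seed: int = 0) -> float:
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()  # uniform point in [0, 1) x [0, 1)
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / samples approximates pi / 4.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi())
```

The standard error of this estimate shrinks like 1/sqrt(samples), so each additional correct digit costs roughly 100 times as many samples.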

OTHER TIPS

Monte Carlo methods are commonly used when the dimensionality of the problem is too high for traditional schemes. A great introductory paper on the subject is Persi Diaconis' "The Markov Chain Monte Carlo Revolution".
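To see why dimensionality matters, consider integrating a function over the 10-dimensional unit cube. A grid with only 10 points per axis would already need 10^10 evaluations, while the error of a plain Monte Carlo estimate shrinks like 1/sqrt(N) regardless of dimension. The sketch below uses plain (not Markov chain) Monte Carlo, and the integrand exp(-|x|²) is an arbitrary choice because its exact integral factorises and can be checked.

```python
import math
import random

DIM = 10  # a 10-points-per-axis grid would need 10**10 evaluations here


def integrand(x: list[float]) -> float:
    return math.exp(-sum(xi * xi for xi in x))


def mc_integrate(dim: int = DIM, samples: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Plain Monte Carlo estimate of the integral over [0, 1]^dim.

    The cube has volume 1, so the integral equals the mean of the integrand
    at uniformly random points. Returns (estimate, standard error).
    """
    rng = random.Random(seed)
    values = [integrand([rng.random() for _ in range(dim)]) for _ in range(samples)]
    mean = math.fsum(values) / samples
    var = math.fsum((v - mean) ** 2 for v in values) / (samples - 1)
    return mean, math.sqrt(var / samples)


def exact(dim: int = DIM) -> float:
    """The integrand factorises, so the answer is (integral_0^1 exp(-t^2) dt) ** dim."""
    one_d = math.sqrt(math.pi) / 2 * math.erf(1.0)
    return one_d ** dim


if __name__ == "__main__":
    est, se = mc_integrate()
    print(f"MC estimate {est:.5f} +/- {se:.5f}, exact {exact():.5f}")
```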

Wikipedia has a good article on Monte Carlo simulation methods. I've used Monte Carlo on a few occasions. In a nutshell, MC methods tend to give reasonably accurate answers when you need to project results from inputs that are largely random, where somebody would otherwise rely on intuition to guess at a trend. MC methods are hard to explain briefly, so check out the article.

Because estimates for programming tasks are usually widely distributed, it makes more sense to treat them statistically.

If a project consists of hundreds of tasks, the errors in the individual estimates tend to even out, and you end up with a distribution that shows the likelihood of completing the project across a range of outcomes rather than a single date.
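A sketch of how individual errors even out into a usable range: assume (hypothetically) that each remaining task's actual time is its estimate divided by a "velocity" (estimate / actual ratio) resampled at random from a developer's past tasks. `HISTORY` and `REMAINING` below are invented data, and this resampling scheme is an assumption for illustration. The result is reported as 10th, 50th and 90th percentiles of remaining hours.

```python
import random

# Hypothetical history of (estimated hours, actual hours) for past tasks.
HISTORY = [(4, 5), (8, 8), (2, 6), (6, 5), (10, 14), (3, 3), (5, 9), (12, 11)]
# Hypothetical remaining work: estimates in hours for 200 tasks.
REMAINING = [4.0] * 120 + [8.0] * 60 + [16.0] * 20

VELOCITIES = [est / actual for est, actual in HISTORY]


def simulate_remaining_hours(rng: random.Random) -> float:
    """One trial: divide each estimate by a velocity drawn at random from the history."""
    return sum(est / rng.choice(VELOCITIES) for est in REMAINING)


def completion_percentiles(trials: int = 5_000, seed: int = 7) -> dict[int, float]:
    """Remaining hours by which the work is done with 10%, 50% and 90% probability."""
    rng = random.Random(seed)
    totals = sorted(simulate_remaining_hours(rng) for _ in range(trials))
    return {p: totals[int(p / 100 * (trials - 1))] for p in (10, 50, 90)}


if __name__ == "__main__":
    for p, hours in completion_percentiles().items():
        print(f"{p}% chance of finishing within {hours:.0f} hours")
```

A single task's actual time can vary by more than a factor of three under this history, yet the 10th-to-90th percentile band for 200 tasks comes out at roughly ten percent of the median, which is the "evening out" effect. Turning hours into calendar dates would need a working schedule, which is left out here.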

It also sidesteps serious issues, such as task buffering and student syndrome, that would otherwise skew the results even further.

Sometimes checking all the options is simply prohibitive.
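To make "prohibitive" concrete, suppose (hypothetically) that each task independently takes 1, 2 or 4 days with fixed probabilities. Computing the chance of meeting a deadline exactly means enumerating 3^n outcomes, while sampling costs the same number of trials whatever n is. The model and probabilities below are illustrative assumptions.

```python
import itertools
import random

# Hypothetical model: each task takes 1, 2 or 4 days with these probabilities.
DURATIONS = (1, 2, 4)
PROBS = (0.5, 0.3, 0.2)


def exact_prob_done_by(n_tasks: int, deadline: int) -> float:
    """Enumerate all 3**n_tasks outcomes; only feasible for small n_tasks."""
    total = 0.0
    for outcome in itertools.product(range(len(DURATIONS)), repeat=n_tasks):
        if sum(DURATIONS[i] for i in outcome) <= deadline:
            p = 1.0
            for i in outcome:
                p *= PROBS[i]
            total += p
    return total


def mc_prob_done_by(n_tasks: int, deadline: int, trials: int = 100_000, seed: int = 3) -> float:
    """Sample outcomes instead of enumerating them; cost is linear in n_tasks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        days = sum(rng.choices(DURATIONS, weights=PROBS, k=n_tasks))
        if days <= deadline:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(exact_prob_done_by(10, 20), mc_prob_done_by(10, 20))
    print(mc_prob_done_by(100, 200, trials=10_000))  # enumeration would need 3**100 outcomes
```

For 10 tasks the two answers agree closely (59,049 outcomes is still easy to enumerate). For 100 tasks, enumeration would need 3^100, about 5 × 10^47 outcomes, while the sampled estimate takes about as long as before.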

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow