Question

I need to write an algorithm that calculates an integral via the Monte Carlo method, and for the simulation I need to calculate the standard deviation of a sample generated in my program. My problem is that when I increase the number of elements in my sample, the standard deviation does not decay as I would expect. At first I thought my function was wrong, but the predefined numpy function for the standard deviation gave the same values, and they were not decreasing either. So I suspected that my sample was the problem, and I ran the following simulation to test whether the standard deviation decreases as it should:

list = [random.uniform(0,1) for i in range(100)]
print np.std(list)

the standard deviation obtained: 0.289

list = [random.uniform(0,1) for i in range(1000)]
print np.std(list)

the standard deviation obtained: 0.287

Shouldn't this decrease as n increases? I need to use it as a stopping criterion in my simulation, and I was expecting it to decrease with a bigger sample. What is wrong with my mathematical concept?

Thanks in advance!


Solution

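The standard deviation of a distribution does not depend on the sample size; the sample standard deviation is only an estimate of it, and that estimate settles toward the true value as the sample grows instead of shrinking toward zero. The standard deviation of a uniform distribution is (b - a)/sqrt(12), where a and b are the limits of the distribution. In your case, a = 0 and b = 1, so you should expect std close to 1/sqrt(12) = 0.288675 for a sample of any size.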
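A quick way to check this is to draw samples of several sizes and compare each sample standard deviation with 1/sqrt(12). The sketch below uses only the standard library; the helper name sample_std and the fixed seed are illustrative choices, not something from the question.

```python
import math
import random
import statistics

def sample_std(n, rng):
    """Population standard deviation of n draws from Uniform(0, 1)."""
    draws = [rng.uniform(0, 1) for _ in range(n)]
    # pstdev divides by n, which matches np.std's default (ddof=0)
    return statistics.pstdev(draws)

rng = random.Random(0)  # fixed seed only so the run is repeatable
expected = 1 / math.sqrt(12)  # (b - a)/sqrt(12) with a = 0, b = 1

for n in (100, 1000, 10000, 100000):
    print(n, round(sample_std(n, rng), 4), "theory:", round(expected, 4))
```

The printed values should hover around the theoretical figure for every n instead of trending toward zero.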

Perhaps what you're looking for is the standard error, which is given by std/sqrt(N) and will decrease as your sample size increases:

In [9]: sample = np.random.uniform(0, 1, 100)

In [10]: sample.std()/np.sqrt(sample.size)
Out[10]: 0.029738347511343809

In [11]: sample = np.random.uniform(0, 1, 1000)

In [12]: sample.std()/np.sqrt(sample.size)
Out[12]: 0.0091589707054713591
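Since the question asks for a stopping criterion, here is a minimal sketch of how the standard error could play that role in a plain Monte Carlo integral over [0, 1]. The function name mc_integrate, the batch size, the tolerance, and the integrand x**2 are all assumptions made for illustration; adapt them to the actual integral.

```python
import math
import random

def mc_integrate(f, tol, rng, batch=1000, max_n=10_000_000):
    """Estimate the integral of f over [0, 1] by plain Monte Carlo.

    Stops once the estimated standard error, std/sqrt(N), drops below tol
    (or once max_n samples have been drawn).
    """
    n = 0
    total = 0.0      # running sum of f(x)
    total_sq = 0.0   # running sum of f(x)**2
    while n < max_n:
        for _ in range(batch):
            y = f(rng.uniform(0, 1))
            total += y
            total_sq += y * y
        n += batch
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0)  # population variance
        std_err = math.sqrt(var / n)
        if std_err < tol:
            break
    return mean, std_err, n

estimate, std_err, n = mc_integrate(lambda x: x * x, tol=1e-3, rng=random.Random(0))
print(estimate, std_err, n)  # the exact integral of x**2 on [0, 1] is 1/3
```

The standard deviation of f(x) stays roughly fixed throughout the loop; only the division by sqrt(N) makes the stopping quantity shrink.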

OTHER TIPS

No, your mathematical concept is not flawed: the standard deviation stays essentially constant as n grows. What AHuman correctly points out is that you should avoid reusing built-in names for your variables: list is not a reserved keyword, but it is the name of Python's built-in list type, and assigning to it hides that type. Use my_list or some other variable name instead.
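To see why the name matters, the small sketch below checks how Python classifies list and what happens once the name is rebound. The function shadow_demo is a hypothetical helper written only for this illustration.

```python
import builtins
import keyword

print(keyword.iskeyword("list"))  # is it a reserved keyword?
print(hasattr(builtins, "list"))  # is it a built-in name?

def shadow_demo():
    """Rebind the name list locally, then try to use it as the type."""
    list = [1, 2, 3]  # hides the built-in inside this function
    try:
        list("abc")  # now calls a list object, not the list type
    except TypeError as exc:
        return str(exc)
    return "no error"

print(shadow_demo())
```

Inside the function the assignment is legal, but the later call fails because the name no longer refers to the type.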

[edit] Because the calculated mean is itself random, deterministic error bounds will not work; you will have to calculate a confidence interval, which in this case acts as a probabilistic error bound. You can look here for more info: http://planetmath.org/montecarlosimulation
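As a rough sketch of such an interval, the code below uses the usual normal approximation, mean plus or minus z times the standard error. The function name mean_confidence_interval and the 95% level are assumptions for illustration, and the approximation is only reasonable for fairly large samples.

```python
import math
import random
import statistics

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of values."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample std (n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

rng = random.Random(0)  # fixed seed only so the run is repeatable
for n in (100, 1000, 10000):
    sample = [rng.uniform(0, 1) for _ in range(n)]
    low, high = mean_confidence_interval(sample)
    print(n, round(low, 4), round(high, 4))
```

The interval narrows roughly like 1/sqrt(n), so its width can serve as the quantity to monitor when deciding whether the estimate is precise enough.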

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow