Calculating standard deviation [closed]

https://stackoverflow.com/questions/23431138

14-07-2023

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question does not appear to be about programming within the scope defined in the help center.

Closed 9 years ago.

I need to write an algorithm that calculates an integral using the Monte Carlo method, and for the simulation I need to calculate the standard deviation of a sample generated in my program. My problem is that when I increase the number of elements in the sample, the standard deviation does not decay as I expected. At first I thought my function was wrong, but NumPy's built-in function for the standard deviation gave the same values, and they did not decrease either. So I wondered whether the problem was my sample, and ran the following test to see whether the standard deviation decreases as it should:

import random
import numpy as np

list = [random.uniform(0,1) for i in range(100)]
print(np.std(list))

The standard deviation obtained: 0.289

list = [random.uniform(0,1) for i in range(1000)]
print(np.std(list))

The standard deviation obtained: 0.287

Shouldn't this decrease as n increases? I need to use it as a stopping criterion in my simulation, and I was expecting it to decrease with a bigger sample. What is wrong with my understanding of the math?

Thanks in advance!

Solution

The standard deviation of a distribution does not depend on the sample size. The sample standard deviation is only an estimate of it, and that estimate becomes more accurate, not smaller, as the sample grows. The standard deviation of a uniform distribution is (b - a)/sqrt(12), where a and b are the limits of the distribution. In your case a = 0 and b = 1, so you should expect std = 1/sqrt(12), about 0.288675, for a sample of any size.
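
A quick way to check this is to compare the sample standard deviation with (b - a)/sqrt(12) for several sample sizes. This is only a minimal sketch: the seed, the sample sizes, and the names rng, theoretical_std and sample_stds are arbitrary choices for illustration, not part of the question.

```python
import numpy as np

# Seeded generator so the run is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)

a, b = 0.0, 1.0
theoretical_std = (b - a) / np.sqrt(12)  # about 0.288675

sample_stds = {}
for n in (100, 1_000, 10_000, 100_000):
    sample = rng.uniform(a, b, n)
    sample_stds[n] = sample.std()
    # The sample value fluctuates around the theoretical one; it does not shrink.
    print(n, sample_stds[n], theoretical_std)
```

The printed values should hover around 0.2887 for every n, with the fluctuation getting smaller as n grows.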

Perhaps what you're looking for is the standard error of the mean, which is given by std/sqrt(N) and decreases as your sample size increases:

In [9]: sample = np.random.uniform(0, 1, 100)

In [10]: sample.std()/np.sqrt(sample.size)
Out[10]: 0.029738347511343809

In [11]: sample = np.random.uniform(0, 1, 1000)

In [12]: sample.std()/np.sqrt(sample.size)
Out[12]: 0.0091589707054713591
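
To tie this back to the stopping criterion mentioned in the question, the standard error can be tracked while sampling and the loop stopped once it falls below a tolerance. The following is a rough sketch under assumptions, not the asker's actual program: the function mc_integrate, its parameters (tol, batch, max_samples, seed) and the integrand x**2 are all hypothetical choices for illustration.

```python
import numpy as np

def mc_integrate(f, a, b, tol=1e-3, batch=1_000, max_samples=10_000_000, seed=0):
    """Estimate the integral of f over [a, b]; stop when the standard error < tol."""
    rng = np.random.default_rng(seed)
    n = 0
    total = 0.0      # running sum of f(x)
    total_sq = 0.0   # running sum of f(x)**2
    while n < max_samples:
        fx = f(rng.uniform(a, b, batch))
        n += batch
        total += fx.sum()
        total_sq += (fx ** 2).sum()
        mean = total / n
        # Variance of f(x) over all samples so far (clamped against round-off).
        var = max(total_sq / n - mean ** 2, 0.0)
        # Standard error of the integral estimate (b - a) * mean.
        std_err = (b - a) * np.sqrt(var / n)
        if std_err < tol:
            break
    return (b - a) * mean, std_err, n

# Hypothetical integrand: x**2 on [0, 1], whose exact integral is 1/3.
estimate, std_err, n_used = mc_integrate(lambda x: x ** 2, 0.0, 1.0)
print(estimate, std_err, n_used)
```

The standard deviation of f(x) stays roughly constant here as well; only the division by sqrt(n) makes the stopping quantity shrink.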

OTHER TIPS

No, your mathematical concept is not flawed: the standard deviation remains roughly constant for larger n. What AHuman correctly points out is that you should avoid reusing built-in names for your variables. list is not a reserved keyword, but it is the name of Python's built-in list type, and assigning to it hides that built-in for the rest of the scope. Use my_list or some other variable name instead.

[edit] Because the calculated mean is itself a random quantity, deterministic error bounds will not work; you will have to calculate a confidence interval, which in this case is a probabilistic error bound. You can look here for more info: http://planetmath.org/montecarlosimulation
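
As a hedged illustration of such a probabilistic bound, an approximate 95% confidence interval for the mean can be built from the standard error using the normal approximation. The seed, the sample size, and the choice of z = 1.96 are assumptions made for this sketch; they do not come from the linked page.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
sample = rng.uniform(0, 1, 1_000)

mean = sample.mean()
# ddof=1 gives the unbiased sample variance; for large N the difference is negligible.
std_err = sample.std(ddof=1) / np.sqrt(sample.size)

z = 1.96  # two-sided 95% quantile of the standard normal distribution
ci_low, ci_high = mean - z * std_err, mean + z * std_err
print(mean, (ci_low, ci_high))
```

The interval width is proportional to 1/sqrt(N), so it narrows as the sample grows even though the standard deviation itself does not.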

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow