Problem

I need to write an algorithm that calculates an integral via the Monte Carlo method, and for the purposes of the simulation I need to calculate the standard deviation of a sample generated in my program. My problem is that when I increase the number of elements in my sample, the standard deviation does not decay as I would expect. At first I thought my function was wrong, but when I used NumPy's built-in function to calculate the standard deviation, I got the same values, and they did not decrease either. So I suspected that my sample was the problem, and I ran the following test to check whether the standard deviation decreases as it should:

import random
import numpy as np

list = [random.uniform(0,1) for i in range(100)]
print(np.std(list))

the standard deviation obtained: 0.289

list = [random.uniform(0,1) for i in range(1000)]
print(np.std(list))

the standard deviation obtained: 0.287

Shouldn't this decrease as n increases? I need this as a stopping criterion in my simulation, and I was expecting it to decrease with a bigger sample. What is wrong with my mathematical concept?

Thanks in advance!

Solution

The standard deviation of a distribution does not depend on the sample size. For a uniform distribution it is (b - a)/sqrt(12), where a and b are the limits of the distribution. In your case a = 0 and b = 1, so you should expect std = 1/sqrt(12) ≈ 0.288675 for a sample of any size, up to random sampling noise.
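As a quick check of the formula above, here is a minimal sketch that compares the sample standard deviation with (b - a)/sqrt(12) for a few sample sizes. The seed, the sample sizes, and the variable names are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Theoretical standard deviation of Uniform(a, b): (b - a) / sqrt(12)
a, b = 0.0, 1.0
theoretical_std = (b - a) / np.sqrt(12)

# Fixed seed, chosen only so the run is reproducible (an assumption)
rng = np.random.default_rng(0)

sample_stds = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.uniform(a, b, n)
    sample_stds[n] = sample.std()
    print(f"n = {n:>9}: sample std = {sample_stds[n]:.4f} "
          f"(theory: {theoretical_std:.4f})")
```

The sample values should hover around the theoretical one rather than shrink toward zero; a larger n only makes the estimate of the standard deviation less noisy.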

Perhaps what you're looking for is the standard error, which is given by std/sqrt(N) and will decrease as your sample size increases:

In [9]: sample = np.random.uniform(0, 1, 100)

In [10]: sample.std()/np.sqrt(sample.size)
Out[10]: 0.029738347511343809

In [11]: sample = np.random.uniform(0, 1, 1000)

In [12]: sample.std()/np.sqrt(sample.size)
Out[12]: 0.0091589707054713591
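Since the question asks for a stopping criterion, here is a rough sketch of how the standard error could be used as one in a plain Monte Carlo integrator. The function name mc_integrate, its parameters (tol, batch, max_samples, seed), and the test integrand x**2 are all hypothetical choices for illustration, and f is assumed to accept a NumPy array.

```python
import numpy as np

def mc_integrate(f, a, b, tol=1e-3, batch=10_000, max_samples=10_000_000, seed=0):
    """Estimate the integral of f over [a, b] by plain Monte Carlo.

    Sampling stops once the standard error of the estimate drops below
    tol, or once max_samples points have been drawn.
    """
    rng = np.random.default_rng(seed)
    n = 0
    total = 0.0     # running sum of f(x)
    total_sq = 0.0  # running sum of f(x)**2
    while n < max_samples:
        fx = f(rng.uniform(a, b, batch))
        n += batch
        total += fx.sum()
        total_sq += (fx ** 2).sum()
        mean = total / n
        # Variance of f(x); clipped at 0 to guard against round-off
        var = max(total_sq / n - mean ** 2, 0.0)
        # Standard error of the integral estimate: (b - a) * std / sqrt(n)
        std_err = (b - a) * np.sqrt(var / n)
        if std_err < tol:
            break
    return (b - a) * mean, std_err, n

# Integral of x**2 over [0, 1]; the exact value is 1/3
estimate, std_err, n_used = mc_integrate(lambda x: x ** 2, 0.0, 1.0)
print(estimate, std_err, n_used)
```

Drawing points in batches keeps the loop vectorized, and the running sums avoid storing the whole sample. The standard error shrinks like 1/sqrt(n), so halving tol costs roughly four times as many samples.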

Other tips

No, your mathematical concept is not flawed: the standard deviation remains constant for larger n. As AHuman correctly points out, you should avoid shadowing built-in names with your variable names: list is the name of Python's built-in list type, and assigning to it hides that type for the rest of the scope. Use my_list or some other variable name instead.

[edit] Because the calculated mean is itself random, deterministic error bounds will not work; you will have to calculate a confidence interval, which in this case is a probabilistic error bound. You can look here for more info: http://planetmath.org/montecarlosimulation
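One common way to get such a probabilistic bound is a normal-approximation confidence interval for the mean. The sketch below assumes a large sample, uses z = 1.96 for roughly 95% coverage, and uses ddof=1 for the sample standard deviation; the helper name mean_confidence_interval is hypothetical, and this may differ from the method described at the link above.

```python
import numpy as np

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean of a sample.

    Uses the normal approximation: mean +/- z * standard error.
    z = 1.96 gives roughly 95% coverage for large samples.
    """
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    # ddof=1 gives the unbiased sample variance (a choice made here)
    std_err = sample.std(ddof=1) / np.sqrt(sample.size)
    return mean - z * std_err, mean + z * std_err

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
for n in (100, 1_000, 10_000):
    low, high = mean_confidence_interval(rng.uniform(0, 1, n))
    print(f"n = {n:>6}: mean in [{low:.4f}, {high:.4f}], "
          f"width = {high - low:.4f}")
```

The interval width shrinks like 1/sqrt(n), so it can serve as a stopping criterion: keep sampling until the width falls below the tolerance you need.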

License: CC-BY-SA with attribution
Not affiliated with StackOverflow