random.expovariate(rate) and numpy.random.poisson(quantity) yield the same average value, but the distributions are vastly different. Why is this?

StackOverflow https://stackoverflow.com/questions/14620894

Question

I'm making some modifications to the load testing framework that we're using throughout the company, and this is a question for which I would love to have an answer.

I was under the impression that the following 2 approaches to generating a Poisson distribution would be equivalent, but I'm clearly wrong:

#!/usr/bin/env python

from numpy import average, random, std
from random import expovariate

def main():

    for count in 5.0, 50.0:
        # Poisson: number of events per interval, with mean `count`.
        data = [random.poisson(count) for i in range(10000)]
        print('npy_poisson average with count=%d: ' % count, average(data))
        print('npy_poisson std_dev with count=%d: ' % count, std(data))

        # Exponential: time between events, with mean 1 / rate = count.
        rate = 1 / count
        data = [expovariate(rate) for i in range(10000)]
        print('expovariate average with count=%d: ' % count, average(data))
        print('expovariate std_dev with count=%d: ' % count, std(data))

if __name__ == '__main__':
    main()

This results in output that looks like:

npy_poisson average with count=5:   5.0168
npy_poisson std_dev with count=5:   2.23685443424
expovariate average with count=5:   4.94383067075
expovariate std_dev with count=5:   4.95058985422
npy_poisson average with count=50:  49.9584
npy_poisson std_dev with count=50:  7.07829565927
expovariate average with count=50:  50.9617389096
expovariate std_dev with count=50:  51.6823970228

Why does the standard deviation when I use the built-in random.expovariate scale proportionally with the number of events in a given interval, while the numpy.random.poisson standard deviation seems to scale at a rate of log base 10 (count)?

Follow-up question: Which one is more appropriate if you're simulating the frequency with which users interact with your service?


Solution

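Because your assumptions are wrong. The mean and variance of a Poisson distribution are both lambda, hence the stdev is sqrt(lambda). With lambda = count that gives sqrt(5) ≈ 2.24 and sqrt(50) ≈ 7.07, so it grows with the square root of count, not with log base 10. The mean and variance of an exponential distribution are 1/lambda and 1/lambda^2 respectively, where lambda is the rate. So std = sqrt(1/rate^2) = 1/rate, and since you set rate = 1/count, std = count, which is exactly what you are seeing here.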
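To check this numerically, here is a minimal sketch that compares the theoretical standard deviations with sampled ones. The helper names `theoretical_std` and `sample_std`, the sample size and the seed are illustrative assumptions, not part of the original script.

```python
import math
import random

import numpy as np


def theoretical_std(count):
    """Return (poisson_std, exponential_std) for the question's setup.

    A Poisson distribution with mean `count` has std sqrt(count).
    An exponential distribution with rate = 1 / count has std 1 / rate = count.
    """
    rate = 1.0 / count
    return math.sqrt(count), 1.0 / rate


def sample_std(count, n=100000, seed=0):
    """Estimate both standard deviations from n random samples each."""
    np_rng = np.random.default_rng(seed)   # NumPy generator for Poisson draws
    py_rng = random.Random(seed)           # stdlib generator for expovariate
    poisson_samples = np_rng.poisson(count, size=n)
    expo_samples = [py_rng.expovariate(1.0 / count) for _ in range(n)]
    return float(np.std(poisson_samples)), float(np.std(expo_samples))


for count in (5.0, 50.0):
    print(count, theoretical_std(count), sample_std(count))
```

The sampled values should land close to sqrt(count) for the Poisson draws and close to count for the exponential draws, matching the output in the question.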

I'd suggest reading the Wikipedia article on queuing theory for your follow-up question.
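As a starting point, and assuming user requests can be modelled as a Poisson process (an assumption, not something the question establishes), the two distributions describe the same process from different angles: the gaps between events are exponential with rate lambda, and the number of events per unit interval is Poisson with mean lambda. The sketch below illustrates this; `counts_per_interval` and `mean_and_variance` are hypothetical helper names. Note that it calls expovariate(rate) with rate equal to events per interval, whereas the question's script uses expovariate(1 / count), which produces gaps whose mean is count.

```python
import random


def counts_per_interval(rate, n_intervals, seed=0):
    """Generate arrivals with exponential gaps and count them per unit interval."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)          # time of the first arrival
    while t < n_intervals:
        counts[int(t)] += 1            # arrival falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)     # next gap has mean 1 / rate
    return counts


def mean_and_variance(values):
    """Return the mean and population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


counts = counts_per_interval(rate=5.0, n_intervals=20000)
mean, variance = mean_and_variance(counts)
print(mean, variance)
```

With rate=5.0, both the mean and the variance of the per-interval counts should come out near 5, as expected for a Poisson distribution. For a load test this suggests drawing the wait between requests from expovariate and letting the per-interval counts follow from that.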

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow