Pergunta

I am trying to implement a very simple example of the law of large numbers using PyMC. The goal is to generate many sample averages of samples of different sizes. For example, in the code below, I'm taking repeatedly taking groups of 5 samples (samples_to_average = 5), calculating their mean, and then finding the 95% CI of the resulting trace.

The code below runs, but what I'd like to do is modify samples_to_average to be a list, so that I can calculate confidence intervals for a range of different sample sizes in a single pass.

import scipy.misc
import numpy as np
import pymc as mc

samples_to_average = 5 
list_of_samples = mc.DiscreteUniform("response", lower=1, upper=10, size=1000)

@mc.deterministic
def sample_average(x=list_of_samples, n=samples_to_average):
    samples = int(n)
    selected = x[0:samples] 
    total = np.sum(selected) 
    sample_average = float(total) / samples 
    return sample_average 

def getConfidenceInterval():   
    responseModel = mc.Model([samples_to_average, list_of_samples, sample_average])
    mapRes = mc.MAP(responseModel)
    mapRes.fit() 
    mcmc = mc.MCMC(responseModel)
    mcmc.sample( 10000, 5000)
    upper = np.percentile(mcmc.trace('sample_average')[:],95)
    lower = np.percentile(mcmc.trace('sample_average')[:],5)
    return (lower, upper)     


print getConfidenceInterval()

Most examples I've seen using the deterministic decorator use global stochastic variables. However, to achieve my aim, I think what I need to do is create a stochastic variable (of the correct length) in getConfidenceInterval(), and pass this to sample_average (rather than supplying sample_average using globals / default parameter).

How can a variable created in getConfidenceInterval() be passed into sample_average(), or alternatively, what is another way that I can evaluate multiple models using different values of samples_to_average? I'd like to avoid globals if possible.

Foi útil?

Solução

Before addressing your question, I would like to simplify the way sample_average is written so that it is more compact and easier to understand.

sample_average = mc.Lambda('sample_average', lambda x=list_of_samples, n=samples_to_average: np.mean(x[:n]))

Now you can generalize this to the case where samples_to_average is an array of parameters:

samples_to_average = np.arange(5, 25, 5)

sample_average = mc.Lambda('sample_average', lambda x=list_of_samples, n=samples_to_average: [np.mean(x[:t]) for t in n])

The getConfidenceInterval function would also have to be changed as shown below:

def getConfidenceInterval():
    responseModel = mc.Model([samples_to_average, list_of_samples, sample_average])
    mapRes = mc.MAP(responseModel)
    mapRes.fit()
    mcmc = mc.MCMC(responseModel)
    mcmc.sample( 10000, 5000)
    average = np.vstack((t for t in mcmc.trace('sample_average')))
    upper = np.percentile(average, 95, axis = 0)
    lower = np.percentile(average, 5, axis = 0)
    return (lower, upper)

I used vstack to aggregate the sample averages into a 2D array and then used the axis option in Numpy's percentile function to compute percentiles along each column.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top