Question

I would like to know why the sampler is incredibly slow when sampling step by step. For example, if I run:

mcmc = MCMC(model)
mcmc.sample(1000)

the sampling is fast. However, if I run:

mcmc = MCMC(model)
for i in np.arange(1000):
    mcmc.sample(1)

the sampling is slower (and the more it samples, the slower it is).

If you are wondering why I am asking this: I need step-by-step sampling because I want to perform some operations on the values of the variables after each step of the sampler.

Is there a way to speed it up?

Thank you in advance!

------------------ EDIT -------------------------------------------------------------

Here I present the specific problem in more detail:

I have two competing models, and they are part of a bigger model that has a categorical variable functioning as a 'switch' between the two.

In this toy example, I have the observed vector 'Y', which could be explained by either a Poisson or a Geometric distribution. The Categorical variable 'switch_model' selects the Geometric model when it is 0 and the Poisson model when it is 1.

After each sample, if switch_model selects the Geometric model, I want the variables of the Poisson model NOT to be updated, because they are not influencing the likelihood and therefore they are just drifting away. The opposite is true if the switch_model selects the Poisson model.

Basically, what I do at each step is 'change' the value of the non-selected model by manually setting it back one step.

I hope that my explanation and the commented code will be clear enough. Let me know if you need further details.

import numpy as np
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt

# OBSERVED VALUES
Y = np.array([0, 1, 2, 3, 8])

# PRIOR ON THE MODELS
pi = (0.5, 0.5)

switch_model = pm.Categorical("switch_model", p = pi) 
# switch_model = 0 for Geometric, switch_model = 1 for Poisson

p = pm.Uniform('p', lower = 0, upper = 1) # Prior of the parameter of the geometric distribution
mu = pm.Uniform('mu', lower = 0, upper = 10) # Prior of the parameter of the Poisson distribution


# LIKELIHOOD
@pm.observed
def Ylike(value = Y, mu = mu, p = p, M = switch_model):
    if M == 0:
        out = pm.geometric_like(value+1, p)
    elif M == 1:
        out = pm.poisson_like(value, mu)
    return out

model = pm.Model([Ylike, p, mu, switch_model])

mcmc = pm.MCMC(model)

n_samples = 5000

traces = {}
for var in mcmc.stochastics:
    traces[str(var)] = np.zeros(n_samples)


bar = pm.progressbar.progress_bar(n_samples)
bar.update(0)

mcmc.sample(1, progress_bar=False)
for var in mcmc.stochastics:
    traces[str(var)][0] = mcmc.trace(var)[-1]


for i in np.arange(1,n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0: # Geometric wins
        traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]

    elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
        traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]

print '\n\n'

traces=pd.DataFrame(traces)

traces['mu'][traces['switch_model'] == 0] = np.nan
traces['p'][traces['switch_model'] == 1] = np.nan

print traces.describe()

traces.plot()
plt.show()

Solution 2

Actually I found a 'crazy' solution, and I suspect I know why it works. I would still like to get an expert opinion on my trick.

Basically, if I modify the for loop in the following way, adding a 'reset' of the mcmc object every 1000 loops, the sampling speeds up again:

for i in np.arange(1,n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0: # Geometric wins
        traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]

    elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
        traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]

    if i%1000 == 0:
        mcmc = pm.MCMC(model)

In practice this trick erases the traces and the database of the sampler every 1000 steps. It looks like the sampler does not like having a long database, although I do not really understand why. (Of course the 1000-step interval is arbitrary: too short and the resets add too much overhead, too long and the traces and database grow too large.)
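If that suspicion is right, the cost profile is easy to reproduce without PyMC at all: any loop that does O(n) work against an ever-growing container at step n costs O(n²) in total, while resetting the container every k steps keeps each chunk cheap. A minimal pure-Python sketch, where the hypothetical `step` function stands in for `mcmc.sample(1)` plus its trace bookkeeping:

```python
import time

def step(trace):
    # Stand-in for one sampling step whose bookkeeping touches the
    # whole accumulated trace (O(len(trace)) work per call).
    trace.append(sum(trace) if trace else 0.0)

def run(n_steps, reset_every=None):
    """Take n_steps steps, optionally resetting the trace every reset_every steps."""
    trace = []
    start = time.time()
    for i in range(1, n_steps + 1):
        step(trace)
        if reset_every and i % reset_every == 0:
            trace = []  # analogous to rebuilding the MCMC object
    return time.time() - start

slow = run(10000)                     # trace keeps growing: ~quadratic total cost
fast = run(10000, reset_every=1000)   # trace stays short: ~linear total cost
print('no reset: %.3fs, reset every 1000: %.3fs' % (slow, fast))
```

Under this model, the periodic `mcmc = pm.MCMC(model)` reset plays exactly the role of `trace = []` above.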

I find this hack a bit crazy and definitely not elegant. Do any of the experts or developers have a comment on it? Thank you!

Other tips

The reason this is so slow is that Python's for loops are quite slow, especially compared to the FORTRAN loops that much of PyMC is built on. If you could show more detailed code, it would be easier to see what you are trying to do and to suggest faster alternative algorithms.
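For example, if the per-step bookkeeping can be deferred, one common alternative is to call `mcmc.sample(n_samples)` once and post-process the full traces with vectorized NumPy masking instead of a Python loop. A sketch with synthetic arrays standing in for the `mcmc.trace(...)` output (note this reproduces only the final NaN-masking from the question, not the per-step resetting of `mu` and `p`, which changes the chain itself):

```python
import numpy as np

# Synthetic stand-ins for mcmc.trace('switch_model')[:], mcmc.trace('mu')[:], etc.
rng = np.random.default_rng(0)
switch = rng.integers(0, 2, size=5000)   # 0 = Geometric wins, 1 = Poisson wins
mu = rng.uniform(0, 10, size=5000)
p = rng.uniform(0, 1, size=5000)

# Vectorized masking replaces the per-step Python loop:
mu_masked = np.where(switch == 1, mu, np.nan)  # mu meaningful only when Poisson wins
p_masked = np.where(switch == 0, p, np.nan)    # p meaningful only when Geometric wins

print(np.nanmean(mu_masked), np.nanmean(p_masked))
```

The two `np.where` calls touch the whole trace in compiled code, so the cost is a few array passes instead of thousands of interpreted loop iterations.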

Licensed under: CC-BY-SA with attribution