Question

Could someone give some general instructions on how to parallelize PyMC MCMC code? I am trying to run LASSO regression following the example given here. I read somewhere that parallel sampling is done by default, but do I still need to use something like Parallel Python to get it to work?

Here is some reference code that I would like to be able to parallelize on my machine.

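import numpy as np
import pymc
from scipy.stats import norm

# n (number of observations) and b (Laplace scale) are assumed to be defined earlier.
x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)  # x2 is almost exactly -x1 (collinear)
x3 = norm.rvs(0, 1, size=n)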

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

beta1_lasso = pymc.Laplace('beta1', mu=0, tau=1.0 / b)
beta2_lasso = pymc.Laplace('beta2', mu=0, tau=1.0 / b)
beta3_lasso = pymc.Laplace('beta3', mu=0, tau=1.0 / b)

@pymc.deterministic
def y_hat_lasso(beta1=beta1_lasso, beta2=beta2_lasso, beta3=beta3_lasso, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_lasso = pymc.Normal('Y', mu=y_hat_lasso, tau=1.0, value=y, observed=True)

lasso_model = pymc.Model([Y_lasso, beta1_lasso, beta2_lasso, beta3_lasso])
lasso_MCMC = pymc.MCMC(lasso_model)
lasso_MCMC.sample(iter=20000, burn=5000, thin=2)

Solution

It looks like you are using PyMC2, which, as far as I know, has no built-in parallel sampling, so you need a general Python approach to parallel computation, such as IPython.parallel. There are many ways to do this, but all the ones I know of are a little complicated. Here is an example of one, which uses PyMC2, IPCluster, and Wakari.
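As a rough illustration of the "run independent chains in separate processes" idea, here is a minimal sketch that uses only the standard library's multiprocessing module instead of IPython.parallel. To keep it self-contained, run_chain is a hand-written random-walk Metropolis sampler for a single coefficient with a Laplace prior; it is a stand-in, not PyMC code, and the toy data, the proposal width, and the names XS, YS, B, log_post, and run_chain are all made up for this example. In a real PyMC2 run, the body of run_chain would presumably build the pymc.MCMC object and call its sample method.

```python
import math
import random
from multiprocessing import Pool

# Toy data standing in for the question's X and y (values are made up).
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [-2.1, -0.9, 0.1, 1.0, 2.0]
B = 1.0  # Laplace scale, playing the role of `b` in the question


def log_post(beta):
    """Log posterior up to a constant: Normal(tau=1) likelihood, Laplace(0, B) prior."""
    log_lik = -0.5 * sum((y - beta * x) ** 2 for x, y in zip(XS, YS))
    log_prior = -abs(beta) / B
    return log_lik + log_prior


def run_chain(seed, n_iter=2000, burn=500, thin=2):
    """Run one random-walk Metropolis chain and return the kept draws of beta."""
    rng = random.Random(seed)  # per-chain RNG, so each chain is reproducible
    beta, current = 0.0, log_post(0.0)
    kept = []
    for i in range(n_iter):
        proposal = beta + rng.gauss(0.0, 0.5)
        candidate = log_post(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        # Discard the burn-in, then keep every `thin`-th draw.
        if i >= burn and (i - burn) % thin == 0:
            kept.append(beta)
    return kept


if __name__ == "__main__":
    seeds = [1, 2, 3]
    # One worker process per chain; each chain gets its own seed.
    with Pool(processes=len(seeds)) as pool:
        chains = pool.map(run_chain, seeds)
    for seed, chain in zip(seeds, chains):
        print(seed, len(chain), sum(chain) / len(chain))
```

Each process returns its own list of draws, so the chains can be compared for convergence or pooled afterwards. The main guard matters on platforms where worker processes re-import the script.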

In PyMC3, parallel sampling is implemented in the psample function, but your reference code will need to be updated to the PyMC3 format:

import pymc3 as pm

with pm.Model() as model:
    beta1 = pm.Laplace('beta1', mu=0, b=b)
    beta2 = pm.Laplace('beta2', mu=0, b=b)
    beta3 = pm.Laplace('beta3', mu=0, b=b)

    y_hat = beta1 * x1 + beta2 * x2 + beta3 * x3
    y_obs = pm.Normal('y_obs', mu=y_hat, tau=1.0, observed=y)

    trace = pm.psample(draws=20000, step=pm.Slice(), threads=3)

OTHER TIPS

PyMC3 has since merged psample into sample.

To run in parallel, set the njobs parameter to a value greater than 1.

The signature of the pymc3.sample function is:

sample(draws, step, start=None, trace=None, chain=0, njobs=1, tune=None, progressbar=True, model=None, random_seed=None)

Note: if you set njobs=None, it will default to the number of CPUs minus 2.
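To make the njobs=None rule concrete, here is a small sketch of how that default could be computed. resolve_njobs is a hypothetical helper written for this answer, not part of PyMC3, and the floor of 1 is an assumption so that machines with one or two CPUs still get a worker.

```python
import multiprocessing


def resolve_njobs(njobs=1):
    """Hypothetical helper mirroring the njobs rule described in the note."""
    if njobs is None:
        # "Number of CPUs - 2"; the floor of 1 is an assumption for small machines.
        return max(multiprocessing.cpu_count() - 2, 1)
    return njobs


print(resolve_njobs(None))  # depends on the machine
print(resolve_njobs(4))     # explicit values pass through unchanged
```

With the signature quoted above, the resolved value would then be passed along the lines of pm.sample(20000, step=pm.Slice(), njobs=resolve_njobs(None)).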

I hope this helps.

Licensed under: CC-BY-SA with attribution