PyMC: MCMC sampling with pickle database incredibly slow after initial sampling run

https://stackoverflow.com/questions/19203029

30-06-2022

Question

I'm tired of having to rerun long MCMC chains with PyMC, so using PyMC's built-in chain-saving features sounds like a great idea. I'm using the pickle database backend to get a feel for MCMC workflows with disk-based saves, and I'm finding that if I sample from a PyMC MCMC model with a pickle database twice in a row, the second sample call is very slow.

from pymc import MCMC
from pymc.examples import disaster_model

dbname = 'simple.pickle'

# Store the trace in a pickle file on disk
S = MCMC(disaster_model, db='pickle', dbname=dbname)

S.sample(10000)  # <-- runs very fast
S.sample(10000)  # <-- second call on the same model: *very slow*

S.db.close()

The first sample call completes almost instantly, but the second one proceeds very haltingly and takes several seconds to complete. While the second call is running, I watched the simple.pickle file on disk and noticed its size fluctuating rapidly between 20 and 60 megabytes.
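One hypothesis consistent with these symptoms (a fast first call, a slow second call, a file that keeps changing size), not confirmed against the PyMC 2.2 source, is that the backend re-serializes the whole accumulated trace on each save instead of appending only the new samples. The stdlib-only sketch below does not use PyMC; it just counts how many bytes each strategy would write across repeated saves. The helpers `save_full_rewrite`, `save_append`, and `simulate` are hypothetical names for illustration.

```python
import io
import pickle


def save_full_rewrite(trace, new_samples):
    """Extend the in-memory trace, then pickle the *entire* trace again.

    Returns the number of bytes that would be written to disk.
    """
    trace.extend(new_samples)
    buf = io.BytesIO()
    pickle.dump(trace, buf, protocol=2)
    return len(buf.getvalue())


def save_append(new_samples):
    """Pickle only the newly drawn samples, as one appended chunk."""
    buf = io.BytesIO()
    pickle.dump(list(new_samples), buf, protocol=2)
    return len(buf.getvalue())


def simulate(n_calls=5, chunk=10000):
    """Mimic n_calls successive sample() calls of `chunk` draws each."""
    trace = []
    rewrite_bytes, append_bytes = [], []
    for i in range(n_calls):
        # Stand-in for the draws produced by one sampling run.
        samples = [float(i * chunk + j) for j in range(chunk)]
        rewrite_bytes.append(save_full_rewrite(trace, samples))
        append_bytes.append(save_append(samples))
    return rewrite_bytes, append_bytes


if __name__ == "__main__":
    rw, ap = simulate()
    print("full rewrite: %s" % rw)
    print("append only:  %s" % ap)
```

Under the full-rewrite strategy, the bytes written per save grow linearly with the number of calls, while the append strategy writes a constant amount. That would make every later call slower than the first, which matches the behavior described, but this is only an illustration of the mechanism.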

I expect the second (and all subsequent) sample calls to complete in approximately the same time as the first, so that I can monitor the chain's mixing properties manually (yes, I know there are all kinds of fancier diagnostics I could be using, but that's beside the point).
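For the workflow described here (extend a chain across separate runs and inspect it in between), one backend-independent pattern is to append each run's samples to a single file as a separate pickle record and concatenate the records when reading. This is a minimal stdlib sketch, assuming each run yields a plain list of floats for one variable; `append_chunk`, `load_chunks`, and `chunk_means` are hypothetical helpers, not part of the PyMC API, and the Gaussian draws are only a stand-in for real trace values.

```python
import os
import pickle
import random
import tempfile


def append_chunk(path, samples):
    """Append one run's samples as a separate pickle record (no rewrite)."""
    with open(path, "ab") as f:
        pickle.dump(list(samples), f, protocol=2)


def load_chunks(path):
    """Read back every record written by append_chunk, in order."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            try:
                chunks.append(pickle.load(f))
            except EOFError:
                # pickle.load raises EOFError once the file is exhausted.
                break
    return chunks


def chunk_means(chunks):
    """Mean of each run: a crude way to eyeball drift between runs."""
    return [sum(c) / float(len(c)) for c in chunks]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "chain.pkl")
    rng = random.Random(0)
    for _ in range(3):
        # Stand-in for one sampling run; replace with real trace values.
        append_chunk(path, [rng.gauss(0.0, 1.0) for _ in range(1000)])
    chunks = load_chunks(path)
    full_chain = [x for c in chunks for x in c]
    print("chunks: %d, total samples: %d" % (len(chunks), len(full_chain)))
    print("per-run means: %s" % chunk_means(chunks))
```

Because each run only appends its own samples, the cost of saving does not grow with the length of the chain already on disk.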

What am I doing wrong?

PyMC version 2.2, Python 2.7.3, Ubuntu 12.10 64-bit.

Solution

Can you try building from the master branch on GitHub? It contains a fix to the database backend that probably resolves this. Version 2.3 is almost ready for release; until then, you can build from GitHub.

Licensed under: CC-BY-SA with attribution
