Can you try building from the master branch on GitHub? It contains a fix for a database backend bug that is probably the cause of this. Version 2.3 is almost released, but until then you can build from GitHub.
PyMC: MCMC sampling with pickle database incredibly slow after initial sampling run
Question
I'm tired of having to rerun long MCMC chains with PyMC, so using the chain-saving features that PyMC comes with sounds like a great idea. I'm using the pickle database backend to get a feel for MCMC workflows with disk-based saves, and I'm finding that if I sample from a PyMC MCMC model with a pickle database twice in a row, the second sample invocation is very slow.
from pymc import MCMC
from pymc.examples import disaster_model
dbname = 'simple.pickle'
S = MCMC(disaster_model, db='pickle', dbname=dbname)
S.sample(1e4) # <-- Runs very fast
if True:
    S.sample(1e4) # <-- *very slow*
S.db.close()
The first sample call completes almost instantly, but the second one proceeds very haltingly, taking several seconds to complete. Meanwhile, I am watching the simple.pickle file on disk during the second call to sample and noticing its size fluctuating rapidly, between 20 and 60 megabytes.
I expect the second (and all subsequent) sample calls to complete in approximately the same time as the first, so that I can monitor the chain's mixing properties manually (yes, I know there are all kinds of fancier diagnostics I could be using, but that's beside the point).
What am I doing wrong?
PyMC version 2.2, Python 2.7.3, Ubuntu 12.10 64-bit.
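To put numbers on the slowdown described above, the per-call wall-clock time can be measured directly. This is only a minimal sketch: `time_calls` is a hypothetical helper, not part of PyMC, and it uses `timeit.default_timer` so that it also runs on Python 2.7.

```python
from timeit import default_timer


def time_calls(fn, repeats=2):
    """Call fn() `repeats` times; return the wall-clock seconds each call took."""
    durations = []
    for _ in range(repeats):
        start = default_timer()
        fn()
        durations.append(default_timer() - start)
    return durations


# Hypothetical usage with the model from the question (requires PyMC 2.x):
#   S = MCMC(disaster_model, db='pickle', dbname='simple.pickle')
#   print(time_calls(lambda: S.sample(10000)))
#   S.db.close()
```

If the backend behaves as expected, the durations should be roughly equal; a much larger second value would reproduce the problem reported here.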
Solution