Can you try building from the master branch on GitHub? There is a database backend fix there that is probably the cause of this. We almost have version 2.3 released, but until then you can build from GitHub.
PyMC: MCMC sampling with pickle database incredibly slow after initial sampling run
Question
I'm tired of having to rerun long MCMC chains with PyMC, so using the chain-saving features PyMC comes with sounds like a great idea. I'm using the pickle database backend to get a feel for MCMC workflows with disk-based saves, and I'm finding that if I sample from a PyMC MCMC model with a pickle database twice in a row, the second sample invocation is very slow.
    from pymc import MCMC
    from pymc.examples import disaster_model

    dbname = 'simple.pickle'
    S = MCMC(disaster_model, db='pickle', dbname=dbname)
    S.sample(1e4)  # <-- Runs very fast
    if True:
        S.sample(1e4)  # <-- *very slow*
    S.db.close()
The first sample call completes almost instantly, but the second one proceeds very haltingly, taking several seconds to complete. Meanwhile, watching the simple.pickle file on disk during the second call to sample, I notice its size fluctuating rapidly between 20 and 60 megabytes.
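To put numbers on the difference between the two calls, a small timing helper along these lines might be useful. This is only a sketch: `time_calls` is a hypothetical helper, not part of PyMC, and the workload in the demo is a stand-in so the snippet runs without PyMC installed.

```python
from timeit import default_timer


def time_calls(fn, repeats=2):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = default_timer()
        fn()
        durations.append(default_timer() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload (an assumption for illustration only).
    timings = time_calls(lambda: sum(range(100000)), repeats=2)
    for i, seconds in enumerate(timings, start=1):
        print("call %d: %.4f s" % (i, seconds))
```

With the model above, the equivalent call would presumably be `time_calls(lambda: S.sample(1e4), repeats=2)`, which should show the second entry being much larger than the first.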
I expect the second (and all subsequent) sample calls to complete in approximately the same time as the first, so that I can monitor the chain's mixing properties manually (yes, I know there are all kinds of fancier diagnostics I could be using, but that's beside the point).
What am I doing wrong?
PyMC version 2.2, Python 2.7.3, Ubuntu 12.10 64-bit.
Solution