Here are some suggestions that can improve both the code and the inference:
I would change

```python
pm.Wishart("cov_matrix_inv", n_obs, np.linalg.inv(cov_matrix))
```

to

```python
pm.Wishart("cov_matrix_inv", n_obs, np.eye(3))
```

as it is more objective (and with 10000 data points your prior is not going to matter much anyway). Also,

```python
mcmc = pm.MCMC()
```

should be

```python
mcmc = pm.MCMC(model)
```
```python
mcmc.sample(5000, 2000, 3)
```

There are far too few samples here. The second half of MCMC, the Monte Carlo part, is strongest when there are lots of samples: I mean tens of thousands. Here you only have (5000 − 2000) / 3 = 1000, so the error caused by Monte Carlo will be quite high (the error decreases as the sample size increases). Furthermore, the MCMC has likely not converged after only 2000 burn-in samples. You can check convergence with the `plot` function in `pymc.Matplot` by calling `plot(mcmc)`. I used

```python
mcmc.sample(25000, 15000, 1)
```

and was getting better results.
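To see why the sample count matters so much, here is a small illustrative sketch, pure NumPy and independent of your model: the normal draws below are just a stand-in for an MCMC trace, and it shows how the Monte Carlo standard error of a posterior-mean estimate shrinks like 1/sqrt(n):

```python
import numpy as np

# Monte Carlo error of a mean estimate shrinks like 1/sqrt(n): going from
# 1,000 to 25,000 samples cuts the standard error by a factor of 5.
for n in (1_000, 25_000):
    # stand-in for an MCMC trace; real traces are also autocorrelated,
    # which makes the effective error even larger
    draws = np.random.default_rng(0).normal(0.0, 1.0, size=n)
    mc_se = draws.std(ddof=1) / np.sqrt(n)  # Monte Carlo standard error of the mean
    print(f"n={n:>6}  MC standard error ~ {mc_se:.4f}")
```

Real MCMC traces are autocorrelated, so the effective sample size is smaller still, which is another reason to sample generously and thin if needed.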
I imagine the reason you used so few samples was performance. Much of the slowness is caused by the large number of observations: you have 10000, which might be higher than what you would actually have in practice.
And remember, much of the value in Bayesian inference is being handed posterior samples: reducing them to their mean seems like a waste. Think about using the samples in a loss function (see Chapter 5 in the book).
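As a sketch of that idea (hypothetical numbers, pure NumPy, not your model): under an asymmetric loss where underestimating is penalized 4x more than overestimating, the estimate that minimizes expected loss over the posterior samples sits above the posterior mean, so taking the mean would be the wrong summary:

```python
import numpy as np

# Hypothetical posterior samples for a positive parameter
# (a stand-in for a real MCMC trace).
rng = np.random.default_rng(42)
posterior = rng.gamma(shape=2.0, scale=1.0, size=20_000)

def loss(estimate, true_value):
    # Asymmetric loss: underestimating costs 4x more than overestimating.
    err = estimate - true_value
    return np.where(err < 0, 4.0 * -err, err)

# Pick the estimate that minimizes *expected* loss over the posterior,
# instead of just reporting the posterior mean.
candidates = np.linspace(posterior.min(), posterior.max(), 500)
expected_loss = [loss(c, posterior).mean() for c in candidates]
best = candidates[int(np.argmin(expected_loss))]

print(f"posterior mean:           {posterior.mean():.2f}")
print(f"loss-minimizing estimate: {best:.2f}")  # pulled above the mean
```

The asymmetry pushes the optimal estimate toward an upper quantile of the posterior, something a single point summary like the mean cannot capture.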