Deterministic variables and a Fortran Scipy function in PyMC 3

https://stackoverflow.com/questions/22799362

25-06-2023
|

Frage

I am trying to build a simple PyMC 3 model in which I estimate two cut points and a correlation parameter in a latent bivariate Gaussian density, producing four predicted probabilities for a vector of (multinomial) counts. (This will, I hope, eventually be part of a larger model in which these and other parameters are estimated for a number of latent multivariate Gaussian densities.)

So, I want to model the cut points cx and cy as normal random variables and the correlation parameter rho as a scaled Beta random variable (as a side note, I'd love to hear a better way to deal with rho - does PyMC 3 have truncated normal random variables, for example?). And I want to use the function mvnun to calculate the predicted probabilities for given values of cx, cy, and rho. The function mvnun is part of scipy.stats.mvn, which is a bit of compiled Fortran code that has two functions for calculating very accurate multivariate normal CDF values.

If I try to stick rho in a correlation matrix S or if I put cx or cy into arrays indicating the integration limits, I get this:

ValueError: setting an array element with a sequence.

If I use fixed numerical values for cx, cy, and/or rho, mvnun works just fine (in or out of the 'with model:' block). I've been poking around, trying to figure out why the PyMC RVs cause this error, but I'm stumped. I gather that cx, cy, and rho are theano TensorVariables, but I can't figure out what, if anything, about theano TensorVariables would cause these problems.

Is there a fundamental problem with trying to call a Fortran function with PyMC RVs as arguments? Or is my code flawed in some way? Both? Something else entirely?

I'm new to PyMC, and I installed PyMC 3 thinking it was the current version (I swear the note about it being the alpha version wasn't there when I installed it a few weeks ago). Perhaps I should install 2.3 and figure out how to put this together with that?

In any case, any advice about how to fix things would be much appreciated.

Here's my code:

import pymc as pm
import numpy as np
from scipy.stats.mvn import mvnun as mvncdf

counts = np.array([100,35,45,20])
N = np.sum(counts)

def scaleBeta(x):
    return x*1.98 - 0.99

model = pm.Model()

with model:

    cx = pm.Normal('Cx', mu=0., tau=1.)
    cy = pm.Normal('Cy', mu=0., tau=1.)

    mu = np.array([0., 0.])

    rho_beta = pm.Beta('rho_beta', alpha=2, beta=2)
    rho = pm.Deterministic('rho',scaleBeta(rho_beta))
    S = np.array([1, rho, rho, 1]).reshape(2,2)

    low_aa = np.array([-50.,-50.])
    upp_aa = np.array([cx, cy])
    low_ab = np.array([-50., cy])
    upp_ab = np.array([cx, 50.])
    low_ba = np.array([cx, -50.])
    upp_ba = np.array([50., cy])
    low_bb = np.array([cx, cy])
    upp_bb = np.array([50., 50.])

    p_aa,i = mvncdf(lower=low_aa,upper=upp_aa,means=mu,covar=S)
    p_ab,i = mvncdf(lower=low_ab,upper=upp_ab,means=mu,covar=S)
    p_ba,i = mvncdf(lower=low_ba,upper=upp_ba,means=mu,covar=S)
    p_bb,i = mvncdf(lower=low_bb,upper=upp_bb,means=mu,covar=S)

    pv = np.array([p_aa,p_ab,p_ba,p_bb])

    cv = pm.Multinomial('counts',n=N,p=pv,observed=counts)

Lösung

This will be somewhat easier in PyMC 2.3, so installing 2.3 is a pretty good option.

In PyMC 3 that doesn't work because cx, cy, and rho are theano TensorVariables and mvncdf expects numpy arrays. Theano variables are not a kind of numpy array. Instead they represent a computation leading to a numpy array.

You'll need to make mvncdf into a Theano operator (theano analog of a numpy function). There unfortunately isn't a super simple way to do that right now, though it is coming. You can also take a look at the guide to making a Theano Op.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow