Question

Newbie here typesetting my question, so excuse me if the formatting doesn't work.

I am trying to build a Bayesian classifier for a multivariate classification problem where the input is assumed to follow a multivariate normal distribution. I chose to use a discriminant function defined as log(likelihood * prior).

However, from the distribution,

$$f(x \mid \mu, \Sigma) = (2\pi)^{-Nd/2} \det(\Sigma)^{-N/2} \exp\!\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right]$$

I encounter a term $-\log(\det(S_i))$, where $S_i$ is my sample covariance matrix for a specific class $i$. Since my input actually represents square image data, $S_i$ picks up quite a lot of correlation, resulting in $\det(S_i)$ being zero. All my discriminant function values then turn into Inf, which is disastrous for me.
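
For concreteness, taking the log of the density above per observation and dropping constant terms, the discriminant for class $i$ looks like this (with $P(\omega_i)$ denoting the class prior):

$$g_i(x) = -\tfrac{1}{2}\log\det(S_i) - \tfrac{1}{2}(x-\mu_i)' S_i^{-1} (x-\mu_i) + \log P(\omega_i)$$

so when $\det(S_i) = 0$, the $-\log\det(S_i)$ term diverges (and $S_i^{-1}$ does not even exist).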

I know there must be a lot of things going wrong here. Is anyone willing to help me out?


UPDATE: Can anyone help me get the formula working?


Solution

I will not analyze the overall approach, as it is not entirely clear to me what you are trying to accomplish here and I do not know the dataset, but regarding the problem with the covariance matrix:

The most obvious solution when you need a covariance matrix and its determinant, but computing them is numerically infeasible, is to apply some kind of dimensionality reduction technique to capture the most informative dimensions and simply discard the rest. One such method is Principal Component Analysis (PCA), which, applied to your data and truncated after, for example, 5-20 dimensions, would yield a reduced covariance matrix with a non-zero determinant. A sketch of this is shown below.
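
Here is a minimal NumPy sketch of that idea (the 8x8 image size, sample count, and `k=10` are placeholder assumptions, not from the question):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X (n_samples x d) onto the top-k principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)          # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: eigenvalues in ascending order
    W = eigvecs[:, -k:]                     # keep the k largest components
    return Xc @ W

# Toy example: 100 flattened 8x8 "images" (d = 64) reduced to k = 10 dimensions.
X = np.random.rand(100, 64)
Z = pca_reduce(X, k=10)
S_reduced = np.cov(Z, rowvar=False)         # 10 x 10, generically full rank
print(np.linalg.det(S_reduced))             # non-zero now
```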

PS. It may be a good idea to post this question on Cross Validated

OTHER TIPS

You probably do not have enough data to infer the parameters in a space of dimension d. The typical way to get around this is to use a MAP estimate rather than an ML estimate.

For the multivariate normal, the conjugate prior is a normal-inverse-Wishart distribution. The MAP estimate effectively adds the scale matrix parameter of the inverse-Wishart prior to the ML covariance matrix estimate and, if chosen appropriately, gets rid of the singularity problem.
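
As an illustration, here is a minimal NumPy sketch of a MAP covariance estimate, assuming an inverse-Wishart prior with scale matrix `lam * I` and ignoring the prior on the mean for simplicity; `lam` and `nu` are placeholder choices:

```python
import numpy as np

def map_covariance(X, lam=1.0, nu=None):
    """MAP covariance estimate under an inverse-Wishart prior IW(lam * I, nu)."""
    N, d = X.shape
    if nu is None:
        nu = d + 2                    # weakly informative default (an assumption)
    Xc = X - X.mean(axis=0)
    scatter = Xc.T @ Xc               # sum of centered outer products (N * S_ML)
    psi = lam * np.eye(d)             # prior scale matrix added to the scatter
    # Posterior mode of the inverse-Wishart: (Psi + scatter) / (nu + N + d + 1),
    # which is full rank whenever lam > 0, so the determinant is strictly positive.
    return (psi + scatter) / (nu + N + d + 1)

# Works even with fewer samples than dimensions (N = 20 < d = 64 here):
X = np.random.rand(20, 64)
S = map_covariance(X, lam=0.1)
sign, logdet = np.linalg.slogdet(S)   # log-determinant without under/overflow
print(sign, logdet)
```

Note the use of `slogdet`, which gives you the log-determinant directly without forming the (possibly underflowing) determinant itself.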

If you are actually trying to build a classifier for normally distributed data, and not just doing an experiment, then a better way to do this would be with a discriminative method. The decision boundary between two multivariate normals is quadratic, so just use a quadratic kernel in conjunction with an SVM.
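
A minimal sketch with scikit-learn (assumed available; the toy data and labels are placeholders, not from the question):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-in for flattened image data: 200 samples, d = 64 features,
# with a label that depends quadratically on the first two features.
X = rng.normal(size=(200, 64))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2).astype(int)

# degree=2 matches the quadratic decision boundary of Gaussian classes;
# coef0=1 makes the kernel inhomogeneous so lower-order terms are kept.
clf = SVC(kernel="poly", degree=2, coef0=1.0)
clf.fit(X, y)
print(clf.score(X, y))
```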

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow