Question

I am using the Gaussian Mixture Model from the Python scikit-learn package to train my dataset. However, I found that when I code

-- G = mixture.GMM(...)

-- G.fit(...)

-- G.score(some_feature)

the resulting log probability is a positive real number... Why is that? Isn't log probability guaranteed to be negative?

I get it now: what the Gaussian Mixture Model returns to us is the log probability "density" instead of the probability "mass", so a positive value is totally reasonable.
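For anyone hitting the same thing on a recent scikit-learn, here is a minimal sketch reproducing the effect. Note that the GMM class was deprecated and replaced by GaussianMixture; the tightly concentrated synthetic data below is made up purely to force densities above one:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Very concentrated 1-D data, so the fitted density peaks well above 1.
X = rng.normal(loc=0.0, scale=0.01, size=(500, 1))

G = GaussianMixture(n_components=1, random_state=0).fit(X)

# score_samples gives per-sample log *densities*; score gives their mean.
print(G.score_samples(X[:5]))  # positive values, around log(1/(0.01*sqrt(2*pi))) ≈ 3.7
print(G.score(X))              # positive mean log-density
```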

If the covariance matrix is near singular, the GMM will not perform well, and generally it means the data is not well suited to such a generative task.
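One hedged way to guard against this in current scikit-learn is the reg_covar parameter of GaussianMixture, which adds a constant to the diagonal of each covariance matrix during EM; the near-duplicate feature and the regularization value below are invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Second feature is an almost exact copy of the first, so the sample
# covariance is nearly singular.
x = rng.normal(size=(300, 1))
X = np.hstack([x, x + rng.normal(scale=1e-8, size=(300, 1))])

# reg_covar keeps the covariance diagonals bounded away from zero,
# so the matrices stay invertible even for data like this.
G = GaussianMixture(n_components=1, reg_covar=1e-4, random_state=0).fit(X)

# The smallest eigenvalue of each covariance shows how close to
# singular the fitted component is.
for k, cov in enumerate(G.covariances_):
    print(k, np.linalg.eigvalsh(cov).min())
```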


Solution

Positive log probabilities are okay.

Remember that the probability the GMM computes is a probability density function (PDF), so it can be greater than one at any individual point.

The restriction is that the PDF must integrate to one over the data domain.
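A quick numerical sanity check of this (assuming scipy is available; the scale of 0.1 is an arbitrary choice): a narrow Gaussian exceeds one near its mean, yet still integrates to one.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

pdf = norm(loc=0.0, scale=0.1).pdf    # a narrow Gaussian

print(pdf(0.0))                       # ≈ 3.99, greater than 1
print(np.log(pdf(0.0)))               # ≈ 1.38, a positive log-density
print(quad(pdf, -np.inf, np.inf)[0])  # ≈ 1.0, still a valid PDF
```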

If the log probability grows very large, then the inference algorithm may have reached a degenerate solution (common with maximum likelihood estimation if you have a small dataset).

To check that the GMM algorithm has not reached a degenerate solution, look at the variances estimated for each component. If any variance is close to zero, the solution is degenerate: that component has collapsed onto one or a few data points. Alternatively, use a Bayesian model rather than maximum likelihood estimation (if you aren't doing so already); see the sketch below.
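A sketch of both checks under the modern scikit-learn API (the small two-cluster dataset is synthetic, and BayesianGaussianMixture is one readily available Bayesian alternative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.RandomState(0)
# A deliberately small two-cluster dataset, where maximum likelihood
# estimation is more prone to degenerate (near-zero variance) solutions.
X = np.vstack([rng.normal(-2, 1, (20, 1)), rng.normal(2, 1, (20, 1))])

G = GaussianMixture(n_components=2, random_state=0).fit(X)
# With the default covariance_type='full', covariances_ has shape
# (n_components, n_features, n_features); a tiny eigenvalue flags trouble.
for k, cov in enumerate(G.covariances_):
    print("component", k, "min eigenvalue:", np.linalg.eigvalsh(cov).min())

# The Bayesian variant places priors on the parameters, which
# regularizes the component variances away from zero.
B = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)
print("Bayesian mean log-density:", B.score(X))
```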

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow