Binary Classification Using a Gaussian Mixture Model

https://stackoverflow.com/questions/15401331

23-03-2022

Question

I want to implement the log-likelihood ratio T = log f(x | client) - log f(x | impostor) as the decision score. My feature matrices for training and testing are 20-by-12. I am using the VOICEBOX MATLAB toolbox, and I have written the following MATLAB code:

if max(lp_client)- max(lp_impostor) >0.35
   disp('accept');
else
   disp('reject');
end

Should I use the mean of the log probabilities or the max of the log probabilities?

Solution

You should use the sum of lp_client, because of the probabilistic nature of the estimate. If you have a sequence of independent events (feature independence is often assumed in this model), its probability is the product of the probabilities of the individual events:

P(Seq | X) = P(feat1 | X) * P(feat2 | X) * ...

Or in log domain

log P(Seq | X) = log P(feat1 | X) + log P(feat2 | X) + ...

So actually

log P(x | client) = sum(lp_client)

and

log P(x | impostor) = sum(lp_impostor)
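
Putting the two sums into the decision rule from the question might look like the sketch below. It assumes lp_client and lp_impostor are vectors of per-frame log-likelihoods for the same test utterance, one entry per feature vector; the numeric values are made up for illustration, and the names logp_client, logp_impostor and threshold are hypothetical. The 0.35 threshold is copied from the question, where it was used with max(), so it would probably need re-tuning for a summed score.

```matlab
% Hypothetical per-frame log-likelihoods (illustrative values only),
% one entry per feature vector of the test utterance.
lp_client   = [-10.2; -9.8; -11.1; -10.5];
lp_impostor = [-10.9; -10.4; -11.0; -11.3];

% Log-likelihood of the whole sequence under each model,
% assuming the frames are independent.
logp_client   = sum(lp_client);
logp_impostor = sum(lp_impostor);

% Log-likelihood ratio T = log f(x | client) - log f(x | impostor).
T = logp_client - logp_impostor;

% Threshold copied from the question; treat it as a placeholder
% to be tuned on held-out data.
threshold = 0.35;

if T > threshold
    disp('accept');
else
    disp('reject');
end
```

Since both vectors here have the same number of frames, dividing each sum by that number (that is, using the mean) gives the same decision as long as the threshold is scaled by the same factor. The max, by contrast, looks at a single frame and ignores the rest of the sequence.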

Licensed under: CC-BY-SA with attribution
