Question

I have a hard time understanding when Naive Bayes works better than Full Bayes.
In general, I know that Naive Bayes makes the assumption that the features are independent given the class.
However, if the features are indeed independent, does that mean that assuming they are dependent yields worse results?

E.g. I have these data points for two features, with each class colored in a different color.

[Figure: scatter plot of the two features, points colored by class]

Now my intuition is that Naive Bayes will work well here: given a specific class, we have two different distributions, and both are "unstructured".
However, I ran Naive Bayes (with a univariate normal pdf per feature) and Full Bayes (with a multivariate normal pdf) classifiers on that data and got the same accuracy.


Solution

There's no clear definition of "Full Bayes" as a classifier. Most "real world" non-Naive Bayesian classifiers take into account some but not all dependencies between features. That is, they make independence assumptions based on the meaning of the features.

If by "full Bayesian" you mean a joint model (as your example suggests), then one of the problems is that such a model doesn't generalize: it just describes the probabilities in the training set, and that implies that it's likely to overfit badly. This is actually why NB works quite well in most cases: yes it makes unrealistic independence assumptions, but this simplification allows the model to capture basic patterns from the data. In other words, the ability of the model to generalize comes from its excessively simplified assumptions.

Note: as far as I can tell, your example is well chosen and you should see a big difference between NB and a joint model: NB should perform no better than a random baseline while the joint model should obtain near perfect accuracy. There's probably a mistake somewhere if you don't obtain these results. But while this is a good toy example, it cannot help you understand the advantage of the NB assumptions.
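
As an illustration of that note, here is a rough sketch of the kind of comparison the question describes, using scikit-learn's GaussianNB for the naive model and QuadraticDiscriminantAnalysis as a stand-in for a per-class full-covariance multivariate normal ("full Bayes"). The synthetic data is an assumption about what the figure probably shows: two classes with identical marginals but opposite feature correlation, so only the dependence between the features carries class information.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# class 0: strongly positively correlated features; class 1: strongly negatively correlated.
# Each feature taken alone has the same marginal distribution under both classes.
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.95], [0.95, 1.0]], n)
X1 = rng.multivariate_normal([0, 0], [[1.0, -0.95], [-0.95, 1.0]], n)
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Naive Bayes: one univariate normal per feature and class -> the two classes look identical.
nb = GaussianNB().fit(X_tr, y_tr)
# One full-covariance multivariate normal per class (QDA), i.e. the joint Gaussian model here.
full = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)

print("Naive Bayes accuracy:", nb.score(X_te, y_te))          # close to 0.5 (chance)
print("Full multivariate accuracy:", full.score(X_te, y_te))  # close to 1.0
```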
