How does the Naive Bayes algorithm function effectively as a classifier, despite its conditional-independence and bag-of-words assumptions?
11-12-2020
Problem
The Naive Bayes algorithm used for text classification relies on two assumptions to keep it computationally cheap:
- Bag-of-words assumption: the position of words in the document is ignored.
- Conditional independence: words are assumed to be independent of one another given the class.
In reality, neither condition holds in most cases, yet Naive Bayes is quite effective. Why is that?
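For context, these two assumptions are exactly what let the classifier score a document as a simple product of per-word probabilities. This is the standard multinomial Naive Bayes decision rule:

```latex
\hat{c} = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The bag-of-words assumption lets every position share a single distribution $P(w \mid c)$, and conditional independence lets the joint probability of the document factor into this product, so training reduces to counting word frequencies per class.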
Solution
The main reason is that in many cases (though not always) the model gathers enough evidence to make the right decision simply from knowing which words do and do not appear in the document (possibly also using their frequencies, though even that is not always needed).
Let's take the textbook example of topic detection in news documents. A 'sports' article is likely to contain at least a few words that are unambiguously related to sports, and the same holds for many other topics, as long as the topics are sufficiently distinct.
In general, tasks tied to the overall semantics of the text work reasonably well with unigrams (single, unordered words) as features, whether with NB or other methods. It's different for tasks that require taking syntax into account, or that require a deeper understanding of the semantics.
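The point can be seen in a minimal from-scratch sketch: a multinomial Naive Bayes classifier over unigram counts, trained on a few hypothetical sports/politics snippets (the training texts and labels below are made up purely for illustration). Even with both assumptions plainly violated, a handful of topic-specific words is enough evidence to separate the classes.

```python
from collections import Counter
import math

# Hypothetical toy training data, invented just to illustrate the idea.
train = [
    ("sports",   "the team won the match with a late goal"),
    ("sports",   "the coach praised the players after the game"),
    ("politics", "the senate passed the budget bill after debate"),
    ("politics", "the minister announced a new policy on trade"),
]

def fit(docs):
    """Count class priors and per-class unigram frequencies (bag of words)."""
    class_counts = Counter()
    word_counts = {}          # class label -> Counter of word frequencies
    vocab = set()
    for label, text in docs:
        class_counts[label] += 1
        words = text.split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Pick argmax_c of log P(c) + sum_i log P(w_i|c), with Laplace smoothing."""
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:  # words never seen in training are simply skipped
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = fit(train)
print(predict("the players won the game", *model))     # → sports
print(predict("the senate debate on policy", *model))  # → politics
```

Note that the classifier never looks at word order, and it multiplies per-word probabilities as if the words were independent; the unambiguous topic words ("players", "senate") still dominate the score and yield the right label.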