Question

I have compared the performance of two implementations of Naive Bayes, NLTK's and scikit-learn's (the Bernoulli variant; class priors don't matter here because I use exactly the same number of training examples for each class), by plotting their learning curves on my 3-class problem. The X axis is training set size (ignore the actual values) and the Y axis is accuracy. Here is what I got.
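Roughly, the comparison I ran looks like the sketch below. This is not my exact pipeline; the data is synthetic and the feature names are made up, but it shows the kind of learning-curve loop I mean: NLTK's NaiveBayesClassifier on boolean feature dicts versus scikit-learn's BernoulliNB on the same 0/1 matrix.

```python
import numpy as np
from nltk.classify import NaiveBayesClassifier, accuracy as nltk_accuracy
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
N_FEATURES = 20

def make_data(n):
    """Random boolean feature vectors whose per-feature frequency depends on the class."""
    y = rng.integers(0, 3, size=n)              # 3 classes
    p = 0.3 + 0.1 * y[:, None]                  # class-dependent "on" probability
    X = (rng.random((n, N_FEATURES)) < p).astype(int)
    return X, y

def to_featuresets(X, y):
    """Convert a 0/1 matrix into NLTK's list of (feature dict, label) pairs."""
    return [({f"f{j}": bool(v) for j, v in enumerate(row)}, int(label))
            for row, label in zip(X, y)]

X_test, y_test = make_data(500)
test_sets = to_featuresets(X_test, y_test)

for n_train in (50, 100, 200, 400, 800):        # the learning-curve x axis
    X_train, y_train = make_data(n_train)

    # scikit-learn Bernoulli NB on the 0/1 matrix
    sk_acc = BernoulliNB().fit(X_train, y_train).score(X_test, y_test)

    # NLTK NaiveBayesClassifier on the same examples as boolean feature dicts
    nltk_clf = NaiveBayesClassifier.train(to_featuresets(X_train, y_train))
    nl_acc = nltk_accuracy(nltk_clf, test_sets)

    print(f"n={n_train:4d}  sklearn BernoulliNB={sk_acc:.3f}  NLTK NaiveBayes={nl_acc:.3f}")
```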

Any reason for this difference in performance?


Solution

NLTK does not implement Bernoulli Naive Bayes. Instead, its NaiveBayesClassifier uses the multinomial NB decision rule together with boolean features.

While this mix of multinomial and Bernoulli NB ingredients is sometimes recommended (e.g. by Jurafsky and Manning for sentiment analysis), it usually gives the worst of both worlds, and in NLTK's case it is most likely the result of a mistake rather than a deliberate design choice.
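To make the difference between the two decision rules concrete, it can be illustrated within scikit-learn alone: BernoulliNB explicitly models absent features through (1 - p) factors, while MultinomialNB fit on the same binarized matrix only scores the features that are present, which is roughly the multinomial-rule-plus-boolean-features combination described above. The sketch below uses tiny made-up data purely for illustration; it is not a reproduction of NLTK's implementation.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny made-up binary dataset: 4 documents, 3 boolean features, 2 classes.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([0, 0, 1, 1])

bern = BernoulliNB().fit(X, y)    # Bernoulli rule: present AND absent features both count
mult = MultinomialNB().fit(X, y)  # multinomial rule over 0/1 "counts": only present features count

# A document containing only the first feature; features 2 and 3 are absent,
# so BernoulliNB also multiplies in their (1 - p) terms while MultinomialNB ignores them.
x_new = np.array([[1, 0, 0]])
print("BernoulliNB   :", bern.predict_proba(x_new))
print("MultinomialNB :", mult.predict_proba(x_new))
# The two rules generally assign different posteriors to the same input,
# which is one source of the accuracy gap between the two libraries.
```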

Licensed under: CC-BY-SA with attribution