Question

I compared the performance of two Naive Bayes implementations, one in NLTK and one in Scikits (Bernoulli versions; class priors don't matter, since I use exactly the same number of training examples for each class), by plotting their learning curves for my 3-class problem. The X axis is the training dataset size (ignore the actual values) and the Y axis is accuracy. Here is what I got.

Is there any reason for this difference in performance?


Answer

NLTK does not implement Bernoulli Naive Bayes. Instead, its NaiveBayesClassifier uses the multinomial NB decision rule together with boolean features.
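To make the distinction concrete, here is a minimal pure-Python sketch of the two decision rules. It is not NLTK's or scikit-learn's actual code: the vocabulary, the class labels, every probability value, and the function names are hypothetical and chosen only for illustration. The Bernoulli rule multiplies in a term for every vocabulary word, including absent ones (via 1 - p), while the multinomial rule applied to boolean features scores only the words that are present.

```python
import math

# Hypothetical toy model: all names and numbers below are assumptions.
vocab = ["good", "bad", "plot"]
prior = {"pos": 0.5, "neg": 0.5}

# Bernoulli parameters: P(word appears in a document | class).
p_present = {
    "pos": {"good": 0.8, "bad": 0.1, "plot": 0.5},
    "neg": {"good": 0.4, "bad": 0.3, "plot": 0.5},
}

# Multinomial parameters: P(word | class), summing to 1 over the vocabulary.
p_word = {
    "pos": {"good": 0.45, "bad": 0.10, "plot": 0.45},
    "neg": {"good": 0.30, "bad": 0.30, "plot": 0.40},
}


def bernoulli_log_score(present, label):
    """Bernoulli NB: every vocabulary word contributes, present or absent."""
    score = math.log(prior[label])
    for word in vocab:
        p = p_present[label][word]
        score += math.log(p if word in present else 1.0 - p)
    return score


def multinomial_boolean_log_score(present, label):
    """Multinomial rule on boolean features: only present words contribute."""
    score = math.log(prior[label])
    for word in vocab:
        if word in present:
            score += math.log(p_word[label][word])
    return score


def predict(score_fn, present):
    """Return the class label with the highest log score."""
    return max(prior, key=lambda label: score_fn(present, label))


doc = {"plot"}  # a document containing only the word "plot"
print("Bernoulli:", predict(bernoulli_log_score, doc))
print("Multinomial on booleans:", predict(multinomial_boolean_log_score, doc))
```

With these made-up numbers the two rules disagree on the same document, because the absence of "good" and "bad" counts as evidence only under the Bernoulli rule.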

While this combination of multinomial and Bernoulli NB parts is actually sometimes recommended (e.g. by Jurafsky and Manning for sentiment analysis), it usually represents the worst of both worlds and is most likely the result of a mistake.

License: CC-BY-SA with attribution