Question

I am getting quite different results when classifying text (into only two categories) with the Bernoulli Naive Bayes algorithm in NLTK and the one in the scikit-learn module. Although the overall accuracy of the two is comparable (though far from identical), the difference in Type I and Type II errors is significant. In particular, the NLTK Naive Bayes classifier gives more Type I than Type II errors, while scikit-learn gives the opposite. This 'anomaly' seems to be consistent across different features and different training samples. Is there a reason for this? Which of the two is more trustworthy?

Solution

NLTK does not implement Bernoulli Naive Bayes. It implements multinomial Naive Bayes but only allows binary features. The two models are not equivalent even on binary data: Bernoulli Naive Bayes explicitly penalizes the absence of a known feature, while multinomial Naive Bayes simply ignores features that do not occur. Because absent features carry evidence in one model but not the other, the two classifiers can systematically skew toward opposite error types, which matches what you are seeing.
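You can observe this difference entirely within scikit-learn by fitting `BernoulliNB` and `MultinomialNB` on the same binary presence features (the multinomial model over binary features approximates what NLTK's classifier does). The sketch below uses hypothetical toy documents just to illustrate the divergence, not your actual data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus for illustration only.
train_texts = ["good great fun", "great story", "bad boring", "awful bad plot"]
train_labels = ["pos", "pos", "neg", "neg"]

# binary=True yields 0/1 presence features, the same representation
# both models will be trained on.
vec = CountVectorizer(binary=True)
X = vec.fit_transform(train_texts)

bern = BernoulliNB().fit(X, train_labels)
multi = MultinomialNB().fit(X, train_labels)

# For a test document, BernoulliNB also factors in the *absence* of
# every vocabulary word, while MultinomialNB only scores the words
# that occur, so the class probabilities (and thus the decision
# boundary and error types) can diverge.
X_test = vec.transform(["great plot"])
print("BernoulliNB:  ", bern.predict_proba(X_test))
print("MultinomialNB:", multi.predict_proba(X_test))
```

Neither model is inherently more trustworthy; which one fits better depends on whether the absence of a word is informative for your task, so comparing them with cross-validation on your own data is the practical test.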

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow