Naive Bayes text classification using TextBlob: every instance predicted as negative after increasing the sample size

StackOverflow https://stackoverflow.com/questions/22152533

Question

I am classifying documents with positive and negative labels using a Naive Bayes model. It seems to work fine for a small, balanced dataset of around 72 documents. But when I add more negatively labeled documents, the classifier starts predicting everything as negative.

I am splitting my dataset into an 80% training set and a 20% test set. Adding more negatively labeled documents definitely makes the dataset skewed. Could this skew be what makes the classifier predict every test document as negative? I am using the TextBlob/NLTK implementation of the Naive Bayes model.
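
A minimal sketch of this kind of pipeline with TextBlob's NaiveBayesClassifier, using placeholder documents rather than the actual corpus:

    # Minimal sketch of the setup described above (placeholder data,
    # not the real corpus).
    import random
    from textblob.classifiers import NaiveBayesClassifier

    # Hypothetical labeled documents; the real dataset is ~72 documents.
    labeled_docs = [
        ("I love this product", "pos"),
        ("Great service and fast delivery", "pos"),
        ("Terrible, would not recommend", "neg"),
        ("Worst purchase I have ever made", "neg"),
    ] * 18  # repeated only to reach a comparable corpus size

    random.seed(0)
    random.shuffle(labeled_docs)

    # 80% training / 20% test split.
    split = int(0.8 * len(labeled_docs))
    train_set, test_set = labeled_docs[:split], labeled_docs[split:]

    clf = NaiveBayesClassifier(train_set)
    print(clf.classify("I really enjoyed this"))  # e.g. 'pos'
    print(clf.accuracy(test_set))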

Any idea?


Solution

Yes, it could be that your dataset is biasing your classifier. If there isn't a strong signal telling the classifier which class to choose, it makes sense for it to pick the most prevalent class (negative, in your case). Have you tried plotting the class distribution against accuracy? Another thing to try is k-fold cross-validation, so that you are not drawing a biased 80/20 training/test split by chance; see the sketch below.
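
As a sketch of the cross-validation idea, assuming scikit-learn is available for the fold splitting (the documents below are hypothetical placeholders, and StratifiedKFold is one convenient way to keep the positive/negative ratio identical in every fold):

    # Stratified k-fold cross-validation around TextBlob's classifier.
    # Each fold preserves the class ratio, so accuracy is averaged over
    # several balanced train/test splits instead of one 80/20 draw.
    from textblob.classifiers import NaiveBayesClassifier
    from sklearn.model_selection import StratifiedKFold

    # Hypothetical corpus; substitute your own (text, label) pairs.
    docs = [
        ("I love this product", "pos"),
        ("Great service and fast delivery", "pos"),
        ("A wonderful little film", "pos"),
        ("Exceeded all my expectations", "pos"),
        ("Happy with the purchase", "pos"),
        ("Terrible, would not recommend", "neg"),
        ("Worst purchase I have ever made", "neg"),
        ("Completely useless and overpriced", "neg"),
        ("The plot was dull and predictable", "neg"),
        ("Broke after two days", "neg"),
    ]
    labels = [label for _, label in docs]

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for train_idx, test_idx in skf.split(docs, labels):
        train_set = [docs[i] for i in train_idx]
        test_set = [docs[i] for i in test_idx]
        clf = NaiveBayesClassifier(train_set)
        scores.append(clf.accuracy(test_set))

    print("per-fold accuracy:", scores)
    print("mean accuracy:", sum(scores) / len(scores))

If every fold still predicts only the majority class, rebalancing the training data (undersampling negatives or oversampling positives) is the usual next step.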

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow