Naive Bayes text classification using TextBlob: every instance predicted as negative after increasing the sample size

StackOverflow https://stackoverflow.com/questions/22152533

Question

I am classifying documents with positive and negative labels using a Naive Bayes model. It seems to work fine for a small, balanced dataset of around 72 documents. But when I add more negatively labeled documents, the classifier starts predicting everything as negative.

I am splitting my dataset into an 80% training set and a 20% test set. Adding more negatively labeled documents definitely makes the dataset skewed. Could this skew be what makes the classifier predict every test document as negative? I am using the TextBlob/NLTK implementation of the Naive Bayes model.
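
A minimal sketch of this kind of pipeline with TextBlob's NaiveBayesClassifier, using placeholder documents rather than the actual corpus:

    # Minimal sketch of the setup described above (placeholder data,
    # not the real corpus).
    import random
    from textblob.classifiers import NaiveBayesClassifier

    # Hypothetical labeled documents; the real dataset is ~72 documents.
    labeled_docs = [
        ("I love this product", "pos"),
        ("Great service and fast delivery", "pos"),
        ("Terrible, would not recommend", "neg"),
        ("Worst purchase I have ever made", "neg"),
    ] * 18  # repeated only to reach a comparable corpus size

    random.seed(0)
    random.shuffle(labeled_docs)

    # 80% training / 20% test split.
    split = int(0.8 * len(labeled_docs))
    train_set, test_set = labeled_docs[:split], labeled_docs[split:]

    clf = NaiveBayesClassifier(train_set)
    print(clf.classify("I really enjoyed this"))  # e.g. 'pos'
    print(clf.accuracy(test_set))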

Any idea?


Solution

Yes, it could be that your dataset is biasing your classifier. If there isn't a strong signal telling the classifier which class to choose, it makes sense for it to pick the most prevalent class (negative, in your case). Have you tried plotting the class distribution against accuracy? Another thing to try is k-fold cross-validation, so that you are not drawing a biased 80/20 training/test split by chance; see the sketch below.
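
As a sketch of the cross-validation idea, assuming scikit-learn is available for the fold splitting (the documents below are hypothetical placeholders, and StratifiedKFold is one convenient way to keep the positive/negative ratio identical in every fold):

    # Stratified k-fold cross-validation around TextBlob's classifier.
    # Each fold preserves the class ratio, so accuracy is averaged over
    # several balanced train/test splits instead of one 80/20 draw.
    from textblob.classifiers import NaiveBayesClassifier
    from sklearn.model_selection import StratifiedKFold

    # Hypothetical corpus; substitute your own (text, label) pairs.
    docs = [
        ("I love this product", "pos"),
        ("Great service and fast delivery", "pos"),
        ("A wonderful little film", "pos"),
        ("Exceeded all my expectations", "pos"),
        ("Happy with the purchase", "pos"),
        ("Terrible, would not recommend", "neg"),
        ("Worst purchase I have ever made", "neg"),
        ("Completely useless and overpriced", "neg"),
        ("The plot was dull and predictable", "neg"),
        ("Broke after two days", "neg"),
    ]
    labels = [label for _, label in docs]

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for train_idx, test_idx in skf.split(docs, labels):
        train_set = [docs[i] for i in train_idx]
        test_set = [docs[i] for i in test_idx]
        clf = NaiveBayesClassifier(train_set)
        scores.append(clf.accuracy(test_set))

    print("per-fold accuracy:", scores)
    print("mean accuracy:", sum(scores) / len(scores))

If every fold still predicts only the majority class, rebalancing the training data (undersampling negatives or oversampling positives) is the usual next step.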

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow