Question

I have a text classification problem with 4 labels.

Could someone help me choose among the text classifiers below?

I was advised to select the second one (the one which uses both unigrams and bigrams), but I cannot really see why.

[Image: comparison of the candidate text classifiers; not reproduced here]


Solution

Keeping it short and strictly in the context of your question:

Accuracy tells us what fraction of all documents are classified correctly.

Precision tells us, out of all documents predicted to be in a category, how often that prediction is correct.
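To make the two definitions concrete, here is a minimal sketch in plain Python. The function names and the toy labels ("a" to "d") are made up for illustration and are not taken from the question.

```python
def accuracy(y_true, y_pred):
    """Fraction of all documents whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, label):
    """Of the documents predicted as `label`, the fraction that truly are `label`."""
    true_labels_of_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not true_labels_of_predicted:
        # Nothing was predicted as this label; report 0.0 by convention.
        return 0.0
    hits = sum(t == label for t in true_labels_of_predicted)
    return hits / len(true_labels_of_predicted)


# Hypothetical true and predicted labels for six documents and four classes.
y_true = ["a", "b", "c", "d", "a", "b"]
y_pred = ["a", "b", "c", "a", "a", "c"]

print(accuracy(y_true, y_pred))        # 4 of 6 documents are correct
print(precision(y_true, y_pred, "a"))  # 2 of the 3 documents predicted "a" are correct
print(precision(y_true, y_pred, "c"))  # 1 of the 2 documents predicted "c" is correct
```

With 4 labels, precision is computed per label like this and then usually averaged across labels into a single number.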

Take the text "nasa is space agency" as an example. Unigrams: "nasa", "is", "space", "agency". Bigrams: "nasa is", "is space", "space agency".
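A short sketch of how such n-grams can be extracted, assuming simple whitespace tokenization. The helper name `ngrams` is hypothetical. Every run of n adjacent words is an n-gram, so "is space" also shows up among the bigrams.

```python
def ngrams(text, n):
    """Return all runs of n adjacent whitespace-separated tokens in `text`."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams("nasa is space agency", 1))  # ['nasa', 'is', 'space', 'agency']
print(ngrams("nasa is space agency", 2))  # ['nasa is', 'is space', 'space agency']
```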

Now let's go over the numbers: between the two cases, accuracy and precision do not differ significantly.

But as we can see, bigrams carry more information (word order and short phrases), and hence the model can perform better on unseen data. Test both models on unseen data or a validation set and compare the difference. You could also try trigrams, etc.
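One way to run that comparison is sketched below with scikit-learn. Everything specific in it is an assumption: the question does not say which library, classifier, or data was used, so the logistic regression model and the tiny 4-label dataset are invented purely for illustration. Swap in your own training and validation documents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.pipeline import make_pipeline

# Invented toy data with 4 labels; replace with your own documents.
train_texts = [
    "nasa is a space agency",
    "the rocket reached orbit",
    "astronauts live on the space station",
    "the team won the football match",
    "she scored a goal in the final",
    "the tennis player won the match",
    "bake the bread in a hot oven",
    "add salt and pepper to the soup",
    "the recipe needs fresh tomato sauce",
    "the band played a new song",
    "he plays guitar in a rock band",
    "the singer released a new album",
]
train_labels = ["space"] * 3 + ["sports"] * 3 + ["food"] * 3 + ["music"] * 3

# Held-out documents the models never see during training.
val_texts = [
    "the space agency launched a rocket",
    "the football team scored a goal",
    "fresh bread and tomato soup",
    "the band released a song",
]
val_labels = ["space", "sports", "food", "music"]


def evaluate(ngram_range):
    """Train on the training set and score on the validation set."""
    model = make_pipeline(
        CountVectorizer(ngram_range=ngram_range),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_texts, train_labels)
    predicted = model.predict(val_texts)
    return {
        "accuracy": accuracy_score(val_labels, predicted),
        # Macro average: per-label precision, averaged over the 4 labels.
        "macro_precision": precision_score(
            val_labels, predicted, average="macro", zero_division=0
        ),
    }


settings = {
    "unigrams": (1, 1),
    "unigrams+bigrams": (1, 2),
    "unigrams+bigrams+trigrams": (1, 3),
}
results = {name: evaluate(ngram_range) for name, ngram_range in settings.items()}

for name, scores in results.items():
    print(name, scores)
```

On a dataset this small the scores mean very little. The point is the procedure: keep the classifier fixed, change only `ngram_range`, and compare the models on data they were not trained on.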

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange