Question

Simple question again: is it better to use n-grams (unigrams, bigrams, etc.) as simple binary features, or to use their TF-IDF scores, in ML models such as Support Vector Machines for NLP tasks such as sentiment analysis or text categorization/classification?


Solution

As Steve mentioned in the comment, the best answer (and the ML-style way) is to try both and compare!

That being said, I'd start with binary features. The job of an ML model such as an SVM is to learn the "weight" of each feature, so if the model does that well, you don't need to set those weights in advance (with TF-IDF or any other scheme).
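To make the two options concrete, here is a minimal sketch in plain Python that builds both representations for the same documents. Everything in it is an assumption made for illustration: the helper names (`ngrams`, `binary_features`, `tfidf_features`), the whitespace tokenizer, the toy corpus, and the particular TF-IDF variant (raw term count times `log(N / df)`; libraries often use smoothed or normalized versions).

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def binary_features(docs, n_max=2):
    """Map each document to {ngram: 1} for every n-gram it contains."""
    return [{g: 1 for g in ngrams(d.lower().split(), n_max)} for d in docs]

def tfidf_features(docs, n_max=2):
    """Map each document to {ngram: tf * idf}, with idf = log(N / df)."""
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
            for c in counts]

# Toy corpus, invented for illustration only.
docs = ["good good movie", "bad movie", "good plot"]
binary = binary_features(docs)
tfidf = tfidf_features(docs)

# Same n-grams in both cases; only the values attached to them differ.
print(binary[0])
print(tfidf[0])
```

Both variants produce the same set of features per document and differ only in the values, which is why it is cheap to train the same classifier on each and compare on held-out data. If you use scikit-learn, `CountVectorizer(binary=True, ngram_range=(1, 2))` and `TfidfVectorizer(ngram_range=(1, 2))` are the usual ways to get these two representations.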

Licensed under: CC-BY-SA with attribution