Question

Simple question again: is it better to use n-grams (unigrams, bigrams, etc.) as simple binary features, or to use their TF-IDF scores, in ML models such as Support Vector Machines for NLP tasks such as sentiment analysis or text categorization/classification?


Solution

As Steve mentioned in the comment, the best answer (and the ML-style way) is to try both and compare!

That being said, I'd start with binary features. The goal of an ML model like an SVM is to learn the "weight" of each feature, so if it does that well, you don't have to set these weights in advance (with TF-IDF or any other scheme).
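To make the two options concrete, here is a minimal pure-Python sketch of both feature encodings on a toy corpus. The function names (`ngrams`, `binary_features`, `tfidf_features`) and the sample documents are hypothetical, and the unsmoothed `idf = ln(N / df)` is only one common variant; libraries often use smoothed or normalized versions, so treat the exact numbers as illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

def binary_features(docs, n_max=2):
    """Map each document to {ngram: 1} for every n-gram it contains."""
    return [{g: 1 for g in set(ngrams(d.lower().split(), n_max))} for d in docs]

def tfidf_features(docs, n_max=2):
    """Map each document to {ngram: tf * idf}, assuming idf = ln(N / df)."""
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
        for c in counts
    ]

# Hypothetical toy corpus, for illustration only.
docs = ["good good movie", "bad movie"]
binary = binary_features(docs)
tfidf = tfidf_features(docs)
```

With binary features, "good" and "movie" look the same to the model in the first document, and the SVM has to learn that only one of them is informative. With TF-IDF, "movie" is already scaled to zero because it occurs in every document. Which encoding gives better accuracy on a given task is exactly the thing to measure on held-out data.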

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow