Comparison of binary vs. TF-IDF n-gram features in sentiment analysis/classification tasks?

https://stackoverflow.com/questions/14540630

artificial-intelligence
nlp
machine-learning
tf-idf
n-gram

05-03-2022

Question

Simple question again: is it better to use n-grams (unigrams, bigrams, etc.) as simple binary features, or to use their TF-IDF scores, in ML models such as Support Vector Machines for NLP tasks such as sentiment analysis or text categorization/classification?
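To make the two options concrete, here is a minimal pure-Python sketch that turns a tiny, made-up corpus into both representations. The tokenizer (lowercase plus whitespace split), the helper names `ngrams`, `binary_features` and `tfidf_features`, and the IDF formula `ln((1 + N) / (1 + df)) + 1` are assumptions for illustration. TF-IDF has several variants; this one uses the same smoothed IDF as scikit-learn's `TfidfVectorizer` default but, unlike that class, applies no L2 row normalization.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def binary_features(docs, n_max=2):
    """Binary representation: 1 if the n-gram occurs in the document (counts are ignored)."""
    return [{g: 1 for g in set(ngrams(d, n_max))} for d in docs]

def tfidf_features(docs, n_max=2):
    """Raw term frequency times smoothed IDF (an assumed variant; no normalization)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each distinct n-gram appears.
    df = Counter(g for c in counts for g in c)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]

# Hypothetical toy corpus.
docs = ["good movie", "not a good movie", "bad bad movie"]
print(binary_features(docs)[2])
print(tfidf_features(docs)[2])
```

For "bad bad movie", the binary vector only records that "bad" is present, while TF-IDF gives "bad" 2 × (ln 2 + 1) ≈ 3.39 (it occurs twice and in only one document) and gives "movie" 1.0 (it occurs in every document). In practice, scikit-learn's `CountVectorizer(binary=True, ngram_range=(1, 2))` and `TfidfVectorizer(ngram_range=(1, 2))` produce these two kinds of representation.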

Solution

As Steve mentioned in the comment, the best answer (and the ML-style way) is to try both!

That being said, I'd start with binary features. The goal of an ML model such as an SVM is to determine the "weight" of each feature, so if the model does its job well, you don't need to set these weights in advance (with TF-IDF or another weighting scheme).
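The idea that the model learns the feature weights itself can be illustrated with a minimal sketch. It swaps in a plain perceptron as a stand-in for an SVM (both are linear classifiers that learn one weight per feature, but a perceptron does not maximize the margin) and feeds it binary unigram+bigram features. The toy training data and the helper names `ngrams`, `train_perceptron` and `predict` are hypothetical.

```python
from collections import defaultdict

def ngrams(text, n_max=2):
    """Set of word n-grams (1..n_max): a set, so each feature is simply present (1) or absent (0)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}

def train_perceptron(docs, labels, epochs=10):
    """Perceptron (assumed stand-in for a linear SVM); labels must be +1 or -1."""
    weights = defaultdict(float)  # every n-gram weight starts at 0
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            feats = ngrams(doc)
            score = bias + sum(weights[g] for g in feats)
            if y * score <= 0:  # misclassified or on the boundary: nudge weights toward y
                for g in feats:
                    weights[g] += y
                bias += y
    return weights, bias

def predict(weights, bias, doc):
    """Sign of the linear score; unseen n-grams contribute 0."""
    score = bias + sum(weights.get(g, 0.0) for g in ngrams(doc))
    return 1 if score > 0 else -1

# Hypothetical toy training data: (text, label).
train = [("a good movie", 1), ("a bad movie", -1),
         ("good acting", 1), ("bad acting", -1)]
weights, bias = train_perceptron([d for d, _ in train], [y for _, y in train])
print(predict(weights, bias, "good movie"), predict(weights, bias, "bad movie"))
print(weights["good"], weights["bad"])
```

Although every input feature is only ever 0 or 1, training pushes the weight of "good" positive and "bad" negative, which is the kind of per-feature weighting the answer says the model can learn without TF-IDF pre-weighting.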

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow