Idf values of English words

https://datascience.stackexchange.com/questions/25725

nlp
tfidf

31-10-2019

Question

I'm working on keyword/phrase extraction from a single document. I started by doing term frequency analysis, but this returns words like "new" which aren't very helpful. So I want to penalize the common words and phrases, for which we normally use idf (inverse document frequency). But since it's for a single document, I'm not sure how to do idf analysis.

Is it possible to use the tf-idf method with pre-calculated idf values for (all?) words? And are such values available somewhere?
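To make the idea concrete, here is a minimal sketch of what "pre-calculated idf" could look like: idf values are estimated once from a separate reference corpus and then reused to score terms in the single document. Everything here is an assumption for illustration: the function names (`tokenize`, `idf_from_reference`, `score_keywords`), the simple regex tokenizer, and the smoothed formula `log((1 + N) / (1 + df)) + 1`. The reference corpus would be whatever general English text is available, not a specific published idf table.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def idf_from_reference(reference_docs):
    """Estimate idf once from a reference corpus (a list of strings).

    Returns (idf_table, default_idf). default_idf is the value for a word
    that never occurs in the reference corpus (document frequency 0).
    """
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        # Count each word at most once per document.
        doc_freq.update(set(tokenize(doc)))
    # Smoothed idf, an assumed variant: log((1 + N) / (1 + df)) + 1.
    idf = {
        word: math.log((1 + n_docs) / (1 + df)) + 1
        for word, df in doc_freq.items()
    }
    default_idf = math.log(1 + n_docs) + 1
    return idf, default_idf


def score_keywords(document, idf, default_idf):
    """Rank the words of one document by tf * precomputed idf."""
    counts = Counter(tokenize(document))
    total = sum(counts.values())
    scored = [
        (word, (count / total) * idf.get(word, default_idf))
        for word, count in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this setup, a word such as "new" that occurs in most reference documents gets a low idf and is pushed down the ranking, while words that are rare or absent in the reference corpus keep a high weight. The quality of the result depends entirely on how well the reference corpus matches the kind of text being analysed.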

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange