Idf values of English words
Question
I'm working on keyword/phrase extraction from a single document. I started with term frequency analysis, but it surfaces words like "new", which aren't very helpful. So I want to penalize common words and phrases, which is normally done with idf (inverse document frequency). But idf is computed over a collection of documents, and I only have one, so I'm not sure how to do the idf analysis.
Is it possible to use the tf-idf method with pre-calculated idf values for (all?) words? And are such values available somewhere?
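To make the idea concrete, here is a minimal sketch of what I have in mind, assuming the idf values are derived once from some separate background corpus and then reused for the single document. The names `tokenize`, `build_idf`, `score_terms`, the tiny `reference_docs` list and the smoothed idf formula are all my own illustrative assumptions, not an existing resource or library API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very rough tokenizer (assumption): lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_idf(reference_docs):
    # Count in how many reference documents each term appears.
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed idf (one common variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    # Terms never seen in the reference corpus are treated as df = 0.
    default_idf = math.log(1 + n_docs) + 1.0
    return idf, default_idf


def score_terms(document, idf, default_idf):
    # Raw term frequency in the single document, weighted by the stored idf.
    tf = Counter(tokenize(document))
    scored = [(t, count * idf.get(t, default_idf)) for t, count in tf.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical background corpus standing in for a large English collection.
reference_docs = [
    "the new car is fast",
    "the new phone is here",
    "the weather is new today",
]
idf, default_idf = build_idf(reference_docs)

document = "the new reactor design makes the new reactor safer"
ranked = score_terms(document, idf, default_idf)
for term, score in ranked[:3]:
    print(term, round(score, 3))
```

With a realistic background corpus in place of the three toy sentences, words such as "new" would get a low idf and drop down the ranking, while terms specific to my document would rise.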
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange