Question

I'm working on keyword/phrase extraction from a single document. I started with term-frequency analysis, but it returns words like "new", which aren't very helpful. So I want to penalize common words and phrases, which is normally done with idf (inverse document frequency). But since I only have a single document, I'm not sure how to compute idf.

Is it possible to use the tf-idf method with pre-calculated idf values for (all?) words? And are such values available somewhere?
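
To make the idea concrete, here is a minimal sketch of what scoring a single document against a pre-calculated idf table might look like. Everything in it is an assumption for illustration: the function names (`tokenize`, `build_idf`, `score_document`) are hypothetical, the tiny reference corpus only stands in for whatever large background collection the idf values would really come from, and the smoothed idf formula is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_idf(reference_docs):
    """Compute smoothed idf values from a background corpus (assumed stand-in).

    Returns the idf table and the number of reference documents.
    """
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    return idf, n_docs


def score_document(text, idf, n_docs):
    """Rank the terms of one document by tf * idf, highest score first."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    # Terms never seen in the reference corpus are treated as df = 0.
    default_idf = math.log(1 + n_docs) + 1.0
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy background corpus: "new" and "the" occur in every document.
reference_corpus = [
    "the new policy was announced in the city",
    "a new store opened in the new mall",
    "the team has a new coach and a new plan",
    "new rules for the old game",
]

idf_table, n_reference_docs = build_idf(reference_corpus)
document = "The new telescope captured a new image of the galaxy; the telescope is new."
ranked = score_document(document, idf_table, n_reference_docs)
```

In this toy setup "new" is the most frequent content word in the document, but because it appears in every reference document its idf is low, so "telescope" ends up ranked above it. The idf table is computed once and could be saved and reused for any later single document.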

No correct solution

Licensed under: CC-BY-SA with attribution