Question

I have a fairly large document and want to do stop-word removal and stemming on its words with Python. Does anyone know of an off-the-shelf package for this? If not, code that is fast enough for large documents is also welcome. Thanks.


Solution

NLTK supports this.
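A minimal sketch of what this could look like with NLTK, assuming English text, the Porter stemmer, and a simple regex tokenizer. Those are choices made for this example, not something the answer specifies, and the helper names `load_nltk_stop_words` and `preprocess` are hypothetical.

```python
import re

from nltk.stem import PorterStemmer

# Create the stemmer once and reuse it for every token.
_stemmer = PorterStemmer()


def load_nltk_stop_words(language="english"):
    """Return NLTK's stop-word list as a set (downloads the corpus if needed)."""
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))


def preprocess(text, stop_words):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [_stemmer.stem(tok) for tok in tokens if tok not in stop_words]
```

Typical use would be `preprocess(document_text, load_nltk_stop_words())`. Keeping the stop words in a set makes each membership check constant time, which matters more than anything else here once the document gets large.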

Other tips

If for some reason you don't want to use NLTK, you can try PyStemmer. For stop words, just download a list (search online for one) and filter them out.
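The stop-word part of this needs nothing beyond the standard library. Here is a rough sketch, assuming the downloaded list is a plain-text file with one word per line (that file format, the regex tokenizer, and the function names are assumptions of this example).

```python
import re

TOKEN_RE = re.compile(r"[a-z]+")


def read_stop_word_file(path):
    """Load stop words from a text file, one word per line, into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def filter_stop_words(lines, stop_words):
    """Lazily yield the tokens of each line that are not stop words."""
    for line in lines:
        for token in TOKEN_RE.findall(line.lower()):
            if token not in stop_words:
                yield token
```

Because `filter_stop_words` is a generator, you can pass it an open file object and process a large document line by line without loading it all into memory. The surviving tokens can then be handed to whichever stemmer you prefer, for example PyStemmer's `Stemmer.Stemmer("english").stemWords(...)`.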

Licensed under: CC-BY-SA with attribution