Question

There is a Ruby stemmer, https://github.com/aurelian/ruby-stemmer, but it (1) does not stem English irregular verbs and (2) fails to build its native extensions on Windows. Is there an alternative that fixes at least one of these problems?

Solution

I think you should be searching for a lemmatizer (which has information about morphology and can handle irregular words) rather than a stemmer (which usually just lops off the ends of words). See this explanation in Manning, Raghavan, and Schütze's online book on information retrieval.
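To make the distinction concrete, here is a minimal sketch in plain Ruby. It is a toy illustration, not how ruby-stemmer or any real lemmatizer is implemented: the names `naive_stem`, `toy_lemmatize`, and `IRREGULAR_VERBS` are hypothetical, and the four-entry table stands in for the full morphological dictionary a real lemmatizer would use.

```ruby
# Toy illustration only; names and data below are assumptions for this sketch.

# Hypothetical, tiny exception table. A real lemmatizer ships a full dictionary.
IRREGULAR_VERBS = {
  "went"   => "go",
  "ran"    => "run",
  "was"    => "be",
  "bought" => "buy"
}.freeze

# Naive stemmer: strips a few common suffixes and knows nothing about morphology.
def naive_stem(word)
  word.downcase.sub(/(ing|ed|s)\z/, "")
end

# Toy lemmatizer: consults the exception table first, then falls back to stemming.
def toy_lemmatize(word)
  w = word.downcase
  IRREGULAR_VERBS.fetch(w) { naive_stem(w) }
end

%w[walked went bought].each do |w|
  puts "#{w}: stem=#{naive_stem(w)} lemma=#{toy_lemmatize(w)}"
end
```

The regular form "walked" comes out as "walk" either way, but suffix stripping leaves "went" and "bought" unchanged, while the table lookup maps them to "go" and "buy". That lookup is the part a pure stemmer lacks.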

I haven't tried it out, but a quick search turned up this English lemmatizer for Ruby: elemma.

A commonly used (non-Ruby) English morphological analyzer that can do lemmatization is morpha.

Other tips

None of the stemmers are able to handle irregular verbs in English.

I found this while googling for Ruby-based NLP: http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow