Question

I need a good stemming algorithm for a project I'm working on. It was suggested that I look at the Porter stemmer. When I checked the Porter stemmer's page, I found that it is now deprecated in favor of the "Snowball" stemmer.

I need a good stemmer, but I can't really spend significant time implementing (or optimizing) my own. What is the best "off the shelf", freely available stemmer? Are there any non-free stemmers available for a reasonable price? Or, is the Snowball stemmer my best bet?

Solution

I've decided to go with the Porter2 stemmer. The original Porter stemmer seemed to be the standard, but on the author's own page he recommends the "Snowball (Porter2)" stemmer instead. There is a link to a C port on that page.

OTHER TIPS

It really depends on how you're planning to apply it. The Natural Language Toolkit (http://nltk.sourceforge.net) implements a number of stemmers that should be able to handle most applications. I prefer Morphy (strictly speaking, WordNet's lemmatizer rather than a suffix-stripping stemmer).

The toolkit is written in Python, so if you're working in another language, you can always read through the source to glean the algorithm and port it to your language of choice. Python is highly readable.
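For a rough idea of what using the Natural Language Toolkit's stemmers looks like, here is a minimal sketch. It assumes the third-party `nltk` package is installed (`pip install nltk`); the helper `stem_words` and the sample word list are made up for illustration and are not part of the toolkit.

```python
# Requires the third-party NLTK package: pip install nltk
from nltk.stem import PorterStemmer, SnowballStemmer


def stem_words(words, stemmer):
    """Illustrative helper: return the stem of each word using the given stemmer."""
    return [stemmer.stem(word) for word in words]


porter = PorterStemmer()               # the original Porter algorithm
snowball = SnowballStemmer("english")  # the Snowball English stemmer (Porter2)

# Arbitrary sample words, chosen only for demonstration.
words = ["running", "generously", "stemming"]
porter_stems = stem_words(words, porter)
snowball_stems = stem_words(words, snowball)

# Print the two results side by side so the differences are easy to spot.
for word, p, s in zip(words, porter_stems, snowball_stems):
    print(f"{word}: porter={p} snowball={s}")
```

Both classes expose the same `stem()` method, so switching algorithms is a one-line change, which makes it easy to try each on a sample of your own data before committing to one.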

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow