Why do the results of the Porter stemmer algorithm not match the root words I expect?

StackOverflow https://stackoverflow.com/questions/4516681

Question

I need to use the Porter stemmer algorithm to get word stems in my application, but when I test the implementation I got from http://www.tartarus.org/~martin/PorterStemmer, the results are not the correct stem words, e.g. happy --> happi, virus --> viru, etc. Can you help me solve this?


Solution

Quoting from your link:

2. Why is the stemmer not producing proper words?

It is often taken to be a crude error that a stemming algorithm does not leave a real word after removing the stem. But the purpose of stemming is to bring variant forms of a word together, not to map a word onto its ‘paradigm’ form.

And connected with this,

3. Why are there errors?

The question normally comes in the form, why should word X be stemmed to x1, when one would have expected it to be stemmed to x2? It is important to remember that the stemming algorithm cannot achieve perfection. On balance it will (or may) improve IR performance, but in individual cases it may sometimes make what are, or what seem to be, errors. Of course, this is a different matter from suggesting an additional rule that might be included in the stemmer to improve its performance.
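To make the point concrete, here is a minimal sketch of just two of the Porter rules (steps 1a and 1c of the published algorithm). It is not the full stemmer and not the code from the linked page; the function names (`step_1a`, `step_1c`, `partial_stem`) are made up for illustration. For the words in the question these two rules appear to be the only ones that fire, so the sketch reproduces the outputs you saw. It also shows why that is fine: "ponies" and "pony" both end up as "poni", which is not a real word, but the two variants are brought together.

```python
def _is_vowel(word, i):
    """True if word[i] is a vowel in Porter's sense."""
    ch = word[i]
    if ch in "aeiou":
        return True
    # 'y' counts as a vowel only when it follows a consonant.
    return ch == "y" and i > 0 and not _is_vowel(word, i - 1)


def _contains_vowel(stem):
    return any(_is_vowel(stem, i) for i in range(len(stem)))


def step_1a(word):
    """Plural handling: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1c(word):
    """Turn a final Y into I when the rest of the word contains a vowel."""
    if word.endswith("y") and _contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def partial_stem(word):
    """Hypothetical helper: applies only steps 1a and 1c, nothing else."""
    return step_1c(step_1a(word.lower()))


if __name__ == "__main__":
    for w in ["happy", "virus", "ponies", "pony", "sky"]:
        print(w, "-->", partial_stem(w))
```

So "happi" and "viru" are the stemmer working as designed. If your application needs real dictionary words, stemming is the wrong tool; use the stems only as keys for matching and keep the original words for display.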

Licensed under: CC-BY-SA with attribution