PorterStemmer in Lucene

https://stackoverflow.com/questions/21945600

14-10-2022

Question

I am looking for help on how I can use the class PorterStemFilter in Lucene 4.0. Below is my indexer taken from http://www.lucenetutorial.com/lucene-in-5-minutes.html:

...

  StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
  Directory index = new RAMDirectory();
  IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

  IndexWriter w = new IndexWriter(index, config);
  addDoc(w, "Lucene in Action", "193398817");
  addDoc(w, "Lucene for Dummies", "55320055Z");

......

Could someone help me with where and how to use the PorterStemFilter class?

Solution

Filters are generally incorporated into an Analyzer. To create your own Analyzer in Lucene 4.x, the only thing you really need to override is the createComponents method (tokenStream is final as of 4.0, so it can no longer be overridden).

If you just want to chuck the stem filter into StandardAnalyzer, I would copy the implementation of createComponents from StandardAnalyzer and add the filter at the appropriate location (stemmers should usually go late in the filter chain, after lowercasing and stop-word removal).

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    StandardTokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
    source.setMaxTokenLength(255);
    TokenStream result = new StandardFilter(Version.LUCENE_40, source);
    result = new LowerCaseFilter(Version.LUCENE_40, result);
    result = new StopFilter(Version.LUCENE_40, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    // Adding the stem filter here, after lowercasing and stop-word removal
    result = new PorterStemFilter(result);
    // The tokenizer is the source; the last filter is what consumers read from
    return new TokenStreamComponents(source, result);
}
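For completeness, here is a sketch of the whole analyzer as a standalone class, assuming Lucene 4.0 with lucene-core and lucene-analyzers-common on the classpath. The class name PorterStemAnalyzer and the analyze helper are hypothetical, added only for illustration; analyze just runs text through the chain and collects the terms, so you can see what would actually be indexed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

/** StandardAnalyzer's filter chain with PorterStemFilter appended (hypothetical class). */
public class PorterStemAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
        source.setMaxTokenLength(255);
        TokenStream result = new StandardFilter(Version.LUCENE_40, source);
        result = new LowerCaseFilter(Version.LUCENE_40, result);
        result = new StopFilter(Version.LUCENE_40, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        // Stem last, so the stemmer only sees lowercased, non-stop-word terms
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }

    /** Hypothetical helper: runs text through an analyzer and collects the emitted terms. */
    public static List<String> analyze(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("title", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
            ts.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        Analyzer analyzer = new PorterStemAnalyzer();
        System.out.println(analyze(analyzer, "Running dogs")); // prints [run, dog]
    }
}
```

To use it in the indexer from the question, pass `new PorterStemAnalyzer()` to IndexWriterConfig in place of the StandardAnalyzer, and give the same analyzer to the QueryParser so query terms are stemmed the same way as the indexed terms.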

Alternatively, you could just use EnglishAnalyzer (Lucene ships similar analyzers for many other languages), which already applies PorterStemFilter as the last step of its chain.
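To see what switching analyzers changes, here is a sketch that rebuilds the two-document index from the question and counts hits for the query "dummy". It assumes Lucene 4.0 with lucene-core, lucene-analyzers-common and lucene-queryparser on the classpath. The class EnglishAnalyzerIndexer, the countHits helper, and the body of addDoc (the question elides it; the field names here are a guess) are all hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class EnglishAnalyzerIndexer {

    // Hypothetical stand-in for the question's elided addDoc helper
    static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES)); // analyzed, so it is stemmed
        doc.add(new StringField("isbn", isbn, Field.Store.YES)); // indexed as a single token
        w.addDocument(doc);
    }

    /** Indexes the question's two documents with the given analyzer and counts query hits. */
    static int countHits(Analyzer analyzer, String queryText) {
        try {
            Directory index = new RAMDirectory();
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
            IndexWriter w = new IndexWriter(index, config);
            addDoc(w, "Lucene in Action", "193398817");
            addDoc(w, "Lucene for Dummies", "55320055Z");
            w.close(); // commits the documents

            // Parse the query with the same analyzer so query terms get the same stemming
            QueryParser parser = new QueryParser(Version.LUCENE_40, "title", analyzer);
            DirectoryReader reader = DirectoryReader.open(index);
            int hits = new IndexSearcher(reader).search(parser.parse(queryText), 10).totalHits;
            reader.close();
            return hits;
        } catch (IOException | ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "dummy" and "Dummies" both stem to "dummi" under EnglishAnalyzer
        System.out.println(countHits(new EnglishAnalyzer(Version.LUCENE_40), "dummy"));  // 1
        // StandardAnalyzer does no stemming, so "dummy" does not match "dummies"
        System.out.println(countHits(new StandardAnalyzer(Version.LUCENE_40), "dummy")); // 0
    }
}
```

The only change from the question's indexer is the analyzer handed to IndexWriterConfig; the rest of the indexing code stays the same.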

Licensed under: CC-BY-SA with attribution
