Question

I'm trying to apply more than one filter to the TokenStream in my custom analyzer. Here is the code:

public class CustomizeAnalyzer extends Analyzer {
//code omitted

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);              
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);                
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);                  
    return new TokenStreamComponents(source, new PorterStemFilter(source));
}                                              
}

However, the LowerCaseFilter is never applied. I followed the documentation here literally. Can someone please explain how to make it work?

Many thanks,

Solution

Your problem is in the last line. You build a chain of filters, then short-circuit it in the return statement by passing back new PorterStemFilter(source). That stem filter sits directly on the tokenizer, so neither the LowerCaseFilter nor the StopFilter built earlier in the chain is ever consulted. Each filter has to wrap the previous one, and the last filter in the chain is what you return. This should be:

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);              
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);                
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);                  
    filter = new PorterStemFilter(filter);
    return new TokenStreamComponents(source, filter);
} 
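
If it helps to see the wrapping mistake in isolation, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is a hypothetical stand-in: lowerCase plays the role of LowerCaseFilter, stripPlural is a crude placeholder for a stemmer (it is not the Porter algorithm), and an Iterator<String> stands in for the TokenStream. It only illustrates that a stage wrapped around source never sees what the earlier stages would have done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class FilterChainSketch {

    // Hypothetical stand-in for LowerCaseFilter: wraps an upstream token source.
    static Iterator<String> lowerCase(final Iterator<String> in) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return in.hasNext(); }
            @Override public String next() { return in.next().toLowerCase(Locale.ROOT); }
        };
    }

    // Crude stand-in for a stemmer: drops one trailing 's' or 'S'. Not the Porter algorithm.
    static Iterator<String> stripPlural(final Iterator<String> in) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return in.hasNext(); }
            @Override public String next() {
                String t = in.next();
                boolean plural = t.endsWith("s") || t.endsWith("S");
                return plural ? t.substring(0, t.length() - 1) : t;
            }
        };
    }

    // Pulls every token out of the final stage of a chain.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Mirrors the question: the last stage wraps source, so lowerCase is bypassed.
    static List<String> broken(List<String> tokens) {
        Iterator<String> source = tokens.iterator();
        Iterator<String> filter = lowerCase(source); // built, but never returned
        return drain(stripPlural(source));
    }

    // Mirrors the fix: each stage wraps the previous one, and the last one is returned.
    static List<String> fixed(List<String> tokens) {
        Iterator<String> source = tokens.iterator();
        Iterator<String> filter = lowerCase(source);
        filter = stripPlural(filter);
        return drain(filter);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Quick", "Foxes", "JUMPS");
        System.out.println(broken(tokens)); // prints [Quick, Foxe, JUMP]
        System.out.println(fixed(tokens));  // prints [quick, foxe, jump]
    }
}
```

The broken variant keeps the original casing because nothing ever pulls tokens through lowerCase, which is the same symptom as in the question.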
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow