Actually, yes, there is an analyzer that does that: SimpleAnalyzer.
The following does (almost) exactly the same thing:
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Split the input into maximal runs of letters.
        Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);
        // Lowercase each token produced by the tokenizer.
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);
        return new TokenStreamComponents(source, filter);
    }
};
When you have very specific requirements for an Analyzer, you'll often need to design your own by chaining a Tokenizer and some Filters like this, as shown in the Analyzer documentation.
LetterTokenizer defines a token as a maximal string of adjacent letters, and LowerCaseFilter does what it says on the tin.
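To make the two steps concrete, here is a rough plain-Java sketch of what the chain does to a piece of text. It is not Lucene code: the class and method names (LetterLowercaseSketch, letterTokens, lowercase) are made up for illustration, and it assumes "letter" means Character.isLetter and that lowercasing with Locale.ROOT is close enough to what the real filter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; hypothetical names, not part of Lucene.
class LetterLowercaseSketch {

    // Step 1 (LetterTokenizer-like): each maximal run of letters becomes one token.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Step 2 (LowerCaseFilter-like): lowercase every token, leaving boundaries alone.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [hello, world, it, s, lucene]
        System.out.println(lowercase(letterTokens("Hello, World-42 it's Lucene!")));
    }
}
```

Note that digits and punctuation never make it into a token, so "World-42" yields only "world" and "it's" is split into "it" and "s".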
This is a fairly common combination, so there is also LowerCaseTokenizer, which does the job of both LetterTokenizer and LowerCaseFilter in one step and thus provides a performance advantage. LowerCaseTokenizer is what SimpleAnalyzer actually uses.