Applying a custom CharTokenizer in Solr4
Question
I just wrote a custom CharTokenizer, and I want to use it in my Solr server.
In Solr 3, I could simply extend TokenizerFactory and return my CharTokenizer from the create method, but TokenizerFactory does not exist in Solr 4.
I was told that I should replace TokenizerFactory with TokenFilterFactory, but in that case I cannot return my custom CharTokenizer, because the parameters don't match.
I also searched for documentation, but there seems to be nothing really useful about this out there.
So, how can I make it work?
Example:
import java.io.Reader;
import org.apache.lucene.analysis.util.CharTokenizer;
import org.apache.lucene.util.Version;

public class MyCustomTokenizer extends CharTokenizer {

    // Control character (0x18) that should be treated as an extra separator
    private final char anotherSpace = 24;

    public MyCustomTokenizer(Version matchVersion, Reader in) {
        super(matchVersion, in);
    }

    @Override
    protected boolean isTokenChar(int c) {
        return !Character.isWhitespace(c) && isToken((char) c);
    }

    private boolean isToken(char c) {
        return c != anotherSpace && c != ',';
    }
}
import java.util.Map;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class MyCustomTokenizerFactory extends TokenFilterFactory {

    public void init(Map<String, String> args) {
        super.init(args);
        assureMatchVersion();
    }

    @Override
    public TokenStream create(TokenStream input) {
        // This is where it breaks: MyCustomTokenizer expects a Reader, not a TokenStream
        return new MyCustomTokenizer(luceneMatchVersion, input);
    }
}
Thanks in advance.
Solution
The best way to work out the implementation is to look at the source code of an existing Tokenizer and its factory in Lucene.
Example:
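In Lucene/Solr 4, the factory base classes moved to the org.apache.lucene.analysis.util package, so a tokenizer factory extends TokenizerFactory there (whose create method takes a Reader), not TokenFilterFactory (whose create method takes a TokenStream). A minimal sketch, assuming Lucene 4.0-4.3 and the MyCustomTokenizer from the question:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class MyCustomTokenizerFactory extends TokenizerFactory {

    @Override
    public Tokenizer create(Reader input) {
        // luceneMatchVersion is inherited from the factory base class;
        // call assureMatchVersion() in init() if you require it to be set
        return new MyCustomTokenizer(luceneMatchVersion, input);
    }
}
```

You can then reference the factory by its fully qualified class name in schema.xml (com.example is a placeholder for your own package):

```
<fieldType name="text_custom" class="solr.TextField">
  <analyzer>
    <tokenizer class="com.example.MyCustomTokenizerFactory"/>
  </analyzer>
</fieldType>
```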