I am trying to force Solr to tokenize documents on whitespace, commas, colons (:) and semicolons (;), similar to what SQL Server Full-Text Search does. If I use the text_general field type, it also tokenizes on other characters such as '/', '\' and '-'. I tried using
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,:;\s*"/>
But it doesn't tokenize the text. Here is what my fieldType looks like:
<fieldType name="text_sqlserver" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,:;\s*"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,:;\s*"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Is there anything I am missing? The search also needs to be case-insensitive.
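
For comparison, here is a minimal sketch of a tokenizer line that splits on any one of the delimiters rather than on the literal three-character sequence `,:;` (which is what the regex `\s*,:;\s*` matches). It assumes PatternTokenizerFactory is left in its default splitting mode (no `group` attribute set), and the character class `[\s,:;]+` is only an illustration of the idea, not a tested configuration:

```xml
<!-- Sketch: split on runs of whitespace, commas, colons, or semicolons. -->
<!-- The character class [\s,:;]+ is an assumption for illustration; -->
<!-- the same pattern would go in both the index and query analyzers. -->
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,:;]+"/>
```

Under that assumption, characters such as '/', '\' and '-' are not in the class, so they would stay inside tokens, and the existing LowerCaseFilterFactory would continue to handle case-insensitive matching.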