It can be achieved with ICUTransformFilterFactory, which (un)transliterates the input query on every request.
Here is an example of how to enable this functionality:
Enable the ICU4J analyzers (lucene-analyzers-icu-*.jar, icu4j-*.jar).
Those libraries can be found in the contrib/analysis-extras folder of the Solr distribution from the official site (they are also available via Maven). In solrconfig.xml, add something like the following to enable them. You can also put all the jars you need into a single lib dir; this example just uses the default locations, which are resolved relative to the core's instance directory (example/solr/collection1 in the official distribution, i.e. the parent of the conf folder):

<lib dir="../../../contrib/analysis-extras/lib" regex=".*\.jar" />
<lib dir="../../../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
Split the spell_text field analyzer into two separate analyzers, one for index time and one for query time.
Add solr.ICUTransformFilterFactory to the query analyzer with the following id:
Any-Cyrillic; NFD; [^\p{Alnum}] Remove

Full field type definition:

<fieldType name="spell_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[,.;:]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement=""/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.LengthFilterFactory" min="3" max="256" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[,.;:]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement=""/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.LengthFilterFactory" min="3" max="256" />
    <filter class="solr.ICUTransformFilterFactory" id="Any-Cyrillic; NFD; [^\p{Alnum}] Remove" />
  </analyzer>
</fieldType>
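To make the spell_text field type easier to reason about, here is a rough Python emulation of the steps that the index and query analyzers share. It is only a sketch under stated assumptions: analyze is a hypothetical helper name, HTMLStripCharFilterFactory is left out, and Python's str.split and str.lower only approximate Lucene's WhitespaceTokenizer and LowerCaseFilter.

```python
import re


def analyze(text: str) -> list[str]:
    # Hypothetical helper approximating the shared spell_text chain.
    # PatternReplaceCharFilterFactory: replace , . ; : with a space
    # (HTMLStripCharFilterFactory is deliberately left out of this sketch)
    text = re.sub(r"[,.;:]", " ", text)
    # WhitespaceTokenizerFactory + LowerCaseFilterFactory
    tokens = [tok.lower() for tok in text.split()]
    # PatternReplaceFilterFactory: pattern="'s" replacement="" (all occurrences)
    tokens = [tok.replace("'s", "") for tok in tokens]
    # ShingleFilterFactory maxShingleSize="2" outputUnigrams="true":
    # each unigram is followed by the two-word shingle that starts at it
    shingles = []
    for i, tok in enumerate(tokens):
        shingles.append(tok)
        if i + 1 < len(tokens):
            shingles.append(tok + " " + tokens[i + 1])
    # LengthFilterFactory min="3" max="256"
    return [s for s in shingles if 3 <= len(s) <= 256]


print(analyze("War of the Worlds"))
# ['war', 'war of', 'of the', 'the', 'the worlds', 'worlds']
```

Because LengthFilterFactory runs after the shingles are built, a short word such as "of" is dropped as a unigram but still appears inside the bigrams "war of" and "of the".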
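The ICUTransformFilterFactory id Any-Cyrillic; NFD; [^\p{Alnum}] Remove chains three ICU transforms: Any-Cyrillic converts text written in any script into Cyrillic, NFD decomposes characters into a base letter plus combining marks, and [^\p{Alnum}] Remove deletes every character that is not a letter or digit (including the combining marks split off by NFD). For more details, see: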
- Related Stack Overflow question
- Official guide
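The sketch below imitates the three steps of the id Any-Cyrillic; NFD; [^\p{Alnum}] Remove for a single token using only the Python standard library. It is not ICU: LATIN_TO_CYRILLIC is a hypothetical, tiny lookup table standing in for ICU's real Latin-Cyrillic rules, icu_like_transform is a hypothetical name, and str.isalnum only approximates ICU's \p{Alnum}.

```python
import unicodedata

# HYPOTHETICAL toy table standing in for ICU's Latin-Cyrillic rules, which are
# much larger (digraphs such as "sh"/"zh", context rules, etc.). Lowercase only,
# because LowerCaseFilterFactory runs before the transform in the query chain.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф",
}


def icu_like_transform(token: str) -> str:
    # Step 1, stand-in for "Any-Cyrillic": map known Latin letters and
    # keep everything else (Cyrillic input passes through unchanged)
    token = "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in token)
    # Step 2, "NFD": split precomposed letters into base letter + combining marks
    token = unicodedata.normalize("NFD", token)
    # Step 3, "[^\p{Alnum}] Remove": drop anything that is not a letter or digit
    # (combining marks, spaces, punctuation)
    return "".join(ch for ch in token if ch.isalnum())


for query in ["privet", "привет", "мой", "privet mir"]:
    print(query, "->", icu_like_transform(query))
```

In this model both "privet" and "привет" end up as привет, so the Latin and Cyrillic spellings reach the spellchecker as the same token. Two side effects of the Remove step also show up: й loses its breve and becomes и, and the space inside a two-word shingle such as "privet mir" is removed. Since the index analyzer has no such step, query-side results for those cases may be worth checking in Solr's Analysis screen.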
The configuration described above works on my local machine the same way for Russian transliterations (Russian words typed in Latin letters) and for Russian words typed in Cyrillic.