Cannot use ICUTokenizerFactory in Solr

https://stackoverflow.com/questions/14601631

06-03-2022

Question

I am trying to use ICUTokenizerFactory in my Solr schema. This is how I have defined the field and fieldType:

<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
    </analyzer>
</fieldType>

<field name="fld_icu" type="text_icu" indexed="true" stored="true"/>

When I start Solr, I get this error:

Plugin init failure for [schema.xml] fieldType "text_icu": Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.ICUTokenizerFactory'

I have searched for this without success. I don't know whether I am missing something or there is a problem in the schema. If anyone has used ICUTokenizerFactory, please suggest what the problem could be.

Solution 2

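The error means Solr cannot find the ICU tokenizer class on its classpath: it is not part of the core distribution but ships in the analysis-extras contrib. From the Wiki:

Lucene provides support for segmenting these languages into syllables with solr.ICUTokenizerFactory in the analysis-extras contrib module. To use this tokenizer, see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib.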

OTHER TIPS

Add this at the top of your solrconfig.xml:

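<config>
  <!-- Load the analysis-extras jars (ICU analyzers and ICU4J) -->
  <lib dir="${user.dir}/../contrib/analysis-extras/lucene-libs/" />
  <lib dir="${user.dir}/../contrib/analysis-extras/lib/" />
  <!-- ... rest of your existing solrconfig.xml ... -->
</config>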

This assumes that you start Solr from the example directory (so that ${user.dir}, the JVM's working directory, points there), with solr.solr.home set to your instance. Otherwise, use the absolute path to your Solr installation.
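If you do not start Solr from the example directory, the same directives can use absolute paths instead. The sketch below assumes a hypothetical installation under /opt/solr, so adjust it to your own layout. It also uses the optional regex attribute of <lib> to load only the ICU-related jars. The file-name patterns (lucene-analyzers-icu-*.jar for the Lucene ICU analyzers, icu4j-*.jar for the ICU4J library they depend on) are assumptions based on typical Solr 4.x distributions; check the actual names in your contrib/analysis-extras directories.

<config>
  <!-- Hypothetical install path: replace /opt/solr with your Solr installation directory -->
  <!-- Lucene ICU analyzers (provides ICUTokenizerFactory); jar-name pattern is an assumption -->
  <lib dir="/opt/solr/contrib/analysis-extras/lucene-libs/" regex="lucene-analyzers-icu-.*\.jar" />
  <!-- ICU4J library required by the ICU analyzers; jar-name pattern is an assumption -->
  <lib dir="/opt/solr/contrib/analysis-extras/lib/" regex="icu4j-.*\.jar" />
  <!-- ... rest of your existing solrconfig.xml ... -->
</config>

Filtering by regex keeps the other jars in those directories off the classpath. Loading each whole directory, as in the relative-path version, is simpler and works just as well for this error.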

Alternatively, you can copy all of those jars into the lib directory under your core (not under Solr home), but the approach above is easier.

Licensed under: CC-BY-SA with attribution
