문제

How can we map non ASCII char with ASCII character?

Ex.: In solr index we have word contain char ñ, Ñ [LATIN CAPITAL LETTER N WITH TILDE] or normal n,N Then what filter/token we use to search with Normal N or Ñ and both mapped.

도움이 되었습니까?

해결책

Merging the answers of Solr, Special Chars, and Latin to Cyrilic char conversion

  1. Take a look at Solr's Analyzers, Tokenizers, and Token Filters which give you a good intro to the type of manipulation you're looking for.
  2. Probably the ASCIIFoldingFilterFactory does exactly what you want.

When changing an analyzer to remove the accents, keep in mind that you need to reindex. Otherwise the accented characters will stay within the index, but no user input can be created to match them.

Update

I tried using the ICUFoldingFilterFactory this works fine with those accents. If this one is tricky to set up, have a look into the SO question Can not use ICUTokenizerFactory in Solr

This analyzer

<fieldType name="spanish" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ICUFoldingFilterFactory" />
    </analyzer>
</fieldType>

got me these analysis results, the screen-shot is taken from solr-admin

enter image description here

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top