Solr NGramFilterFactory does not work on numbers

https://stackoverflow.com/questions/14569757

05-03-2022

Question

I do not know if this is a bug or a feature, but Solr's NGramFilterFactory does not work on numbers.

Here is my field type:

<fieldType name="phone_test" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
       <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30" side="front" />
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
</fieldType>

When I use the analyser in the Solr admin interface and type a word, e.g. "business", it works fine, but when I enter numbers, e.g. 12345678, it does not work.

What I want is to search for part of phone numbers. If I have 123456789 as a phone number and I search for 456 or 6789 I should get a hit.

Any ideas?

Solution

The definition of the LowerCaseTokenizerFactory, which your field type uses, is as follows.

Creates tokens by lowercasing all letters and dropping non-letters.

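It is dropping your numbers because they are non-letters, so for an input such as 12345678 the NGramFilterFactory never receives a token to split. I would recommend using the KeywordTokenizerFactory or the StandardTokenizerFactory instead, as these should properly handle your numeric input. If you still want lowercasing for text, add a LowerCaseFilterFactory after the tokenizer.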
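
As a rough sketch, the field type from the question could look like this with only the tokenizer swapped for KeywordTokenizerFactory in both analyzers. The side attribute is left out on the assumption that it is an EdgeNGramFilterFactory option rather than an NGramFilterFactory one; everything else is copied from the question and has not been tested against a particular Solr version.

<fieldType name="phone_test" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <!-- KeywordTokenizerFactory emits the whole input as one token, digits included -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- side="front" omitted (assumption: not an NGramFilterFactory attribute) -->
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
   </analyzer>
</fieldType>

With this analysis chain, indexing 123456789 should produce every substring of three or more characters, so a query for 456 or 6789 can match one of the indexed grams. Note that KeywordTokenizerFactory also keeps spaces and punctuation inside the token, so phone numbers stored with separators may need to be normalized first.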

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow