Searching for tokens does not return any results

https://stackoverflow.com/questions/23409649

13-07-2023

Question

I am new to Solr and this is my first post on this list. I have been working on this problem for a couple of days. I have tried everything I could find on Google, but it looks like I am missing something.

Here is my problem: I have a field called DBASE_LOCAT_NM_TEXT. It contains values like CRD_PROD. The goal is to be able to search this field either by the exact string "CRD_PROD" or by part of it (tokenized on "_"), such as "CRD" or "PROD".

Currently, this query returns results:

q=DBASE_LOCAT_NM_TEXT:CRD_PROD

But this one does not:

q=DBASE_LOCAT_NM_TEXT:CRD

I want to understand why the second query does not return any results.

Here is how I configured the field:

<field name="DBASE_LOCAT_NM_TEXT" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

And here is how I configured the field type:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am also using the analysis panel in the Solr admin console. For the index analyzer it shows this:

WT CRD_PROD

WDF CRD_PROD CRD PROD CRDPROD

SF CRD_PROD CRD PROD CRDPROD

LCF crd_prod crd prod crdprod

SKMF crd_prod crd prod crdprod

RDTF crd_prod crd prod crdprod

I am not sure whether it is related, but this index was created by a Java program using the Lucene API. It used StandardAnalyzer for writing, and the field was configured as tokenized, indexed and stored. Does this affect the Solr configuration?

Can you please help me understand what I am missing and how I can debug it?

Thanks, Yetkin

Solution

So this index was not built by Solr, then? It was created in an entirely separate application?

In that case, your "index" analyzer has nothing to do with it, since it is never used: the terms in the index are whatever the other application wrote, and the analysis panel only shows what Solr would produce if it did the indexing. Generally, you should use the same analyzer setup at index and query time. There are exceptions, but stick to that unless you have a good reason to do otherwise. If the field was indexed using StandardAnalyzer, simply use:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>
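
To make the point about the unused "index" analyzer concrete, here is a toy model in Python. It is not Solr or Lucene code, and every function name in it is hypothetical. It assumes, purely for illustration, that the external application wrote CRD_PROD into the index as a single lowercased term; whether StandardAnalyzer really does that depends on the Lucene version. The query side is a rough stand-in for the query chain in the question.

```python
# Toy model of term matching; NOT Solr/Lucene code.

def external_app_analyze(text):
    """Hypothetical index-time analysis done by the external application.
    Assumption: lowercases and splits on whitespace only, so "_" stays in the term."""
    return [tok.lower() for tok in text.split()]

def solr_query_analyze(text):
    """Rough stand-in for the query chain in the question:
    whitespace tokens, split on "_" while keeping the original, lowercase, dedupe."""
    terms = []
    for tok in text.split():
        candidates = [tok] + [part for part in tok.split("_") if part]
        for cand in candidates:
            cand = cand.lower()
            if cand not in terms:
                terms.append(cand)
    return terms

def build_index(docs, analyze):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, analyze):
    """Return ids of documents matching any query term (simplified OR semantics)."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

docs = {1: "CRD_PROD"}
index = build_index(docs, external_app_analyze)
full_hits = search(index, "CRD_PROD", solr_query_analyze)
part_hits = search(index, "CRD", solr_query_analyze)
print(sorted(index), full_hits, part_hits)
```

Under that assumption the index holds only the term crd_prod, so the full query matches through its preserved original token while the query for CRD finds nothing, regardless of what the "index" analyzer in schema.xml says.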

However, I am not really sure exactly why you aren't getting matches. Was the index built with an older version of Lucene? It sounds like it was, so the version it was created with might be an issue as well.
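
One way to debug this is to look at the terms that are actually stored in the index rather than at the analysis panel. The sketch below only builds a URL for Solr's Luke request handler, using its `fl` and `numTerms` parameters. The host, port and core name (`collection1`) are assumptions, as is the handler being enabled at `/admin/luke` in your setup; `luke_terms_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def luke_terms_url(base_url, core, field, num_terms=20):
    """Build a Luke request handler URL that lists the top indexed terms of one field.
    base_url and core are placeholders; adjust them to your installation."""
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return f"{base_url}/{core}/admin/luke?{urlencode(params)}"

# Assumed host and core name, for illustration only.
url = luke_terms_url("http://localhost:8983/solr", "collection1", "DBASE_LOCAT_NM_TEXT")
print(url)
```

Opening the resulting URL in a browser should show the top terms for the field. If only crd_prod appears and not crd or prod, the query for CRD has nothing to match.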

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow