Question

In synonyms.txt, I have:

you're => you are

When I run "Because you're mine" through the analysis tool, it is expanded into "Because you mine are". That is fine for a full-text search, but it is a big problem for shingles. I wondered whether the expansion was simply being appended at the end, but "you're Because mine" is expanded into "you because are mine", so the following word is inserted in between. I also tested "Because mine you're", which is expanded into "Because mine you are".

Any idea about why this may happen?

Here's a screen cap of the analysis tool to make it 100% clear: screencap


Solution

The query section in the schema:

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="wordlists/english-common-nouns.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>        
  </analyzer>

I just let WDF (WordDelimiterFilter) do its tokenization, which splits you're into you re. In synonyms.txt I then defined:

you re => you are

This is not the most elegant way, but it works, i.e. it stores the tokens in the order you need.

screenshot to prove
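
To make the token flow of this workaround easier to follow, here is a rough Python sketch. It is only a toy model, not Solr's actual implementation: `split_and_lowercase`, `apply_synonyms`, `shingles` and the `SYNONYMS` dict are hypothetical names invented for illustration, and the regex split merely approximates what WDF plus the lowercase filter do to this particular input.

```python
import re

# Hypothetical in-memory form of the synonyms.txt rule "you re => you are".
SYNONYMS = {("you", "re"): ["you", "are"]}


def split_and_lowercase(text):
    """Toy stand-in for WDF + LowerCaseFilter: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def apply_synonyms(tokens, rules=SYNONYMS):
    """Replace each matching token sequence with its target, keeping token order."""
    out = []
    i = 0
    while i < len(tokens):
        for source, target in rules.items():
            if tuple(tokens[i:i + len(source)]) == source:
                out.extend(target)
                i += len(source)
                break
        else:
            # No rule matched at this position: pass the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


def shingles(tokens, size=2):
    """Build word n-grams (shingles) from consecutive tokens."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = apply_synonyms(split_and_lowercase("Because you're mine"))
print(tokens)            # ['because', 'you', 'are', 'mine']
print(shingles(tokens))  # ['because you', 'you are', 'are mine']
```

In this model the rule replaces two tokens with two tokens, so every following word keeps its place and the shingles come out in the expected order. Whether that is exactly why the real filter behaves well here is an assumption on my part; check the analysis tool output for your own Solr version.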

OTHER TIPS

You can use the Synonym-Expanding EDisMax Parser, which expands synonyms before text analysis is done: https://github.com/healthonnet/hon-lucene-synonyms

Licensed under: CC-BY-SA with attribution