Solr spell check phrases

https://stackoverflow.com/questions/18646561

27-06-2022

Question

I'm trying to find a way to have two sets of spell checks.

One would handle spellchecking for common words found in the documents. The other would handle items like author names, which can be several words long. I'd like a query that is even remotely close to an author's name to produce a suggestion, while suggestions for ordinary misspellings should require a smaller edit distance.

Right now I have a catch-all field for spellings, but its analysis tokenizes heavily and breaks phrases apart, so I can't use it for phrase matching as is.

Here are the spellcheck components:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent" startup="lazy">
    <!-- <str name="queryAnalyzerFieldType">textSpell</str> -->
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.65</float>
        <int name="minPrefix">0</int>
        <int name="maxEdits">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">3</int>
        <float name="maxQueryFrequency">0.0005</float>
        <float name="thresholdTokenFrequency">.001</float>

        <str name="buildOnCommit">true</str>
    </lst>

    <!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
    <lst name="spellchecker">
        <str name="name">wordbreak</str>
        <str name="classname">solr.WordBreakSolrSpellChecker</str>
        <str name="field">spell</str>
        <str name="combineWords">true</str>
        <str name="breakWords">true</str>
        <int name="maxChanges">1</int>

        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>

And the actual spell field:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>

Solution

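Spellchecking operates on individual tokens, so a phrase can only be suggested if it exists in the index as a single token. Add solr.ShingleFilterFactory to your analysis chain, before solr.RemoveDuplicatesTokenFilterFactory, so that adjacent words are also indexed as combined phrase tokens.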
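
As a rough sketch of that change, the textSpell field type from the question could look like the following. The shingle sizes (2 to 3 words) and outputUnigrams="true" are assumptions chosen for illustration, not values from the original answer; tune them to the typical length of an author name. Whether a multi-word query is actually matched against these shingles also depends on how the query itself is analyzed, which this sketch does not cover.

```xml
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Assumed settings: emit 2- and 3-word shingles as single tokens,
             and keep the single words (unigrams) so ordinary spellchecking still works -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
```

Because the analysis chain changes what is indexed, the spell field would need to be reindexed before the phrase tokens become available to the spellchecker.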

Licensed under: CC-BY-SA with attribution
