SOLR - Skip suggestions that return the same documents as the original search

https://stackoverflow.com//questions/21027891

21-12-2019

Question

I have search suggestions working fairly well, and I like getting suggestions even when the original keyword returned results (in case we have documents with misspellings in our collection). However, I often get suggestions that return exactly the same results. For example, I search for yellow mint tin and get "Did you mean yellow mint tins?"

Is there a way to remove suggestions that return the same results as the original term?

I am using Solr 4.6.0. Here is the relevant part of solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell2</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.1</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell2</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <str name="buildOnCommit">true</str>
    <int name="minBreakLength">3</int>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <int name="rows">10</int>
    <str name="df">contents</str>
    <str name="defType">edismax</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.maxResultsForSuggest">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.collateParam.defType">dismax</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Here is the relevant part of schema.xml:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="spell2" type="text_general" indexed="true" stored="false" required="false" multiValued="true"/>

An example query, http://localhost:8985/solr/(collection)/spell?q=yellow%20buttermints, returns:

<str name="collation">yellow (butter mints)</str>
<str name="collation">yellow buttermint</str>

"yellow buttermints" and "yellow buttermint" return the same results.

Solution

I don't think there is a definitive way to guarantee this, but the following should certainly help.

Add this filter to both the query-time and index-time analyzers: EnglishMinimalStemFilterFactory

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-EnglishMinimalStemFilter

I'm not sure how SynonymFilterFactory behaves in this case. You could try it without that filter.
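As a rough sketch of that suggestion, the text_general type from the question's schema.xml could look like the fragment below with solr.EnglishMinimalStemFilterFactory added to both analyzers. Placing the stemmer after LowerCaseFilterFactory, and applying it to text_general rather than to the type behind the contents field, are assumptions made for illustration; the answer does not specify either.

```xml
<!-- Sketch only: text_general from the question, with a minimal English stemmer added. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumed placement: stem after lowercasing so plural forms are reduced to the singular -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same filter at query time so query terms and indexed terms are stemmed identically -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analyzer changes, existing documents would need to be reindexed before the new terms take effect. To test the second suggestion, the two SynonymFilterFactory lines can be removed temporarily and the same query rerun.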

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow