Solr - Wild Card Search varies with Stemming Methods

https://stackoverflow.com/questions/12094108

28-06-2021
Question

I have two versions of Solr running on my machine, say SolrVer1 and SolrVer2.

SolrVer1 applies the following stemming filters to the field type text_en_splitting:

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
 <filter class="solr.PorterStemFilterFactory" ignoreCase="true"/>

SolrVer2 applies the following stemming filter to the field type text_en_splitting:

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Regular searches behave almost the same on both versions, but wildcard searches on SolrVer1 do not return results for grammatical variants of the term.

For example, searching for ray* returns far less data on SolrVer1 than on SolrVer2. When I looked at the results, I found that SolrVer1 does not return documents that contain only ray or rays.

I don't know where I should use SnowballPorterFilterFactory and where I should use PorterStemFilterFactory. What are the pros and cons of each?

Does anybody have an idea about this behavior?

Thanks

Solution

You need to know what each stemmer outputs for ray and rays.

Try stemming them with the online Porter stemmer tool: http://qaa.ath.cx/porter_js_demo.html. It outputs rai! The index therefore holds rai, which does not start with ray, and that is why you don't get any matches for ray* with the Porter stemmer.

And here is a tool for the Snowball stemmer: http://snowball.tartarus.org/demo.php. It outputs ray for both ray and rays, which is why you get the results.

You may want to read this to compare the two stemmers: http://snowball.tartarus.org/texts/introduction.html

It appears that Snowball was designed to address such shortcomings of Porter.
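
To see where the rai versus ray difference comes from, here is a rough Python sketch of only the final-y rule (step 1c) as I understand it from the published Porter and Snowball English algorithm descriptions. It is not Solr's or Lucene's code: the function names are made up for illustration, and the plural handling is a deliberately simplified stand-in for step 1a.

```python
VOWELS = set("aeiou")

def is_porter_vowel(word, i):
    """Original Porter: a, e, i, o, u are vowels, and so is y after a consonant."""
    if word[i] in VOWELS:
        return True
    return word[i] == "y" and i > 0 and not is_porter_vowel(word, i - 1)

def strip_plural_s(word):
    # Simplified stand-in for step 1a (an assumption): drop one plain trailing "s".
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def porter_like(word):
    # Original Porter step 1c: y -> i when the part before the y contains a vowel.
    word = strip_plural_s(word.lower())
    stem = word[:-1]
    if word.endswith("y") and any(is_porter_vowel(stem, i) for i in range(len(stem))):
        return stem + "i"
    return word

def snowball_like(word):
    # Snowball English step 1c: y -> i only after a non-vowel that is not the
    # first letter of the word (Snowball counts y itself as a vowel).
    word = strip_plural_s(word.lower())
    if len(word) > 2 and word.endswith("y") and word[-2] not in "aeiouy":
        return word[:-1] + "i"
    return word

for w in ("ray", "rays"):
    print(w, porter_like(w), snowball_like(w))
# ray rai ray
# rays rai ray

# A prefix query ray* is compared against the stored stems as-is.
print(any(t.startswith("ray") for t in {porter_like("ray"), porter_like("rays")}))      # False
print(any(t.startswith("ray") for t in {snowball_like("ray"), snowball_like("rays")}))  # True
```

In ray the y follows the vowel a, so the Porter-style rule rewrites it to i while the Snowball-style rule leaves it alone. For real comparisons, use the online tools above or the Analysis page in the Solr admin UI rather than this sketch.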

Other tips

Analyzers

On wildcard and fuzzy searches, no text analysis is performed on the search word.

Because no analysis is done at query time for wildcard searches, the stemmers are not applied to the wildcard term; they are applied only to the text at index time.
The raw prefix is matched against the stemmed terms in the index, so the results differ depending on what each stemmer produced.
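
A toy model of this, in Python, may make it easier to see why regular searches still work while ray* does not. Everything here is hypothetical: the tiny stem table just hard-codes the Porter outputs reported in the accepted answer, and the index is a plain dict, not how Solr stores terms.

```python
# Stems as reported in the accepted answer for the original Porter stemmer.
# Any other token is passed through unchanged (an assumption of this sketch).
PORTER_STEMS = {"ray": "rai", "rays": "rai"}

def analyze(token):
    """Stand-in for the analyzer chain: lowercase, then 'stem'."""
    token = token.lower()
    return PORTER_STEMS.get(token, token)

def build_index(docs):
    """Index time: every token goes through the analyzer before it is stored."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index

def term_query(index, word):
    """Regular search: the query word is analyzed, so it meets the stored stem."""
    return index.get(analyze(word), set())

def prefix_query(index, prefix):
    """Wildcard search: the raw prefix is compared with the stored terms."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits

docs = {1: "ray", 2: "rays", 3: "rayon"}
index = build_index(docs)

print(sorted(index))                         # ['rai', 'rayon']
print(sorted(term_query(index, "rays")))     # [1, 2]
print(sorted(prefix_query(index, "ray")))    # [3]
```

The regular query for rays is stemmed to rai and finds documents 1 and 2, but the unanalyzed prefix ray only matches the stored term rayon. That is the same pattern described in the question, where the documents containing only ray or rays go missing.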

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow