“Sunspot” gem distinguishes between accented and unaccented UTF-8 characters

https://stackoverflow.com/questions/12058721

27-06-2021

Question

In a Rails app I started using Sunspot (https://github.com/sunspot/sunspot/blob/master/README.md).

Everything went fine until I noticed this (taken from the Rails console):

1.9.3p194 :002 > MyModel.search{fulltext "leon"}.results
=> [#<MyModel id: 16, name: "Leon">]
1.9.3p194 :003 > MyModel.search{fulltext "león"}.results
=> [#<MyModel id: 18, name: "León">]

How can I tell the system not to distinguish between "leon" and "león"? I want something like search{fulltext "leon"} => [#MyModel id: 16 ... , #MyModel id: 18...].

I've searched for this problem and found the same answer every time:

Adding this line to the Gemfile works until the next release of rsolr: gem 'rsolr', :git => "https://github.com/mwmitchell/rsolr.git"

Thanks.

Solution 2

Thanks for the responses. I finally solved it last night with another idea, taken from http://codeshooter.wordpress.com/2011/01/13/full-text-search-in-in-rails-with-sunspot-and-solr/

The idea is to index a normalized copy of the name instead of the raw value. In Restaurant.rb:

text :name do 
  self.name.my_normalize
end

and the body of the my_normalize method is:

to_s.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/,'').downcase

That line turns strings like "äáàÁÄÀ" into "aaaaaa".
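For anyone who wants to try the normalization outside Rails, here is a minimal sketch of the same idea in plain Ruby. The module name AccentFolding and the method name fold are made up for this example; they are not part of Sunspot or of the original code. It uses Ruby's built-in String#unicode_normalize in place of mb_chars.normalize(:kd), which newer Rails versions deprecate in favor of the Ruby method.

```ruby
# AccentFolding and fold are hypothetical names used only for this sketch.
module AccentFolding
  # Returns a lowercase, ASCII-only version of the given value.
  def self.fold(value)
    value.to_s
         .unicode_normalize(:nfkd)   # split "á" into "a" plus a combining accent
         .gsub(/[^\x00-\x7F]/, '')   # drop every non-ASCII character, including the accents
         .downcase
  end
end

puts AccentFolding.fold("León")
puts AccentFolding.fold("äáàÁÄÀ")
```

Note that characters with no ASCII decomposition (for example "ß" or "ø") are simply removed by the gsub, so this approach suits Latin text with accents rather than arbitrary scripts.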

Other tips

In schema.xml you need to add a character filter, as described in AnalyzersTokenizersTokenFilters. For example:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

The mapping-ISOLatin1Accent.txt file should contain entries that map each accented Unicode character to an ASCII character sequence. You can see an example here: mapping-ISOLatin1Accent.txt
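To make the effect of such a mapping concrete, here is a rough Ruby illustration of what a character mapping does to text before it is tokenized. This is only a sketch of the concept, not Solr's implementation; the ACCENT_MAP table and the map_chars method are invented for the example and cover only a handful of characters.

```ruby
# Hypothetical, partial mapping table; the real Solr mapping file covers many more characters.
ACCENT_MAP = {
  "á" => "a", "é" => "e", "í" => "i", "ó" => "o", "ú" => "u",
  "Á" => "A", "É" => "E", "Í" => "I", "Ó" => "O", "Ú" => "U",
  "ñ" => "n", "Ñ" => "N"
}.freeze

# One pattern that matches any key of the table.
ACCENT_PATTERN = Regexp.union(ACCENT_MAP.keys)

# Replaces every mapped character with its ASCII counterpart; other characters are left alone.
def map_chars(text)
  text.gsub(ACCENT_PATTERN, ACCENT_MAP)
end

puts map_chars("León")
```

If the same mapping is applied both when indexing and when querying, "León" and "Leon" end up as the same term, which is the behavior the question asks for.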

You need to make the changes in the configuration files of Solr itself (the application, not the gem). Solr is "embedded" in the gem, but you can access its configuration as if it were installed separately. Have a look at the Solr documentation.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow