Question

I have my app working with Sunspot Solr locally, and Unicode queries work with no issues. In production, however, on Heroku with Websolr, all Unicode queries return zero results. Websolr support has confirmed that I can query their Solr system directly with Unicode and it works fine. When I query from my production app, though, they see something like this in the log: q=أرسنا

So it doesn't seem to be related to Websolr. I also tried running the local app in production mode (pointing to Websolr), and as soon as I do that, queries return no results again.

Has anyone faced a similar problem, and where should I be looking for answers? I tried setting the Solr production log level to INFO or more verbose to see what is being sent to Solr, but for some reason the requests are not showing up in the server log either.

Thanks


Solution

When Sunspot switched to using HTTP POST for its requests, it (and its dependency, RSolr) unfortunately did not specify a charset in the Content-Type header. This causes Tomcat to default to ISO-8859-1, as per the servlet spec, so UTF-8 characters in the request body are decoded incorrectly.
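To see why the query looks garbled in the Solr log, here is a minimal sketch that reinterprets UTF-8 bytes as ISO-8859-1, which is roughly what the server does when no charset is given. The Arabic query string is only an illustrative value; nothing here touches Sunspot or RSolr.

```ruby
# encoding: utf-8

# Illustrative query string (an assumption, chosen for the demo).
query = "أرسنا"

# Reinterpret the UTF-8 bytes as ISO-8859-1, then transcode to UTF-8 so the
# result can be printed. Each Arabic letter is two bytes in UTF-8, so it
# turns into two Latin-1 characters.
misread = query.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts misread.inspect

# Reversing the steps recovers the original text, which shows that the bytes
# were never damaged, only decoded with the wrong charset.
recovered = misread.encode("ISO-8859-1").force_encoding("UTF-8")
puts recovered == query
```

The misread string begins with "Ø£Ø±Ø³", the same pattern as the value reported in the log.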

A more recent version of RSolr, 1.0.7, fixes this by sending a Content-Type header with a UTF-8 charset. Sunspot users who see this error should make sure their RSolr gem dependency has been updated to 1.0.7 or later.
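A quick way to check whether an app has picked up the fixed release is to compare version strings with RubyGems' `Gem::Version`. This is only a sketch: `rsolr_charset_fixed?` is a hypothetical helper name, and the 1.0.7 threshold comes from the answer above.

```ruby
require "rubygems"

# First RSolr release that sends a UTF-8 charset, per the answer above.
FIXED_RSOLR_VERSION = Gem::Version.new("1.0.7")

# Hypothetical helper: true when the given version string is 1.0.7 or later.
def rsolr_charset_fixed?(version_string)
  Gem::Version.new(version_string) >= FIXED_RSOLR_VERSION
end

puts rsolr_charset_fixed?("1.0.6")
puts rsolr_charset_fixed?("1.0.7")

# Inside a running app, the loaded version could be looked up like this
# (the entry is nil if the gem is not loaded):
#   spec = Gem.loaded_specs["rsolr"]
#   puts rsolr_charset_fixed?(spec.version.to_s) if spec
```

In a Bundler-managed app, a constraint such as `gem "rsolr", ">= 1.0.7"` in the Gemfile followed by `bundle update rsolr` should bring in the newer release.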

OTHER TIPS

I am not sure, but it may be that the request to Websolr does not say which character set to use, so the application server (I am not sure whether it is JBoss or Tomcat) assumes its default character set, which can be ISO-8859-1. I think this is a bug in the product.
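For illustration, this is roughly what a POST that does declare its character set looks like when built with Ruby's standard `Net::HTTP`. It is a sketch, not RSolr's actual code: the `/solr/select` path and the query value are assumptions, and the request is only constructed, never sent.

```ruby
# encoding: utf-8
require "net/http"
require "uri"

# Illustrative query string (an assumption).
query = "أرسنا"

# Build the POST request; the path is a hypothetical Solr endpoint.
request = Net::HTTP::Post.new("/solr/select")

# Form-encode the parameters. The UTF-8 bytes are percent-encoded here.
request.body = URI.encode_www_form(q: query)

# Declare the charset explicitly so the server does not fall back to its
# default when decoding the body.
request["Content-Type"] = "application/x-www-form-urlencoded; charset=UTF-8"

puts request["Content-Type"]
puts request.body
```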

Licensed under: CC-BY-SA with attribution