Question

I'm trying to find the optimal filter cache setting for a search application. I have ~1.5 million MARC records to search, from the Boston College library. The application I am testing can be found here. I'd like to investigate the impact of the filterCache settings on memory usage (and what the filterCache should be set to).

As a start, this seems to be a commonly used default setting for Solr.

<filterCache
  class="solr.LRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="4096"/>

I'm trying to set up good queries for SolrMeter. Since each query needs to be different, I assume a very long list of queries will be necessary, as well as a filter queries text file.

Filter queries text file:

format:Book
format:Electronic
format:Microfilm
~100 more filters 

From the Solr logs I also see what appear to be filter queries, printed like this:

fq=geographic_facet:"Great+Britain" 
  1. Thus, I'm assuming geographic_facet is a filter and not a facet?

  2. For the query filters text file, do I need the double quotes?

  3. What other parameters should I set to thoroughly stress-test the Solr server (and to see how the filterCache settings affect memory usage and general performance)? I'm assuming SolrMeter will be the only application needed for this. Thank you.

Solution

  1. geographic_facet is being used as a filter here. Anything passed as fq=... is a filter query; the field name does not really matter.
  2. In the standard query parser, double quotes make the value an exact phrase query (the + in the logged value is just a URL-encoded space). That said, the exact behavior depends on the schema and on the analysis applied to this field. Use the Analysis screen in the Solr Admin UI (https://cwiki.apache.org/confluence/display/solr/Analysis+Screen) to check the exact behavior in your environment.
  3. For stress testing, SolrMeter alone is typically not enough: you should also collect hardware metrics (CPU, memory) through JMX or UNIX tools such as vmstat.
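
On the memory side of point 3, a rough upper bound can be worked out before measuring anything. The sketch below assumes each cached filter is held as a bitset with one bit per document in the index; Solr can store small result sets more compactly, so real usage is normally lower. The function name is hypothetical, and the numbers are the ~1.5 million records and size="16384" from the question.

```python
def filter_cache_worst_case_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound on filterCache memory.

    Assumption (not taken from the original post): every cache entry is a
    bitset with one bit per document, i.e. about max_doc / 8 bytes.
    """
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * cache_size


max_doc = 1_500_000   # approximate record count from the question
cache_size = 16384    # size="16384" from the filterCache config

total = filter_cache_worst_case_bytes(max_doc, cache_size)
print(f"worst case: {total / 1024**3:.2f} GiB")
```

If the cache could fill with that many distinct, large filters, the bound is in the gigabyte range, so it is worth comparing against the JVM heap when choosing size. With only ~100 distinct filters in the test file, the cache would stay far below that limit.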
Licensed under: CC-BY-SA with attribution