Question

I am using the Ruby Sunspot gem with Solr 1.4.1.

I have an issue with searches that contain a hyphen.

When I run a search for "foo bar bla" the expected result is returned.

When the search term includes a hyphen, like "foo - bar bla", no results are returned.

I have added hyphens to my stop word list and tweaked my schema.xml file in numerous ways over the last few days, but to no avail.

For those familiar with Sunspot: I have minimum match set to 3, which is equivalent to setting the mm (minimum should match) parameter to 3 in the solrconfig.xml file.

Here are the relevant parts of my schema.xml file:

<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="string" class="solr.StrField" tokenized="true" omitNorms="true" sortMissingLast="true">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

Any help or suggestions would be highly appreciated.

Thanks,


Solution

The hyphen character (-) is a Solr query operator that excludes results matching the term that follows it. I don't think adding a hyphen to the stop word list would affect that: the query parser interprets the operator before the remaining text is passed through the analyzer, which is where stopwords.txt is applied. I would suggest stripping the hyphens out before passing the query to Solr. My guess is that the hyphenated query is excluding documents that match "bar". You could try faceting the results to see whether that is in fact the case.
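A minimal sketch of that stripping step in Ruby, assuming you have the raw search string in hand before it is handed to Sunspot. The method name strip_hyphen_operators is hypothetical and not part of Sunspot. It only removes hyphens at the start of a whitespace-separated token (the position where Solr reads - as the exclude operator) and leaves hyphens inside words such as "e-mail" untouched.

```ruby
# Hypothetical helper (not part of Sunspot): removes hyphens that sit in
# operator position, i.e. at the start of a whitespace-separated token,
# so Solr does not treat them as the "exclude" (-) operator.
# Hyphens inside words such as "e-mail" are left alone.
def strip_hyphen_operators(query)
  query
    .gsub(/(^|\s)-+/, '\1') # drop leading hyphens on each token, keep the preceding space
    .split                  # split on whitespace, discarding empty tokens
    .join(' ')              # rejoin with single spaces
end

puts strip_hyphen_operators('foo - bar bla') # prints: foo bar bla
puts strip_hyphen_operators('foo -bar bla')  # prints: foo bar bla
puts strip_hyphen_operators('e-mail foo')    # prints: e-mail foo
```

You would then pass the cleaned string to fulltext in your Sunspot search block instead of the raw input (computing it into a local variable first keeps the block simple). If you don't need to preserve hyphenated words, query.tr('-', ' ') is a blunter alternative that removes every hyphen.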

Licensed under: CC-BY-SA with attribution