Lately I have been trying to facet on a field in which some values contain multiple words (a phrase). It has been suggested that I use shingles, but I am not sure that would work as expected, because the required phrases should come from a given list.

For example: when I facet on the field, I get separate facets for 'Information' and 'Technology', whereas I want a single facet, 'Information Technology'.

How can I facet on a particular phrase in a particular field?

EDIT: The schema for the required field looks like this:

<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The shingle filter doesn't work: it produces three facets for 'Information technology', namely information, technology and information technology.


Solution

The problem seems to be that the analyzers split the facet field's values into separate words at index time. To facet on a field whose values may contain multiple words, use a field type that does not split the value, such as "string". You can populate that field with a "copy field" in Solr, so that your indexing process doesn't really change. For example, you could have something like the field below.

<field name="facet_text_en_nosplit" type="string" indexed="true" stored="false" multiValued="true"/>

Use the above field in your facet query.
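
As a minimal sketch of the copy-field wiring, assuming the analyzed source field is named "category" (a hypothetical name; the question does not give the real one), the schema could contain something like this alongside the field definition above:

```xml
<!-- Hypothetical source field: "category" is an assumed name, using the analyzed type from the question. -->
<field name="category" type="text_en_splitting_tight" indexed="true" stored="true" multiValued="true"/>

<!-- Copy the original, pre-analysis input of the source field into the unanalyzed facet field. -->
<copyField source="category" dest="facet_text_en_nosplit"/>
```

After reindexing, a request with parameters such as q=*:*&facet=true&facet.field=facet_text_en_nosplit should return 'Information Technology' as a single facet value. Because a "string" field is not analyzed, values are matched exactly, so 'Information Technology' and 'information technology' would be counted as different facets unless the input is normalized before indexing.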

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow