Question

TL;DR

How do the various boosting types work together in Django, django-haystack and Solr?

I am having trouble getting the most obvious search results to appear first. If I search for "caring for others" and get 10 results, the object titled "caring for others" appears second in the results, after "caring for yourself".

Document Boosting

I have document-boosted Category objects by factor = 2.0 - ((the mptt tree level)/10): 1.9 for root nodes, 1.8 for second-level nodes, 1.7 for third-level nodes, and so on (or 190%, 180%, 170%, ...).
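
The arithmetic behind that factor can be sketched as a small function. The name category_boost and its base/step parameters are hypothetical, and the sketch assumes level is the integer tree depth. Note that if the depth is counted from 0, a root node would get 2.0 rather than 1.9.

```python
def category_boost(level, base=2.0, step=0.1):
    """Return the document boost for a node at the given tree depth.

    Hypothetical helper mirroring factor = 2.0 - level / 10 from the text.
    """
    # Round to two places so float noise does not leak into the boost value.
    return round(base - step * int(level), 2)


for level in range(1, 4):
    print(level, category_boost(level))
```

Deeper nodes get a smaller boost, so under this scheme a node ten levels down would be boosted by only 1.0 (no boost at all).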

Field Boosting

title is boosted by boost=1.5 (a positive factor of 150%).
content is boosted by boost=.5 (a negative factor of 50%).

Term Boosting

I am currently not boosting any search terms.

My Goal

I want a list of results containing Categories and Articles (I'm ignoring Articles until I get my Category results straight), with Categories weighted higher than Articles and titles weighted higher than content. I'm also trying to weight root category nodes higher than child nodes.

I feel like I'm missing a key concept somewhere.

Information

I'm using haystack's built-in search form and search view.

I'm using the following package/lib versions:

Django==1.4.1
django-haystack==1.2.7
pysolr==2.1.0-beta

My Index Class

from haystack.indexes import SearchIndex, CharField, EdgeNgramField


class CategoryIndex(SearchIndex):
    """Categorization -> Category"""
    text = CharField(document=True, use_template=True, boost=.5)
    title = CharField(model_attr='title', boost=1.5)
    content = CharField(model_attr='content', boost=.5)
    autocomplete = EdgeNgramField(model_attr='title')

    def prepare_title(self, obj):
        return obj.title

    def prepare(self, obj):
        data = super(CategoryIndex, self).prepare(obj)
        # Document boost: 2.0 minus 0.1 per mptt tree level.
        base_boost = 2.0
        base_boost -= (float(int(obj.level))/10)
        data['boost'] = base_boost
        return data

My search template at templates/search/categorization/category_text.txt:

{{ object.title }}
{{ object.content }}

UPDATE

I noticed that when I took {{ object.content }} out of my search template, records started appearing in the expected order. Why is this?


Solution

The DisMax query parser (and ExtendedDisMax from Solr 3.1 on) was created exactly for these needs. You can list all the fields that you want searched (the 'qf' parameter), give each one a custom boost, and specify the fields where phrase hits are especially valuable and should add to the hit's score (the 'pf' parameter). You can also specify how many tokens in a search have to match, using a flexible rule pattern (the 'mm' parameter).

For example, the config could look like this (part of a request handler entry in solrconfig.xml; I'm not familiar with how to do that with haystack, so this is plain Solr):

<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="qf">text^0.5 title^1.5 content^0.5</str>
<str name="pf">text title^2 content</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<int name="ps">100</int>
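
The same settings can also be sent as request parameters on an individual query instead of being fixed in the handler. The following is only a sketch under assumptions: it uses Python 3's standard library, the helper name dismax_params is hypothetical, and the /select path is assumed to be the handler in use. It builds the query string and does not contact a server.

```python
from urllib.parse import urlencode


def dismax_params(query):
    """Build request parameters mirroring the handler config above.

    Hypothetical helper; the values are copied from the config entry.
    """
    return {
        "q": query,
        "defType": "dismax",
        "qf": "text^0.5 title^1.5 content^0.5",  # fields to search, with boosts
        "pf": "text title^2 content",            # fields where phrase hits add score
        "mm": "100%",                            # all query tokens must match
        "ps": 100,                               # phrase slop for pf matching
        "fl": "*,score",                         # return all fields plus the score
    }


params = dismax_params("caring for others")
print("/select?" + urlencode(params))
```

Sending the parameters per request makes it easy to experiment with different 'qf' and 'pf' weights before committing them to solrconfig.xml.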

I don't know about haystack but it seems it would provide Dismax functionality: https://github.com/toastdriven/django-haystack/pull/314

See the DisMax documentation (it links to ExtendedDisMax as well):
http://wiki.apache.org/solr/DisMaxQParserPlugin
http://wiki.apache.org/solr/ExtendedDisMax

OTHER TIPS

It seems that you are just trying to be too smart here with all those boosts.

For example, the boosts on the fields are completely needless if you are using the default search view. The auto_query that is run by default searches only one field: the one marked document=True. Haystack also names this field content internally, so I would suggest renaming your content field in the search index to avoid any possible conflicts.

If that doesn't help (it probably won't), you must either create a custom search form or use a simple workaround to achieve what you want: place the field you want to boost multiple times in the template:

{{ object.title }}
{{ object.title }}
{{ object.content }}
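
Why repeating the title helps can be illustrated with a toy term count. This is only a sketch: the function names and the sample content string are made up for illustration, and Solr's real scoring also involves factors such as field length and term rarity, so the raw counts below show the direction of the effect, not the actual score.

```python
from collections import Counter


def render_document(title, content, title_repeats=1):
    """Mimic the search template: the title N times, then the content."""
    return "\n".join([title] * title_repeats + [content])


def term_counts(text):
    """Count whitespace-separated, lowercased terms in the rendered text."""
    return Counter(text.lower().split())


title = "caring for others"
content = "Advice on caring for yourself first."  # illustrative sample text

once = term_counts(render_document(title, content))
twice = term_counts(render_document(title, content, title_repeats=2))
print(once["others"], twice["others"])
```

Each extra copy of the title raises the frequency of the title's terms in the single document field that the default search uses.
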
Licensed under: CC-BY-SA with attribution