Question

I am looking for a way of introducing random noise into my scoring function, and I'm at a loss on how to best proceed.

Some background:

We use Solr for a web application that manages large-ish sets of photos for agencies.

One customer has an interesting requirement for scoring:

  • a 'quality' field, maintained by editors, ranging from 1 (highest) to 3 (lowest);
  • a 'date' field, used to boost more recent photos (I would probably use a logarithmic function).

However, due to how the stock photo market works, this will likely result in many similar photos appearing together. Their request is to give 'quality' a large boost, but introduce some randomness so that photos will not appear in a strict date order.

Any ideas?

EDITED: a key requirement is to have "stable" query results: if I search twice for "tropical island" I can get a slightly different result set, but if I ask for the first page, then the second, then the first, I'd better get the same results :)


Solution 2

Turns out my first approach to solving the problem was the correct one, and I had a trivial implementation bug. In case it helps others:

RandomSortField does have the characteristics I need (that is, returning repeatable results for the same query). Leaving aside the FunctionQuery for a moment, even something trivial like:

sort=quality_i asc, date_d desc, random_12345 desc

will approximate my requirements.
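For illustration, here is a minimal Python sketch of one way to keep the seed stable across pages: derive it from the query text (and, optionally, a session identifier) rather than generating a new one per request. The helper names `stable_seed` and `build_params`, the `session_id` argument, and the page size are assumptions of mine, not part of Solr or Sunspot; the field names are the ones used in the sort above, and the `random_*` field is assumed to be mapped to `RandomSortField` in the schema.

```python
import hashlib
from urllib.parse import urlencode


def stable_seed(query, session_id=""):
    """Derive a repeatable integer seed from the query (hypothetical helper).

    The same query and session always yield the same seed, so every page
    of one search is sorted by the same random order.
    """
    digest = hashlib.sha256(f"{session_id}|{query}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def build_params(query, page, rows=20, session_id=""):
    """Build Solr request parameters for one page of results."""
    seed = stable_seed(query, session_id)
    return {
        "q": query,
        # The seed is encoded in the field name, as in random_12345 above.
        "sort": f"quality_i asc, date_d desc, random_{seed} desc",
        "start": page * rows,
        "rows": rows,
    }


if __name__ == "__main__":
    # Pages 0 and 1 of the same search share one sort clause.
    print(urlencode(build_params("tropical island", page=0)))
    print(urlencode(build_params("tropical island", page=1)))
```

Passing a different `session_id` changes the seed, so two separate searches for the same terms can come back in a different order while paging within one search stays consistent.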

However, when using the Sunspot Ruby gem, there's no way of passing the seed, and that's what was tripping me up earlier: I ended up using a different seed on each request, and so got "truly" random results.

OTHER TIPS

You could do this with FunctionQueries. For each photo add a field with a random number close to 1 (e.g. 0.99, 1.02) and use it in a product function query to alter the "natural" score.
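As a rough sketch of that idea in Python: the noise factor can be computed deterministically from the photo's id at index time, so it never changes between requests. The field name `noise_f`, the helper names, and the 5% spread are assumptions for illustration only. The query side assumes the edismax parser, whose `boost` parameter multiplies the main score by a function value.

```python
import hashlib


def noise_factor(photo_id, spread=0.05):
    """Return a repeatable factor in [1 - spread, 1 + spread] for a photo id."""
    digest = hashlib.sha256(str(photo_id).encode("utf-8")).digest()
    unit = int.from_bytes(digest[:8], "big") / 2**64  # uniform-ish in [0, 1)
    return 1.0 + (2.0 * unit - 1.0) * spread


def to_solr_doc(photo):
    """Copy a photo record and attach the hypothetical noise_f field."""
    doc = dict(photo)
    doc["noise_f"] = round(noise_factor(photo["id"]), 4)
    return doc


def edismax_params(query):
    """Request parameters that multiply the natural score by noise_f."""
    return {"defType": "edismax", "q": query, "boost": "noise_f"}


if __name__ == "__main__":
    print(to_solr_doc({"id": "photo-1", "quality_i": 1}))
    print(edismax_params("tropical island"))
```

Because the factor is stored with each document, paging stays stable, but the perturbation is also identical for every search; it does not vary from one query to the next the way a seeded random sort can.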

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow