Question

Let's say I have documents with two fields, A and B.

I'd like to use SOLR's MoreLikeThis, but with a twist: I'm most interested in boosting documents whose A field is like my model document's B field. (That is, extract MLT's 'interesting terms' from the model B field, but only collect MLT results based on the A field.)

I don't see a way to use the mlt.fl fields or mlt.qf boosts to achieve this effect in a single query. (It seems mlt.fl specifies fields used for both discovery of 'interesting terms' and matching to those terms.) Am I missing some option?

Or will I have to extract the 'interesting terms' myself and swap the 'field:term' details?

(Other ideas in this same vein appreciated as well.)

Was it helpful?

Solution 2

I now think there are two ways to achieve the desired effect (without customizing the MLT source code).

First option: Do an initial MLT query with the MLT handler, adding the parameter &mlt.interestingTerms=details. This includes the list of terms that were deemed interesting, ranked with their relative boosts. The usual behavior uses those discovered terms against the same mlt.fl fields to find similar documents. For example, the response will include something like:

"interestingTerms": 
    ["field_b:foo",5.0,"field_b:bar",2.9085307,"field_b:baz",1.67070794]

(Since the only thing about this initial query that's interesting is the interestingTerms, throwing in an fq that rules out all docs could help it skip unnecessary scoring work.)

Explicitly re-composing that interestingTerms info into a new OR query field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794 amounts to using the B field example text to find documents that are similar in field A, and may be mimicking exactly the kind of query default MLT does on its usual model field.

Second option: Grab the model document's actual field B text, and feed it directly as a ContentStream body, to be used in lieu of a query, for specifying the model document. Then target mlt.fl at field A for the sake of collecting similar results. For example, a fragment of the parameters might be …&stream.body=foo bar baz&mlt.fl=field_a&…. Again, the net effect being that model text originally from field_b is finding documents similar only in field_a.

OTHER TIPS

Two options I see are:

  1. Use a copyField - index your original document with a copy of field A named B, and then query using B.
  2. Extend MoreLikeThisHandler and change the fields you query.

The first option costs a bit of programming (mostly configuration changes) and some memory consumption. The second involves more programming but no memory footprint increase. Hope one of them suits your needs.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top