Question

In short, I am trying to determine a document's true document ID in method CustomScoreProvider.CustomScore which only provides a document "ID" relative to a sub-IndexReader.

More info: I am trying to boost my documents' scores by precomputed boost factors (imagine an in-memory structure that maps Lucene's document ids to boost factors). Unfortunately I cannot store the boosts in the index for a couple of reasons: boosting will not be used for all queries, plus the boost factors can change regularly and that would trigger a lot of reindexing.

Instead I'd like to boost the score at query time and thus I've been working with CustomScoreQuery/CustomScoreProvider. The boosting takes place in method CustomScoreProvider.CustomScore:

public override float CustomScore(int doc, float subQueryScore, float valSrcScore) {
   float baseScore = subQueryScore * valSrcScore;   // the default computation
   // boost -- THIS IS WHERE THE PROBLEM IS       
   float boostedScore = baseScore * MyBoostCache.GetBoostForDocId(doc); 
   return boostedScore;
}

My problem is with the doc parameter passed to CustomScore. It is not the true document id -- it is relative to the subreader used for that index segment. (The MyBoostCache class is my in-memory structure mapping Lucene's doc ids to boost factors.) If I knew the reader's docBase I could figure out the true id (id = doc + docBase).

Any thoughts on how I can determine the true id, or perhaps there's a better way to accomplish what I'm doing?

(I am aware that the id I'm trying to get is subject to change and I've already taken steps to make sure the MyBoostCache is always up to date with the latest ids.)

Was it helpful?

Solution

I was able to achieve this by passing the IndexSearcher to my CustomScoreProvider, using it to determine which of its subreaders is being used by the CustomScoreProvider, and then getting the MaxDoc for the prior subreaders from the IndexSearcher to determine the docBase.

private int DocBase { get; set; }

public MyScoreProvider(IndexReader reader, IndexSearcher searcher) {
   DocBase = GetDocBaseForIndexReader(reader, searcher);
}

private static int GetDocBaseForIndexReader(IndexReader reader, IndexSearcher searcher) {
    // get all segment readers for the searcher
    IndexReader rootReader = searcher.GetIndexReader();
    var subReaders = new List<IndexReader>();
    ReaderUtil.GatherSubReaders(subReaders, rootReader);

    // sequentially loop through the subreaders until we find the specified reader, adjusting our offset along the way
    int docBase = 0;
    for (int i = 0; i < subReaders.Count; i++)
    {
        if (subReaders[i] == reader)
            break;
        docBase += subReaders[i].MaxDoc();
    }

    return docBase;
}

public override float CustomScore(int doc, float subQueryScore, float valSrcScore) {
   float baseScore = subQueryScore * valSrcScore;
   float boostedScore = baseScore * MyBoostCache.GetBoostForDocId(doc + DocBase);
   return boostedScore;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top