Continuing a Query (paginating) on a compound index
-
21-12-2019 - |
Question
I have a (hopefully quick) question about MongoDB queries on compound indexes.
Say I have a data set (for example, comments) which I want to sort descending by score, and then date:
{ "score" : 10, "date" : ISODate("2014-02-24T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-18T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-12T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-22T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-16T00:00:00.000Z"), ...}
...
My understanding thus far is that I can make a compound index to support this query, which looks like {"score":-1,"date":-1}
. (For clarity's sake, I am not using a date in the index, but an ObjectID for unique, roughly time-based order)
Now, say I want to support paging through the comments. The first page is easy enough, I can just stick a .limit(n)
option on the end of the cursor. What I'm struggling with is continuing the search.
I have been referring to MongoDB: The Definitive Guide by Kristina Chodorow. In this book, Kristina mentions that using skip() on large datasets is not very performant, and recommends using range queries on parameters from the last seen result (eg. the last seen date).
What I would like to do is perform a range query that acts on two fields, but treats the second field as secondary to the first (just like the index is sorted.) Since my compound index is already sorted in exactly the order I want, it seems like there should be some way to jump into the search by pointing at a specific element in the index and traversing it in the sort order. However, from my (admittedly rudimentary) understanding of queries in MongoDB this doesn't seem possible.
As far as I can see, I have three options:
- Using
skip()
anyway - Using either an $or query or two distinct queries:
{$or : [{"score" : lastScore, "date" : { $lt : lastDate}}, {'score' : {$lt : lastScore}]}
- Using the
$max
special query option
Number 3 seems like the closest to ideal for me, but the reference text notes that 'you should generally use "$lt" instead of "$max"'.
To summarize, I have a few questions:
- Is there some way to perform the operation I described, that I may have missed? (Jumping into an index and traversing it in the sort order)
- If not, of the three options I described (or any I have overlooked), which would (very generally speaking) give the most consistent performance under the compound index?
- Why is $lt preferred over $max in most cases?
Thanks in advance for your help!
Solution
Another option is to store score
and date
in a sub-document and then index the sub-document. For example:
{
"a" : { "score" : 9,
"date" : ISODate("2014-02-22T00:00:00Z") },
...
}
db.foo.ensureIndex( { a : 1 } )
db.foo.find( { a : { $lt : { score : lastScore,
date: lastDate } } } ).sort( { a : -1 } )
With this approach you need to ensure that the fields in the BSON sub-document are always stored in the same order, otherwise the query won't match what you expect since index key comparison is binary comparison of the entire BSON sub-document.
I would go with using $max
to specify the upper bound, in conjunction with $hint
to make sure that the database uses the index you want. The reason that $lt
is in general preferred over $max
is because $max
selects the index using the specified index bounds. This means:
- the index chosen may not necessarily be the best choice.
- if multiple indexes exist on same fields with different sort orders, the selection of the index may be ambiguous.
The above points are covered in further detail here.
One last point: max
is equivalent to $lte
, not $lt
, so using this approach for pagination you'll need to skip over the first returned document to avoid outputting the same document twice.