Question

I have a (hopefully quick) question about MongoDB queries on compound indexes.

Say I have a data set (for example, comments) which I want to sort descending by score, and then date:

{ "score" : 10, "date" : ISODate("2014-02-24T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-18T00:00:00.000Z"), ...}
{ "score" : 10, "date" : ISODate("2014-02-12T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-22T00:00:00.000Z"), ...}
{ "score" : 9, "date" : ISODate("2014-02-16T00:00:00.000Z"), ...}
...

My understanding thus far is that I can make a compound index to support this query, which looks like {"score":-1,"date":-1}. (For clarity's sake, I am not using a date in the index, but an ObjectID for unique, roughly time-based order)

Now, say I want to support paging through the comments. The first page is easy enough, I can just stick a .limit(n) option on the end of the cursor. What I'm struggling with is continuing the search.

I have been referring to MongoDB: The Definitive Guide by Kristina Chodorow. In this book, Kristina mentions that using skip() on large datasets does not perform well, and recommends using range queries on parameters from the last seen result (e.g. the last seen date).

What I would like to do is perform a range query that acts on two fields, but treats the second field as secondary to the first (just like the index is sorted.) Since my compound index is already sorted in exactly the order I want, it seems like there should be some way to jump into the search by pointing at a specific element in the index and traversing it in the sort order. However, from my (admittedly rudimentary) understanding of queries in MongoDB this doesn't seem possible.

As far as I can see, I have three options:

  1. Using skip() anyway
  2. Using either an $or query or two distinct queries: {$or : [{"score" : lastScore, "date" : { $lt : lastDate}}, {"score" : {$lt : lastScore}}]}
  3. Using the $max special query option
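
For reference, option 2 could be sketched roughly as below. This is only an illustration: the helper name nextPageFilter and the collection name comments are assumptions, not anything from MongoDB itself, and the find() call is shown as a comment because it needs a live database.

```javascript
// Build the filter for the page that follows the last document seen.
// `last` is the final document of the previous page (hypothetical shape).
function nextPageFilter(last) {
  return {
    $or: [
      // Same score as the last seen document, but an earlier date.
      { score: last.score, date: { $lt: last.date } },
      // Any document with a strictly lower score.
      { score: { $lt: last.score } },
    ],
  };
}

// In the mongo shell this might be used as (collection name assumed):
//   db.comments.find(nextPageFilter(last)).sort({ score: -1, date: -1 }).limit(n)

const last = { score: 10, date: new Date("2014-02-12T00:00:00Z") };
const filter = nextPageFilter(last);
console.log(JSON.stringify(filter));
```

The sort is kept identical to the index so that the same compound index can serve both the filter and the ordering.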

Number 3 seems like the closest to ideal for me, but the reference text notes that 'you should generally use "$lt" instead of "$max"'.

To summarize, I have a few questions:

  1. Is there some way to perform the operation I described, that I may have missed? (Jumping into an index and traversing it in the sort order)
  2. If not, of the three options I described (or any I have overlooked), which would (very generally speaking) give the most consistent performance under the compound index?
  3. Why is $lt preferred over $max in most cases?

Thanks in advance for your help!

Solution

Another option is to store score and date in a sub-document and then index the sub-document. For example:

{
  "a" : { "score" : 9,
          "date" : ISODate("2014-02-22T00:00:00Z") },
  ...
}

db.foo.ensureIndex( { a : 1 } )

db.foo.find( { a : { $lt : { score : lastScore,
                             date: lastDate } } } ).sort( { a : -1 } )

With this approach you need to ensure that the fields in the BSON sub-document are always stored in the same order, otherwise the query won't match what you expect since index key comparison is binary comparison of the entire BSON sub-document.
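
One way to keep the field order stable is to build the sub-document in a single place. The sketch below assumes a hypothetical helper named makeSortKey; it relies only on the fact that JavaScript objects keep string keys in insertion order.

```javascript
// Hypothetical helper: always emit the fields in the same order
// (score first, then date), whatever order the input object used.
function makeSortKey(doc) {
  return { score: doc.score, date: doc.date };
}

// The input lists date before score; the output does not.
const key = makeSortKey({
  date: new Date("2014-02-22T00:00:00Z"),
  score: 9,
});

console.log(Object.keys(key)); // score is listed before date
```

Using the same helper for both the stored value of "a" and the query bound means the two always agree on field order.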

I would go with using $max to specify the upper bound, in conjunction with $hint to make sure that the database uses the index you want. $lt is generally preferred over $max because $max selects the index using the specified index bounds. This means:

  • the index chosen may not necessarily be the best choice.
  • if multiple indexes exist on same fields with different sort orders, the selection of the index may be ambiguous.

The above points are covered in further detail here.

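One last point: check whether the bound you use is inclusive. The MongoDB documentation describes $min as an inclusive lower bound and $max as an exclusive upper bound, so when a page starts from an inclusive bound set to the last seen document, you'll need to skip over the first returned document to avoid outputting the same document twice.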
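
The effect of an inclusive bound can be illustrated with a small in-memory simulation. This is not a MongoDB call: the array stands in for the {"score":-1,"date":-1} index using the sample data from the question, and the function names are made up for the example.

```javascript
// Documents in index order for { score: -1, date: -1 }.
const indexOrder = [
  { score: 10, date: "2014-02-24" },
  { score: 10, date: "2014-02-18" },
  { score: 10, date: "2014-02-12" },
  { score: 9, date: "2014-02-22" },
  { score: 9, date: "2014-02-16" },
];

// Negative if a comes before b in index order: higher score first,
// then later date. ISO date strings compare correctly as plain strings.
function compareDesc(a, b) {
  if (a.score !== b.score) return b.score - a.score;
  if (a.date === b.date) return 0;
  return a.date < b.date ? 1 : -1;
}

// Simulate an index scan that starts at an inclusive bound.
function scanFromInclusive(docs, bound, limit) {
  return docs.filter((d) => compareDesc(d, bound) >= 0).slice(0, limit);
}

// Fetch one extra document, then drop the one that was already shown.
function nextPage(docs, last, pageSize) {
  const hits = scanFromInclusive(docs, last, pageSize + 1);
  return hits.filter((d) => compareDesc(d, last) !== 0).slice(0, pageSize);
}

const lastSeen = { score: 10, date: "2014-02-12" };
const page = nextPage(indexOrder, lastSeen, 2);
console.log(page);
```

Requesting pageSize + 1 documents keeps the page full after the duplicate is dropped.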

Licensed under: CC-BY-SA with attribution