Which index should I use for this distinct command in mongodb

https://stackoverflow.com//questions/9630836

09-12-2019

Question

I would like to get, as fast as possible, the number of distinct values of the field c.h (the field h of the subdocument stored in the field c of each entry in the collection), counting only the entries that satisfy the query {p: [a_int], r: [a_bool]}.

My first guess was to index {p: 1, r: 1, "c.h": 1}.

Is that correct? Will distinct actually use it?

I am using MongoDB 2.0.1.

EDIT: I found in a JIRA ticket that you can get stats for the query. However, this only works on replica sets (and not when run from a mongos in a sharded setup). The query seems to use at least an index on {p: 1, "c.h": 1} correctly, so I will try the full index.

EDIT2: the full index works better, as expected.
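
For reference, the index and distinct call described above could be written in the mongo shell roughly as follows. This is only a sketch: the collection name `entries` and the filter values `42` and `true` are placeholders that do not come from the question, and the database calls run only when the shell's global `db` is present.

```javascript
// Hypothetical collection name; replace with your own.
var collName = "entries";

// Compound index: the two equality fields first, then the field passed to distinct.
var fullIndex = { p: 1, r: 1, "c.h": 1 };

// Placeholder filter values standing in for [a_int] and [a_bool].
var query = { p: 42, r: true };

// distinct() returns an array of values, so the count is simply its length.
function countDistinct(values) {
  return values.length;
}

// Only executed inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var coll = db.getCollection(collName);
  // Older shells such as 2.0.x use ensureIndex() instead of createIndex().
  coll.createIndex(fullIndex);
  var values = coll.distinct("c.h", query);
  print(countDistinct(values));
}
```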

Solution

If explain() is not working for your distinct in a sharded environment, you can take the "brute force" approach. Use hint() to explicitly specify the index you want to test and then compare the results:

http://www.mongodb.org/display/DOCS/Optimization#Optimization-Hint

Besides removing doubt as to the index used, it also means that you are not waiting for the optimizer to try out a new query plan (it will cache the first chosen index for the query for some time).
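
A minimal sketch of that comparison in the mongo shell is shown below. It is an illustration under assumptions, not a definitive benchmark: the collection name `entries`, the filter values, and the `timeMs` helper are hypothetical, and both candidate indexes are assumed to exist already. Because `hint()` is a cursor method and older servers may not accept a hint on the distinct command itself, the sketch times a hinted `find()` over the same filter as a proxy.

```javascript
// Hypothetical collection name and placeholder filter values.
var collName = "entries";
var query = { p: 42, r: true };

// The two indexes mentioned in the question, assumed to be already built.
var candidates = [
  { p: 1, "c.h": 1 },
  { p: 1, r: 1, "c.h": 1 }
];

// Hypothetical helper: run fn once and return the elapsed wall-clock milliseconds.
function timeMs(fn) {
  var start = Date.now();
  fn();
  return Date.now() - start;
}

// Only executed inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var coll = db.getCollection(collName);
  candidates.forEach(function (index) {
    var ms = timeMs(function () {
      // hint() forces the given index; itcount() iterates the whole cursor.
      coll.find(query, { "c.h": 1, _id: 0 }).hint(index).itcount();
    });
    print(tojson(index) + ": " + ms + " ms");
  });
}
```

Run each candidate several times before comparing, since the first pass may mostly measure loading the index into memory.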

Your shard key is going to have a big impact here, of course, and the distribution of the data based on that key will likely end up being your limiting factor (data locality winning over scatter/gather approaches).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow