Which index should I use for this distinct command in MongoDB?
09-12-2019
Question
I would like to get, as fast as possible, the number of distinct values of the field c.h (the field h of the subdocument stored in the field c of each document in the collection) among the documents that match the query {p: [a_int], r: [a_bool]}.
My first guess was to index {p: 1, r: 1, "c.h": 1}.
Is that correct? Will distinct use it properly?
I am using MongoDB 2.0.1.
EDIT: I found in a JIRA ticket that you can get stats for the query. However, it only works when run against a replica set (not when run from a mongos in a sharded setup). The query seems to use at least an index on {p: 1, "c.h": 1} correctly, so I will try the full index.
EDIT2: The full index works better, as expected.
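For reference, the index and the distinct call described above might look like this in the mongo shell. This is only a sketch: the collection name entries and the values 42 and true are placeholders, not taken from the original setup.

```javascript
// Placeholder names and values (assumptions): collection "entries", p = 42, r = true.
var indexSpec = { p: 1, r: 1, "c.h": 1 };  // the compound index from the question
var query = { p: 42, r: true };            // the filter from the question

// Only touch the database when running inside the mongo shell,
// where a global `db` object exists.
if (typeof db !== "undefined") {
    db.entries.ensureIndex(indexSpec);               // shell helper available in 2.0.x
    var values = db.entries.distinct("c.h", query);  // array of distinct c.h values
    print("distinct c.h values: " + values.length);
}
```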
Solution
If explain() is not working for your distinct in a sharded environment, you can take the "brute force" approach. Use hint() to explicitly specify the index you want to test and then compare the results:
http://www.mongodb.org/display/DOCS/Optimization#Optimization-Hint
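A minimal sketch of that comparison follows. It assumes that a find() with the same filter is an acceptable stand-in for the distinct, since hint() is a cursor method. The helper name compareHints, the collection name entries and the filter values are hypothetical. Timings are wall-clock from the client, so repeat each run a few times before drawing conclusions.

```javascript
// Run the same filter once per candidate index, forcing each index with hint(),
// and record how many documents came back and how long the scan took.
// `coll` is a mongo shell collection object, e.g. db.entries (placeholder name).
function compareHints(coll, query, indexSpecs) {
    var results = [];
    for (var i = 0; i < indexSpecs.length; i++) {
        var start = new Date().getTime();
        // itcount() iterates the whole cursor and returns the number of documents.
        var n = coll.find(query).hint(indexSpecs[i]).itcount();
        var elapsed = new Date().getTime() - start;
        results.push({ index: indexSpecs[i], n: n, millis: elapsed });
    }
    return results;
}

// Shell usage (placeholder values):
// compareHints(db.entries, { p: 42, r: true },
//              [ { p: 1, "c.h": 1 }, { p: 1, r: 1, "c.h": 1 } ]);
```

Every candidate should return the same n; only millis should differ between indexes.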
Besides removing any doubt about which index is used, it also means that you are not waiting for the optimizer to try out a new query plan (it caches the first chosen index for the query for some time).
Your shard key is going to have a big impact here, of course, and the distribution of the data based on that key will likely end up being your limiting factor (data locality winning over scatter/gather approaches).