Question

I have multiple indexes on a collection, as shown below. In particular, I want a query to use "gTs_1_RE_H_1_l_1", but the query is using "gTs_1" instead!

{
    "0" : {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "_id_"
    },
    "1" : {
        "v" : 1,
        "key" : {
            "gTs" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "gTs_1",
        "expireAfterSeconds" : 604800
    },
    "2" : {
        "v" : 1,
        "key" : {
            "uN" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "uN_1"
    },
    "3" : {
        "v" : 1,
        "key" : {
            "gTs" : 1,
            "RE_H" : 1,
            "l" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "gTs_1_RE_H_1_l_1",
        "background" : 1
    }
}

Here I have an index on 'gTs' alone (a TTL-based index) and a compound index with 'gTs' and 'RE_H' as its first two keys ("gTs_1_RE_H_1_l_1").

Now, I'm trying to execute this query:

db.tweets.find( {

                    "RE_H" : NumberLong("484001755192636620"),                  
                    "gTs" : {
                        "$lte" : ISODate("2014-03-18T22:00:00Z"),
                        "$gte" : ISODate("2014-03-17T21:00:00Z")
                    }
                }).explain()

As far as I know, this should use "gTs_1_RE_H_1_l_1", but surprisingly it is using "gTs_1", as this output shows:

{
    "cursor" : "BtreeCursor gTs_1",
    "isMultiKey" : false,
    "n" : 46508,
    "nscannedObjects" : 365746,
    "nscanned" : 365746,
    "nscannedObjectsAllPlans" : 370493,
    "nscannedAllPlans" : 370494,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 1509,
    "indexBounds" : {
        "gTs" : [ 
            [ 
                ISODate("2014-03-17T21:00:00.000Z"), 
                ISODate("2014-03-18T22:00:00.000Z")
            ]
        ]
    },
    "server" : "Frrole-API1:27017"
}

However, if I provide a hint, it does pick the right index. So, if I run the following query:

db.tweets.find( {

                    "RE_H" : NumberLong("484001755192636620"),                  
                    "gTs" : {
                        "$lte" : ISODate("2014-03-18T22:00:00Z"),
                        "$gte" : ISODate("2014-03-17T21:00:00Z")
                    }
                }).hint("gTs_1_RE_H_1_l_1").explain()

I get the following output:

/* 0 */
{
    "cursor" : "BtreeCursor gTs_1_RE_H_1_l_1",
    "isMultiKey" : true,
    "n" : 46508,
    "nscannedObjects" : 233224,
    "nscanned" : 233541,
    "nscannedObjectsAllPlans" : 233224,
    "nscannedAllPlans" : 233541,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 3,
    "nChunkSkips" : 0,
    "millis" : 1874,
    "indexBounds" : {
        "gTs" : [ 
            [ 
                true, 
                ISODate("2014-03-18T22:00:00.000Z")
            ]
        ],
        "RE_H" : [ 
            [ 
                NumberLong(484001755192636620), 
                NumberLong(484001755192636620)
            ]
        ],
        "l" : [ 
            [ 
                {
                    "$minElement" : 1
                }, 
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "Frrole-API1:27017"
}

Can someone please help me understand what's going on?


Solution

As you can see from the output, the query that uses the simpler index is faster by about 365 ms (1509 ms versus 1874 ms); that's why MongoDB uses that index. MongoDB's optimizer doesn't try to analyze the query and estimate how fast each index will be; it simply runs the candidate query plans and measures which one does best. Your MongoDB has learned that the simple gTs index is faster for this query. It automatically re-tests this from time to time by running the candidate plans in parallel.
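As a rough illustration, the figures from the two explain() outputs in the question can be put side by side. This is plain JavaScript rather than any MongoDB API, and the variable names are made up for the example.

```javascript
// Figures copied from the two explain() outputs in the question.
const plans = [
  { index: "gTs_1", nscanned: 365746, n: 46508, millis: 1509 },
  { index: "gTs_1_RE_H_1_l_1", nscanned: 233541, n: 46508, millis: 1874 },
];

// Plan with the lowest reported run time.
const fastest = plans.reduce((a, b) => (a.millis <= b.millis ? a : b));

// Plan that examined the fewest index entries.
const fewestScans = plans.reduce((a, b) => (a.nscanned <= b.nscanned ? a : b));

// Difference in reported run time between the two plans.
const gapMillis = Math.abs(plans[0].millis - plans[1].millis);

console.log("fastest:", fastest.index);
console.log("fewest entries scanned:", fewestScans.index);
console.log("time gap (ms):", gapMillis);
```

The compound index examines fewer entries, yet the single-field index finishes sooner in these two runs, which is consistent with the planner preferring gTs_1.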

> As far as I know, this should use "gTs_1_RE_H_1_l_1", but surprisingly it is using "gTs_1", as this output shows:

That is not really surprising. You should review the documentation on indexing, in particular the section about sorting. While you don't request a sort here, you're using a range query ($lte and its siblings), which behaves very similarly. At a minimum, you will need to change the order of the index keys to RE_H, gTs, so that the field with the equality constraint comes first, followed by the field with the range constraint.
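To see why key order matters, here is a small self-contained simulation in plain JavaScript. It is a toy model, not MongoDB code: it treats a compound index as a sorted array of key tuples and counts how many keys lie in the single contiguous run between the query's lower and upper bounds. The data is synthetic and the helper names (compareKeys, buildIndex, keysInRange) are hypothetical; only the field names come from the question.

```javascript
// Compare two key tuples element by element, the way a compound index orders keys.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Build a sorted list of key tuples for the given field order.
function buildIndex(docs, fields) {
  return docs.map((d) => fields.map((f) => d[f])).sort(compareKeys);
}

// Count the keys in the contiguous run between two bounds (inclusive).
function keysInRange(index, lo, hi) {
  return index.filter(
    (k) => compareKeys(k, lo) >= 0 && compareKeys(k, hi) <= 0
  ).length;
}

// 1000 synthetic documents; gTs is a plain number standing in for a date.
const docs = [];
for (let i = 0; i < 1000; i++) docs.push({ gTs: i, RE_H: i % 10 });

// Query being modelled: RE_H == 3 and 200 <= gTs <= 399.
const rangeFirst = buildIndex(docs, ["gTs", "RE_H"]);
const equalityFirst = buildIndex(docs, ["RE_H", "gTs"]);

const scannedRangeFirst = keysInRange(rangeFirst, [200, 3], [399, 3]);
const scannedEqualityFirst = keysInRange(equalityFirst, [3, 200], [3, 399]);
const matches = docs.filter(
  (d) => d.RE_H === 3 && d.gTs >= 200 && d.gTs <= 399
).length;

console.log({ scannedRangeFirst, scannedEqualityFirst, matches });
```

With the equality field first, the contiguous run contains only matching keys; with the range field first, it spans nearly every key in the date window. A real server may skip ahead inside the run, so treat the counts as illustrative only. Applied to the question, this corresponds to a key pattern that starts with { "RE_H" : 1, "gTs" : 1 }.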

Other tips

It seems that the order in which the indexes are defined also affects which one is selected. For example:

{
    "0" : {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "_id_"
    },
     "1" : {
        "v" : 1,
        "key" : {
            "gTs" : 1,
            "RE_H" : 1,
            "l" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "gTs_1_RE_H_1_l_1",
        "background" : 1
    },
    "2" : {
        "v" : 1,
        "key" : {
            "gTs" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "gTs_1",
        "expireAfterSeconds" : 604800
    },
    "3" : {
        "v" : 1,
        "key" : {
            "uN" : 1
        },
        "ns" : "week_raw_tweet_db.tweets",
        "name" : "uN_1"
    }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow