Question

I'm a MongoDB newbie, and I have a collection with documents of this type:

{
    "_id" : {
        "coordinate" : {
            "latitude" : 532144,
            "longitude" : -33333
        },
        "margin" : "N"
    },
    "prices" : [
        {
            "type" : "GAS_95",
            "price" : 1370,
            "date" : ISODate("2014-05-03T18:39:13.635Z")
        },
        {
            "type" : "DIESEL_A",
            "price" : 1299,
            "date" : ISODate("2014-05-03T18:39:13.635Z")
        },
        {
            "type" : "DIESEL_A_NEW",
            "price" : 1350,
            "date" : ISODate("2014-05-03T18:39:13.635Z")
        },
        {
            "type" : "GAS_98",
            "price" : 1470,
            "date" : ISODate("2014-05-03T18:39:13.635Z")
        }
    ]
}

I need to retrieve the prices for a specific date, so I run this query:

db.gasStation.aggregate([
    { "$unwind" : "$prices" },
    { "$match" : {
        "_id" : {
            "coordinate" : {
                "latitude" : 532144,
                "longitude" : -33333
            },
            "margin" : "N"
        },
        "prices.date" : {
            "$gte" : ISODate("2014-05-02T23:00:00.000Z"),
            "$lte" : ISODate("2014-05-03T22:59:59.999Z")
        }
    }}
]);
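For readers new to the aggregation framework, the following plain-JavaScript sketch (runs in Node.js, no MongoDB needed) mimics what the query above does to the sample document: $unwind emits one document per element of the prices array, and $match then keeps those with the wanted _id whose prices.date falls in the range. The helpers unwind and inDateRange are hypothetical illustrations of the semantics, not MongoDB APIs, and JavaScript Date objects stand in for ISODate.

```javascript
// Sample document from the question (JS Date objects stand in for ISODate).
const stationDate = new Date("2014-05-03T18:39:13.635Z");
const gasStation = [
  {
    _id: { coordinate: { latitude: 532144, longitude: -33333 }, margin: "N" },
    prices: [
      { type: "GAS_95", price: 1370, date: stationDate },
      { type: "DIESEL_A", price: 1299, date: stationDate },
      { type: "DIESEL_A_NEW", price: 1350, date: stationDate },
      { type: "GAS_98", price: 1470, date: stationDate },
    ],
  },
];

// Hypothetical helper: like $unwind, emit one copy of each document per
// element of the array field, with the field replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));
}

// Hypothetical helper: like { $gte: from, $lte: to } on prices.date.
function inDateRange(doc, from, to) {
  return doc.prices.date >= from && doc.prices.date <= to;
}

const wantedId = { coordinate: { latitude: 532144, longitude: -33333 }, margin: "N" };
const from = new Date("2014-05-02T23:00:00.000Z");
const to = new Date("2014-05-03T22:59:59.999Z");

// Same order as the question: $unwind first, then $match on _id and date.
// JSON.stringify gives an exact, field-order-sensitive comparison, similar to
// MongoDB's equality match on an embedded document.
const result = unwind(gasStation, "prices").filter(
  (doc) => JSON.stringify(doc._id) === JSON.stringify(wantedId) && inDateRange(doc, from, to)
);

console.log(result.map((doc) => doc.prices.type).join(", ")); // GAS_95, DIESEL_A, DIESEL_A_NEW, GAS_98
```

Every price in the sample shares the same timestamp, which lies inside the range, so all four unwound documents pass the $match.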

Everything works and I get the documents back, but I suspect my query can be improved, so I tried creating an index on _id and prices.date:

db.gasStation.ensureIndex( { 
    "_id" : 1,
    "prices.date" : 1
} )

Then I used the explain option to check whether my query uses the index, but it does not use any index:

{
"stages" : [
    {
        "$cursor" : {
            "query" : {

            },
            "plan" : {
                "cursor" : "BasicCursor",
                "isMultiKey" : false,
                "scanAndOrder" : false,
                "allPlans" : [
                    {
                        "cursor" : "BasicCursor",
                        "isMultiKey" : false,
                        "scanAndOrder" : false
                    }
                ]
            }
        }
    },
    {
        "$unwind" : "$prices"
    },
    {
        "$match" : {
            "_id" : {
                "coordinate" : {
                    "latitude" : 532144,
                    "longitude" : -33333
                },
                "margin" : "N"
            },
            "prices.date" : {
                "$gte" : ISODate("2014-05-02T23:00:00Z"),
                "$lte" : ISODate("2014-05-03T22:59:59.999Z")
            }
        }
    }
],
"ok" : 1

}

Is there any reason my query can't use the index? I read in the MongoDB documentation that the only pipeline operator that doesn't use indexes is $group, but I'm not using it.


Solution

Try rearranging your aggregation pipeline operators. For instance, this query:

db.gasStation.aggregate([
    { "$match" : {
        "_id" : {
            "coordinate" : {
                "latitude" : 532144,
                "longitude" : -33333
            },
            "margin" : "N"
        }
    }},
    { "$unwind" : "$prices" },
    { "$match" : {
        "prices.date" : {
            "$gte" : ISODate("2014-05-02T23:00:00.000Z"),
            "$lte" : ISODate("2014-05-03T22:59:59.999Z")
        }
    }}
], { explain : true });

produces this output, which now shows index usage (an IDCursor on _id):

{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "_id" : {
                        "coordinate" : {
                            "latitude" : 532144,
                            "longitude" : -33333
                        },
                        "margin" : "N"
                    }
                },
                "plan" : {
                    "cursor" : "IDCursor",
                    "indexBounds" : {
                        "_id" : [
                            [
                                {
                                    "coordinate" : {
                                        "latitude" : 532144,
                                        "longitude" : -33333
                                    },
                                    "margin" : "N"
                                },
                                {
                                    "coordinate" : {
                                        "latitude" : 532144,
                                        "longitude" : -33333
                                    },
                                    "margin" : "N"
                                }
                            ]
                        ]
                    }
                }
            }
        },
        {
            "$unwind" : "$prices"
        },
        {
            "$match" : {
                "prices.date" : {
                    "$gte" : ISODate("2014-05-02T23:00:00Z"),
                    "$lte" : ISODate("2014-05-03T22:59:59.999Z")
                }
            }
        }
    ],
    "ok" : 1
}
The point is to put pipeline operators such as $match and $sort at the very beginning of the pipeline, where they can use indexes to limit how much data is read and passed on to the rest of the aggregation. You can do more to improve the performance of the example above, but this should give you a good idea of how to approach it.
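To illustrate why the order matters, here is a plain-JavaScript sketch (no MongoDB involved) that runs both orderings over a small in-memory collection and counts how many documents reach the $unwind stage. The second station is a hypothetical document added only for the comparison, and unwind, sameId and inRange are illustrative helpers, not MongoDB APIs. The sketch does not model the index itself or measure timings; it only shows that matching on _id first sends fewer documents into $unwind while producing the same result.

```javascript
const date = new Date("2014-05-03T18:39:13.635Z");

// The station from the question plus one hypothetical extra station.
const collection = [
  {
    _id: { coordinate: { latitude: 532144, longitude: -33333 }, margin: "N" },
    prices: [
      { type: "GAS_95", price: 1370, date },
      { type: "DIESEL_A", price: 1299, date },
      { type: "DIESEL_A_NEW", price: 1350, date },
      { type: "GAS_98", price: 1470, date },
    ],
  },
  {
    // Hypothetical station, only here to show what gets filtered out.
    _id: { coordinate: { latitude: 111111, longitude: -22222 }, margin: "D" },
    prices: [
      { type: "GAS_95", price: 1399, date },
      { type: "GAS_98", price: 1499, date },
    ],
  },
];

const wantedId = { coordinate: { latitude: 532144, longitude: -33333 }, margin: "N" };
const from = new Date("2014-05-02T23:00:00.000Z");
const to = new Date("2014-05-03T22:59:59.999Z");

// Exact, field-order-sensitive comparison, like equality on an embedded document.
const sameId = (doc) => JSON.stringify(doc._id) === JSON.stringify(wantedId);
const inRange = (doc) => doc.prices.date >= from && doc.prices.date <= to;

// Unwinds prices and records how many input documents this stage processed.
function unwind(docs, counter) {
  counter.docs += docs.length;
  return docs.flatMap((doc) => doc.prices.map((p) => ({ ...doc, prices: p })));
}

// Question's order: $unwind, then a single $match on _id and date.
const unwindFirstCount = { docs: 0 };
const unwindFirst = unwind(collection, unwindFirstCount).filter((d) => sameId(d) && inRange(d));

// Answer's order: $match on _id, then $unwind, then $match on date.
const matchFirstCount = { docs: 0 };
const matchFirst = unwind(collection.filter(sameId), matchFirstCount).filter(inRange);

console.log(unwindFirstCount.docs, matchFirstCount.docs); // 2 1
console.log(JSON.stringify(unwindFirst) === JSON.stringify(matchFirst)); // true
```

In the real server, the leading $match on _id is what lets the query use the _id index, so the non-matching stations are never read at all.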

OTHER TIPS

I'm going to quote the docs on this:

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

source: http://docs.mongodb.org/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes

You don't have a $match or $sort at the beginning of the pipeline; you have the $unwind operation, so no index can be used here.
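In the explain output of this MongoDB version, the plan of the leading $cursor stage shows how documents were fetched: "BasicCursor" (as in the question's output) means a full collection scan, while the "IDCursor" in the reordered query means the _id index was used. The small helpers below are hypothetical plain-JavaScript functions that assume the 2.6-era explain format shown in this thread (newer servers report plans differently), and they treat any cursor other than BasicCursor as index use, which is a simplification:

```javascript
// Hypothetical helper for the 2.6-era aggregate explain format shown in this thread.
// Returns the cursor type of the first ($cursor) stage, or null if there is none.
function leadingCursor(explain) {
  const first = explain.stages[0];
  return first && first.$cursor ? first.$cursor.plan.cursor : null;
}

// "BasicCursor" means a collection scan; any other cursor type is treated as index use.
function usesIndex(explain) {
  const cursor = leadingCursor(explain);
  return cursor !== null && cursor !== "BasicCursor";
}

// Trimmed copies of the two explain outputs from this thread.
const questionExplain = {
  stages: [
    { $cursor: { query: {}, plan: { cursor: "BasicCursor", isMultiKey: false, scanAndOrder: false } } },
    { $unwind: "$prices" },
  ],
  ok: 1,
};
const answerExplain = {
  stages: [
    { $cursor: { plan: { cursor: "IDCursor" } } },
    { $unwind: "$prices" },
  ],
  ok: 1,
};

console.log(usesIndex(questionExplain)); // false
console.log(usesIndex(answerExplain)); // true
```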

Edit - detailed explanation:

Still, it is possible to move part of the matching condition to the beginning of the pipeline so that an index will be used.

db.gasStation.aggregate([
    { "$match" : {
        "_id" : {
            "coordinate" : {
                "latitude" : 532144,
                "longitude" : -33333
            },
            "margin" : "N"
        }
    }},
    { "$project" : { "prices" : 1, "_id" : 0 } },
    { "$unwind" : "$prices" },
    { "$match" : {
        "prices.date" : {
            "$gte" : ISODate("2014-05-02T23:00:00.000Z"),
            "$lte" : ISODate("2014-05-03T22:59:59.999Z")
        }
    }}
], { explain : true });

However, here this index is unnecessary:

{"_id":1, "prices.date":1}

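Why? Because the $match at the beginning of the pipeline only filters by _id. In MongoDB, a document's _id is automatically indexed, and that is the index that will be used in this case. The date filter runs after $unwind, where no index can be used, so the prices.date part of the compound index never helps.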

Also, you can further optimize your query by removing unnecessary fields using the $project operator. If you don't need a field, remove it as soon as possible.
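The effect of the early $project can be sketched the same way. The hypothetical project helper below handles only the simple inclusion form used above ({ "prices" : 1, "_id" : 0 }) and is an illustration, not MongoDB's full projection semantics. After projecting, each unwound document carries only its price entry, so neither _id nor any other field travels through the rest of the pipeline.

```javascript
// Hypothetical helper: simple inclusion projection like { prices: 1, _id: 0 }.
// As in MongoDB inclusion projections, _id is kept unless explicitly set to 0.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, keep] of Object.entries(spec)) {
    if (field !== "_id" && keep === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const date = new Date("2014-05-03T18:39:13.635Z");
const station = {
  _id: { coordinate: { latitude: 532144, longitude: -33333 }, margin: "N" },
  prices: [
    { type: "GAS_95", price: 1370, date },
    { type: "DIESEL_A", price: 1299, date },
    { type: "DIESEL_A_NEW", price: 1350, date },
    { type: "GAS_98", price: 1470, date },
  ],
};

// $project first, then $unwind the (now much smaller) document.
const projected = project(station, { prices: 1, _id: 0 });
const unwound = projected.prices.map((p) => ({ ...projected, prices: p }));

console.log(Object.keys(unwound[0]).join(",")); // prices
console.log(unwound.length); // 4
```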

Licensed under: CC-BY-SA with attribution