Question

I have a collection with the following document:

{
    "_id" : ObjectId("535e194aba863da3118cdf8a"),
    "device_id" : "1080000008",
    "others" : [ 
        {
            "mileage" : "0.0",
            "event" : "5",
            "satellite" : "8",
            "altitude" : "0",
            "heading" : "290"
        }
    ],
    "speed" : 68,
    "lat" : 1.3209,
    "lng" : 103.89139,
    "dateTime" : ISODate("2014-04-28T17:03:05.000Z"),
    "output_status" : 0,
    "street_name" : "JALAN AFIFI",
    "device_type" : "VT10",
    "__v" : 0
}

I have 2 shards, A and B, and I want shard A to contain the latest documents based on dateTime and shard B to contain documents that are older than 48 hours.

Is this possible in MongoDB? Or are there better shard key choices that I could pick, like lat/lng? Or should the shard key follow the index key?


Solution

I think the easiest way to do this would be to use tag aware sharding. In addition to the linked docs, there is a great write-up on tag aware sharding to be found here. You would have one shard (or set of shards) that is tagged as "short-term" (or whatever makes sense), then another shard (or set of shards) that is tagged as "long-term".

Pick a shard key which allows you to identify ranges based on time, then have all new data tagged as "short-term". Now, all you have to do is periodically change the tag on the older ranges to move them to "long-term".
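As a rough sketch of what that periodic re-tagging could look like, the snippet below generates the mongo shell commands for a 48 hour window. It assumes the collection is sharded on { dateTime: 1 }. The namespace "tracking.positions", the shard names and the helper tagRangeCommands are made up for illustration; sh.addShardTag and sh.addTagRange are the shell helpers for tag aware sharding (newer releases expose the same idea as zones).

```javascript
// Hypothetical helper: builds the mongo shell commands that keep data newer
// than `windowHours` on "short-term" and everything older on "long-term".
// Assumes the shard key is { dateTime: 1 }; the namespace is a placeholder.
function tagRangeCommands(ns, now, windowHours) {
  const cutoff = new Date(now.getTime() - windowHours * 60 * 60 * 1000);
  const boundary = `ISODate("${cutoff.toISOString()}")`;
  return [
    // Tag ranges include the lower bound and exclude the upper bound.
    `sh.addTagRange("${ns}", { dateTime: MinKey }, { dateTime: ${boundary} }, "long-term")`,
    `sh.addTagRange("${ns}", { dateTime: ${boundary} }, { dateTime: MaxKey }, "short-term")`,
  ];
}

// One-time setup in the mongo shell (shard names are placeholders):
//   sh.addShardTag("shardA", "short-term")
//   sh.addShardTag("shardB", "long-term")

const tagCommands = tagRangeCommands(
  "tracking.positions",
  new Date("2014-04-30T17:03:05Z"),
  48
);
tagCommands.forEach((c) => console.log(c));
```

Tag ranges cannot overlap, so on each run the previous pair of ranges would have to be removed before the new pair is added (for example with sh.removeTagRange, if your shell version provides it).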

The balancer will move chunks to their appropriate tag as a priority (the only higher priority is a draining shard), so as long as you can deal with the fact that there will be a time frame where your "short-term" shards hold more than 48 hours of data, you should be fine.

The downside to this is that you will end up with "hot" chunks on your short term shard for writes - all writes for new data will always be going to a single chunk - the max chunk (this is true for any monotonically increasing shard key). If you are OK with that, and can handle your new data write volume on a single shard, then you should be fine.
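To see why every new write lands in the same place, here is a toy model of range-based chunk routing. It is not MongoDB's actual routing code; the split points and the function chunkIndexFor are invented for the illustration.

```javascript
// Toy model: a chunk owns keys in [min, max). `splitPoints` are the sorted
// boundaries between chunks, so there are splitPoints.length + 1 chunks.
function chunkIndexFor(key, splitPoints) {
  let i = 0;
  while (i < splitPoints.length && key >= splitPoints[i]) i++;
  return i;
}

// Illustrative boundaries (milliseconds since the epoch).
const splitPoints = [
  Date.parse("2014-04-26T00:00:00Z"),
  Date.parse("2014-04-27T00:00:00Z"),
  Date.parse("2014-04-28T00:00:00Z"),
];

// Ten new readings, one per minute, all later than the last split point.
const firstWrite = Date.parse("2014-04-28T17:03:05Z");
const targets = [];
for (let n = 0; n < 10; n++) {
  targets.push(chunkIndexFor(firstWrite + n * 60000, splitPoints));
}
console.log(targets); // every entry is 3, the index of the max chunk
```

Because each new key is larger than every existing split point, the routing never picks any chunk other than the last one, whichever shard currently owns it.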

Note that you do not have to use your dateTime field (remember that your shard key is immutable). You can also use the ObjectId in the _id field, because that contains a time-based value too. For more information on that, see my related Q&A here.
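If you go the _id route, the range boundaries become ObjectIds rather than dates. The first 4 bytes of an ObjectId are the creation time in seconds since the epoch, so a boundary can be built by hex-encoding the timestamp and padding the rest with zeros. The two helper names below are my own, not part of any driver; this is only a sketch of the conversion.

```javascript
// Build the smallest ObjectId hex string for a given creation time:
// 8 hex digits of seconds since the epoch, then 16 zero digits.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Recover the creation time embedded in an ObjectId hex string.
function dateFromObjectIdHex(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

const boundaryHex = objectIdHexFromDate(new Date("2014-04-28T09:03:06Z"));
console.log(boundaryHex);

// The _id from the document in the question.
const embedded = dateFromObjectIdHex("535e194aba863da3118cdf8a");
console.log(embedded.toISOString());
```

In the mongo shell, ObjectId(boundaryHex) could then be used as the bound of a tag range on { _id: 1 } in the same way a date would be used for dateTime.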

Licensed under: CC-BY-SA with attribution