Question

How does MongoDB handle a machine that drops out of the shard group?

I'm asking because I have several small machines, and one of them likes to power cycle at random times.

So if I put a shard on this machine, will MongoDB be able to handle it dropping out for a little while, then popping back up and coming online?

Is there anything I, as an administration-level user, will need to do in regards to MongoDB?

Thank you.


Solution

If I put a shard on this machine, will MongoDB be able to handle it dropping out for a little while, then popping back up and coming online?

Is there anything I, as an administration-level user, will need to do in regards to MongoDB?

This depends on a number of factors, which I've tried to summarise with some context :)

Development or non-critical deployment

If this isn't a production or critical environment, you can wait for the machine to recover and no specific admin intervention should be required unless the machine has other errors due to unclean shutdown. Your application will have to handle any networking or timeout exceptions until the cluster is "normal" again, so you may want to intervene to minimise the noise.
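One way to keep the application tolerant of a shard that disappears for a while is to retry failed operations with a backoff. The helper below is a minimal sketch, not something from the original answer: `with_retries` and its parameters are hypothetical names, and the built-in `ConnectionError` and `TimeoutError` stand in for driver exceptions. With PyMongo you would probably pass `pymongo.errors.AutoReconnect` and `pymongo.errors.ServerSelectionTimeoutError` as `retry_on` instead.

```python
import time


def with_retries(operation, retry_on=(ConnectionError, TimeoutError),
                 attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a transient error, wait and try again.

    The delay doubles after every failed attempt (exponential backoff).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: give up and surface the error
            sleep(base_delay * (2 ** attempt))
```

Only retry operations that are safe to repeat. A write that timed out may already have been applied, so a blind retry of a non-idempotent update can apply it twice.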

Production deployment

Sharding alone does not provide high availability or data redundancy, so ideally you want each shard backed by a replica set. In this case a downed machine (with a single mongod instance) should have minimal impact, because a correctly considered Replica Set Deployment Architecture can fail over automatically without any administrative intervention.

In the more dire case where an entire shard is unavailable, you will want to take some administrative action to minimise the impact. The MongoDB logs will be very noisy with networking errors while the shard is unavailable, and some actions, such as chunk migrations to or from the downed shard, will fail.

At a minimum you should Disable the Balancer until all shards are back online. There are a few other considerations below.
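As a rough illustration of the balancer step, the sketch below stops and restarts the balancer from Python. It assumes `client` is a `pymongo.MongoClient` connected to a mongos, and that the server supports the `balancerStop`, `balancerStart` and `balancerStatus` admin commands (as far as I know these exist from MongoDB 3.4 onwards; in the mongo shell the equivalent helpers are `sh.stopBalancer()` and `sh.startBalancer()`). The function names are made up for this example.

```python
def set_balancer_enabled(client, enabled):
    """Start or stop the balancer.

    `client` is assumed to be a pymongo.MongoClient pointed at a mongos.
    """
    command = "balancerStart" if enabled else "balancerStop"
    return client.admin.command(command)


def balancer_mode(client):
    """Return the mode reported by balancerStatus ("full" or "off")."""
    return client.admin.command("balancerStatus")["mode"]
```

A typical sequence would be `set_balancer_enabled(client, False)` while the shard is down, then `set_balancer_enabled(client, True)` once every shard is back, checking `balancer_mode(client)` in between.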

Impact of a "down" shard on your queries/updates through mongos

From your application's point of view, the impact of a downed shard will depend on the type of queries and updates you are doing as well as the shard key you have chosen.

  • directed queries or updates (those that are routed to a single shard) can still succeed if that shard is available. If a query/update is directed to a "down" shard, your driver will get an exception.

  • targeted queries (those that are routed to some but not all shards) can still succeed if the downed shard does not need to be contacted.

  • scatter/gather queries (those that are routed to all shards) will return an exception if one or more shards is down.
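The three cases above boil down to one rule: an operation fails if any shard it has to contact is down. The toy model below only illustrates that rule. It is not how mongos is implemented, and the shard names and the `route_outcome` function are invented for the example.

```python
ALL_SHARDS = {"shard0", "shard1", "shard2"}  # hypothetical cluster


def route_outcome(target_shards, down_shards):
    """Return "ok" if no targeted shard is down, otherwise "error"."""
    unreachable = set(target_shards) & set(down_shards)
    return "error" if unreachable else "ok"


down = {"shard1"}

# Directed: exactly one shard is contacted.
directed_ok = route_outcome({"shard0"}, down)
directed_bad = route_outcome({"shard1"}, down)

# Targeted: a subset of shards that happens to avoid the down one.
targeted = route_outcome({"shard0", "shard2"}, down)

# Scatter/gather: every shard is contacted, so any down shard is fatal.
scatter = route_outcome(ALL_SHARDS, down)
```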

If you are okay with receiving incomplete query results from a mongos if one or more shards are down, there is a partial flag you can set on a query to allow this. Generally you would not want to use this option as your results will be inconsistent, but it can be helpful in emergency situations or ones where you can identify the impact of the downed shard based on your choice of shard key.
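In PyMongo the partial flag is exposed as the `allow_partial_results` argument to `find()`. The sketch below assumes `collection` is a `pymongo` collection reached through a mongos. The helper name and the example filter are hypothetical.

```python
def find_allowing_partial(collection, query):
    """Run a find that tolerates unavailable shards.

    With allow_partial_results=True, mongos returns whatever the reachable
    shards provide instead of raising an error, so the list may be incomplete.
    """
    return list(collection.find(query, allow_partial_results=True))
```

For example, `find_allowing_partial(db.orders, {"status": "open"})` would return open orders from the shards that are still up. Nothing in the returned documents tells you which shards were skipped, so treat the output as a best-effort view.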

Impacts from your choice of shard key

One of the goals with choosing a shard key is to provide a reasonable distribution of reads & writes to suit your use case.

For example, a shard key with a large amount of randomness (such as a hashed shard key) distributes documents across all shards and maximizes write throughput. As a consequence, this also makes the impact of a downed shard generally affect all queries/updates.

You can also create a shard key with some data locality (for example, combining a coarse attribute like customer_id with a finer-grained attribute like a timestamp) in order to get better read performance with targeted queries for a given customer. With a more targeted shard key you may be able to identify a range of users affected (or unaffected) by a down shard.
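To make the difference concrete, here is a small simulation. It is a toy model under stated assumptions, not MongoDB's real chunking or hash function: three shards, six made-up customers with 50 documents each, MD5 as a stand-in hash, and a ranged layout that places two customers per shard.

```python
import hashlib

NUM_SHARDS = 3

# 6 hypothetical customers with 50 documents each: (customer_id, sequence_no)
DOCS = [(customer, seq) for customer in range(6) for seq in range(50)]


def hashed_placement(doc):
    """Toy stand-in for a hashed shard key on a per-document value."""
    customer, seq = doc
    digest = hashlib.md5(f"{customer}:{seq}".encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def ranged_placement(doc):
    """Toy stand-in for a ranged key led by customer_id: two customers per shard."""
    customer, _seq = doc
    return customer // 2


def customers_affected(docs, placement, down_shard):
    """Customers that have at least one document on the down shard."""
    return sorted({doc[0] for doc in docs if placement(doc) == down_shard})


hashed_hit = customers_affected(DOCS, hashed_placement, down_shard=1)
ranged_hit = customers_affected(DOCS, ranged_placement, down_shard=1)
```

In this model `ranged_hit` is `[2, 3]`: only the customers whose range lives on the down shard are affected. With the hashed placement each customer's documents are scattered, so you should expect practically every customer to appear in `hashed_hit`.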

Further Reading

There is much more detail on Sharding in the MongoDB manual.

OTHER TIPS

Sharding will not give you redundancy.

In MongoDB, high availability is achieved by creating replica sets. Every shard SHOULD be backed by at least 3 machines forming a replica set. If one of them fails, the other two will elect a new primary and continue serving data. When the failed machine recovers, it catches up by applying the oplog, and its data is brought up to date without any loss of service.

But if you don't use replica sets, accessing data through mongos will fail when any one of your nodes goes down.

So, if you want HA, use replica sets.
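The reason three members are enough is that electing a primary needs a strict majority of the voting members. A minimal sketch of that rule (the function name is made up, and this is only the arithmetic, not the election protocol itself):

```python
def can_elect_primary(voting_members, reachable_members):
    """True if the reachable members form a strict majority of the voters."""
    return reachable_members > voting_members // 2


three_node_one_down = can_elect_primary(3, 2)  # 2 of 3 is a majority
three_node_two_down = can_elect_primary(3, 1)  # 1 of 3 is not
two_node_one_down = can_elect_primary(2, 1)    # 1 of 2 is not a strict majority
```

This is also why a two-member set does not help here: losing either member leaves the survivor without a majority, so it cannot stay primary.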

Licensed under: CC-BY-SA with attribution