Question

I've recently been using MongoDB's sharding, but I have a question:

Say we have a collection that contains more than a billion documents. To overcome the shortage of disk space, we shard that data, right?

So here comes the question: because none of my shards has enough disk space to store the entire data set, how can I choose one of them as the primary shard? As far as I know, the primary shard maintains a complete data set, even if some parts of the data are on other shards.

Can anyone give me some suggestions? Thanks in advance ;-)

Solution

The primary shard doesn't hold the complete data set; it holds the data of all unsharded collections. For the collection you are sharding, the data should be balanced across all shards (unless your shard key is a poor choice).
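To make the difference concrete, here is a toy model in Python. It is only a sketch, not MongoDB's actual chunk balancer: the shard names, the `shard_for` and `place` helpers, and the MD5-based placement are all assumptions made up for illustration. It shows that documents of a sharded collection spread across every shard, while documents of an unsharded collection all stay on the primary shard.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names
PRIMARY = "shard0"                        # hypothetical primary shard of the database


def shard_for(key: str) -> str:
    """Toy hashed placement: map a shard-key value to one of the shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def place(collection_is_sharded: bool, keys) -> Counter:
    """Count how many documents end up on each shard."""
    counts = Counter({name: 0 for name in SHARDS})
    for key in keys:
        # Unsharded collections live entirely on the primary shard.
        target = shard_for(key) if collection_is_sharded else PRIMARY
        counts[target] += 1
    return counts


keys = [f"user-{i}" for i in range(30_000)]
sharded = place(True, keys)
unsharded = place(False, keys)
print("sharded:  ", dict(sharded))
print("unsharded:", dict(unsharded))
```

In this model no single shard ever needs room for the whole sharded collection; only the unsharded data is tied to the primary shard.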

If your primary shard runs out of space because of your unsharded data, you have two options: either shard those [unsharded] collections as well, or get a bigger disk.
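For the first option, a minimal sketch of the admin commands involved, written as plain Python dictionaries. `enableSharding` and `shardCollection` are MongoDB's command names; the helper `sharding_commands`, the database `mydb`, the collection `events`, and the hashed `user_id` shard key are hypothetical placeholders, and a hashed key is just one possible choice.

```python
from typing import Dict, List


def sharding_commands(database: str, collection: str, shard_key_field: str) -> List[Dict]:
    """Build the admin command documents for sharding one collection.

    The command name must be the first key of each document; Python dicts
    preserve insertion order, so the order written here is kept.
    """
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": {shard_key_field: "hashed"}},
    ]


# Hypothetical names, for illustration only.
commands = sharding_commands("mydb", "events", "user_id")
for command in commands:
    print(command)
    # With a live cluster and pymongo installed, each document could be sent
    # through a mongos router, e.g. MongoClient(uri).admin.command(command).
```

Once a previously unsharded collection is sharded, the balancer can move its chunks off the primary shard, which frees space there.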

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow