Question

Reading the MongoDB documentation on indexes, I was left a little mystified and unsettled by this assertion, found at http://docs.mongodb.org/manual/applications/indexes/#ensure-indexes-fit-ram:

If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and the working set must be able to fit in RAM at the same time.

So how is this supposed to scale when new nodes are added to the sharded cluster? Suppose all my 576 nodes are limited to 8 GB, and I have 12 collections of 4 GB each (including their associated indexes) and 3 collections of 16 GB each (including indexes). How does sharding spread the data across nodes so that the 12 collections can be queried efficiently?


Solution

When sharding, you spread the data across different shards. The mongos process routes each query to the shards that hold the data it needs. As a result, you only need to consider the data and indexes that each individual shard holds, not the whole data set. To quote from When to Use Sharding:

You should consider deploying a sharded cluster, if:

  • your data set approaches or exceeds the storage capacity of a single node in your system.
  • the size of your system’s active working set will soon exceed the capacity of the maximum amount of RAM for your system.
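Applied to the numbers in the question, the per-shard view looks roughly like the sketch below. It is a back-of-the-envelope estimate, not a description of how MongoDB actually balances data: it assumes every collection is sharded, the split is perfectly even, each of the 576 nodes is its own shard, and the 8 GB limit refers to RAM. The helper name per_shard_gb is made up for illustration.

```python
# Sizes from the question, in GB (each figure already includes indexes).
collection_sizes_gb = [4] * 12 + [16] * 3

shard_count = 576       # assumption: every node is its own shard
ram_per_shard_gb = 8    # assumption: the 8 GB limit in the question is RAM


def per_shard_gb(total_gb: float, shards: int) -> float:
    """Data held by each shard under a perfectly even split (an idealization)."""
    return total_gb / shards


total_gb = sum(collection_sizes_gb)
share_gb = per_shard_gb(total_gb, shard_count)

print(f"total data + indexes: {total_gb} GB")
print(f"per shard (even split): {share_gb:.3f} GB")
print(f"fits in {ram_per_shard_gb} GB of RAM: {share_gb <= ram_per_shard_gb}")
```

Under these assumptions each shard holds well under 1 GB, so the 8 GB limit applies to that slice rather than to the 96 GB total. A real cluster will be less even than this, so treat the figure as a lower bound.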

Also note that the working set != whole collection. The working set is defined as:

The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.

For example, you might have 1 TB of data of which only 50 GB is typically used or queried. That 50 GB subset is the working set, and it is what should preferably be held in RAM.
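To make that concrete, here is a rough sizing sketch for the 1 TB / 50 GB example. It assumes the working set spreads evenly across shards and ignores memory used by the operating system and by MongoDB itself, so the result is an optimistic minimum. The function name min_shards_for_ram and the index_gb parameter are hypothetical, introduced only for this illustration; the 8 GB of RAM per shard is borrowed from the question.

```python
import math


def min_shards_for_ram(working_set_gb: float,
                       ram_per_shard_gb: float,
                       index_gb: float = 0.0) -> int:
    """Smallest shard count at which working set plus indexes fit in RAM.

    Assumes an even spread across shards and no other memory overhead.
    """
    needed_gb = working_set_gb + index_gb
    return math.ceil(needed_gb / ram_per_shard_gb)


# The total data size (1 TB) does not appear here: only the
# regularly used 50 GB has to fit in memory.
print(min_shards_for_ram(working_set_gb=50, ram_per_shard_gb=8))

# With a hypothetical 14 GB of indexes on top of the working set.
print(min_shards_for_ram(working_set_gb=50, ram_per_shard_gb=8, index_gb=14))
```

The point of the sketch is that RAM requirements are driven by the working set and the indexes, not by the total amount of data stored.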

Licensed under: CC-BY-SA with attribution