When sharding you spread the data across different shards. The mongos process routes queries to shards it needs to get data from. As such you only need to look at the data a shard is holding. To quote from When to Use Sharding:
You should consider deploying a sharded cluster, if:
- your data set approaches or exceeds the storage capacity of a single node in your system.
- the size of your system’s active working set will soon exceed the capacity of the maximum amount of RAM for your system.
Also note that the working set != whole collection. The working set is defined as:
The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.
E.g. you have 1TB of data but typically only 50GB is used/queried. That subset is preferably held in RAM.