Question

I am new to mongodb and amazon ec2.
It seems to me that mongo replicas are here to : 1/ avoid data loss and 2/ make reads and serving faster.
In Amazon they have this EbS thing. From what I understand it is a global persistent storage, like dropbox for instance.
So is there a need to have replicas if amazon abstracts away the need of it with EBS ?
Thanks in advance
Thomas

Was it helpful?

Solution

Let me clarify a couple of things.

EBS is essentially a SAN Volume if you are used to working within existing technologies. It can be attached to one instance, but it still has a limited IO throughout. Using RAID can help maximize the IO, provisioned IOPS can help you maximize the throughput.

Ideally however, with MongoDB, you want to have enough memory where indexes can be completely accessed within memory, performance drops if the disk needs to be hit.

Mongo can use Replicas, which is primarily used for failover and replication (You can send reads to a slave, but all writes need to hit the primary), and sharding which is used to split a dataset to increase performance. You will still need to do these things anyway even if you are using EBS for storage.

OTHER TIPS

Replicas are there not just for storage redundancy but also for server redundancy. What happens if your MongoDB server (which uses an EBS volume) suddenly disappears because, for example, the host on which is sits fails? You would need to do a whole bunch of stuff, like clone a new instance to replace it, attach the volume to that instance, reroute traffic to it, etc. Mongo's replica sets mean you don't have to do that. They keep working even if one of them fails, so you have basically 0 down time.

Additionally, it's one more layer of redundancy. You can only trust EBS so far - what if AWS has a bug that erases your volume or that makes it unavailable for an unacceptably long time? With replica sets you can even replicate your data across availability zones or even to a completely different cloud provider.

Replica sets also let you read from multiple nodes, so you can increase your read throughput, theoretically, after you've maxed out what the EBS connection gives you from one instance.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top