Question

I am investigating using MongoDB ReplicaSet for high availability.

But just discovered that in ReplicaSet with 3 nodes, if PRIMARY mongod is the only one left (that is 2 other mongod instances died or were shut down), then after several seconds it switches role to SECONDARY and accepts writes no more. That makes Replica Set worth less than single instance.

I know & understand about PRIMARY election, but the PRIMARY role is fixed to a server (by using priority set to ,say, 10) and (for example due to network problems) other servers become inaccessible, why the main server just gives up?!

Tested with 2.4.8 on Windows (mongodb-win32-x86_64-2008plus-2.4.8) and Linux (CentOS) and 2.0.x on Linux

BOUNTY STARTED:

If the replica set gives up when PRIMARY feels alone, what are alternative to ensure 100% availability? Or maybe there is special configuration needed for the case. The current implementation makes ReplicaSet fragile in case of network problems.

UPDATED:

Alas, I have not said before the scenario when #3 goes down (PRIMARY & SECONDARY are left) and then after a while SECONDARY goes down. Then PRIMARY really just "gives up", because it is already known that #3 is unavailable for some time. This was actually tested in my test environment.

var rsconfig = {"_id":"rs4","members":[{"_id":0,"host":"localhost:27041","priority":10},{"_id":1,"host":"localhost:27042"},{"_id":2,"host":"localhost:27043","arbiterOnly":true}]}
printjson(rsconfig)
rs.initiate(rsconfig)

We initially thought to put SECONDARY and #3 (that is ARBITER) on the same server, but because of question in title, we cannot use such configuration.

Thanks to Alan Spencer for first explaining the logic that MongoDB takes.

Was it helpful?

Solution

This is expected, since the majority of the members are down MongoDB does not assume the last remaining member is consistent.

When you have a majority of the members down there are a couple of options: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/

OTHER TIPS

You say that when the primary is cut off from the other two nodes it should stay up, otherwise write availability is lost, but that's not necessarily the case. If the other two nodes are actually up and on the other side of the network partition, then they have elected a new primary (as two out of three are a majority) and it is that primary that is accepting new writes.

If the previous primary continued to accept writes, you would have potentially conflicting data which there is no mechanism to resolve. Since MongoDB replica set is a single primary architecture (as opposed to a multi-master system) the election mechanism assures that there cannot be two primaries at the same time.

From the point of view of two secondaries, network partition is the same as primary being unavailable, and from the primary's point of view, network partition is indistinguishable from "both other nodes are down". It steps down, because in case of network partition there may already be another primary on the other side of it, and it assures there cannot be two primaries by stepping down.

It is not the case that the "replica set" gives up when primary feels alone - the reason primary steps down when it feels alone is precisely to preserve the integrity of the replica set as a whole. It is not true that setting high priority score fixes a role to a node - a primary can only be elected via consensus among majority - all priority scores do is influence election when all other things are equal.

I highly recommend the excellent "call me maybe" series as reading to understand the challenges of write availability in a distributed system: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions

Just to chime in on the answers. The behavior in this scenario is expected. MongoDB uses a leader election algorithm to elect the new leader. So if there is no majority you cannot elect a leader and hence no writes.

Your only option at the point where 2 nodes are down is to reconfigure your replica set as a 1 node replica set to make it writeable. You can do this using the rs.reconfig cmd with just one server. However please note that this should just be a temporary and emergency configuration. For the longer duration you should have an odd number of total nodes (3+) in your replica set configuration.

Try to use arbiters, most documents say to use just one, but in you case, you need to win the election.

From http://docs.mongodb.org/manual/core/replica-set-architectures/ :

Fault tolerance for a replica set is the number of members that can become unavailable and still leave enough members in the set to elect a primary. In other words, it is the difference between the number of members in the set and the majority needed to elect a primary. Without a primary, a replica set cannot accept write operations. Fault tolerance is an effect of replica set size, but the relationship is not direct.

More on elections: http://docs.mongodb.org/manual/core/replica-set-elections/

More on arbiters: http://docs.mongodb.org/manual/faq/replica-sets/#how-many-arbiters-do-replica-sets-need

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top