Is 1 primary, 2 secondaries and 2 arbiters the correct architecture for a replica set with fault tolerance 2?

StackOverflow https://stackoverflow.com/questions/21423500

04-10-2022

Question

I need a fault tolerance of 2 and I don't want to use an extra server for that. I plan to use the following configuration for each replica set. Each mongodbX-X is a different server.

DC1 (main):
mongodb0-0 - primary
mongodb0-1 - secondary

DC2:
mongodb0-2 - secondary, priority 0.5
mongodb0-3 - arbiter
mongodb0-4 - arbiter

Is this correct? I can't find any discussion of the same configuration.
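For reference, that proposed layout written out as a replica set configuration would look roughly like the sketch below (pymongo, with hypothetical ports; only the member roles and the 0.5 priority come from the description above):

    from pymongo import MongoClient

    # Sketch only: connect directly to one member before the set is initiated.
    client = MongoClient("mongodb0-0", 27017, directConnection=True)

    client.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "mongodb0-0:27017"},                       # DC1, primary candidate
            {"_id": 1, "host": "mongodb0-1:27017"},                       # DC1, secondary
            {"_id": 2, "host": "mongodb0-2:27017", "priority": 0.5},      # DC2, secondary
            {"_id": 3, "host": "mongodb0-3:27017", "arbiterOnly": True},  # DC2, arbiter
            {"_id": 4, "host": "mongodb0-4:27017", "arbiterOnly": True},  # DC2, arbiter
        ],
    })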


Solution

It is true that failure of up to 2 members should still leave a voting majority to elect a primary in a 5-member replica set; however, there are a few other factors to consider.

Write Concern

The number of data-bearing members available in your replica set also affects the write concerns that can be acknowledged.

For example, with your configuration of two arbiters, any data-bearing member being down means that a w:majority write concern cannot be satisfied. There may be administrative reasons for a member to be offline (for example, if you are running a repair or building indexes on a secondary), so the highest write concern you can safely use without impacting your application availability is w:2.

If for some reason two data-bearing members fail, you will only be able to satisfy a single-node acknowledged write (w:1). There is no replication at all until at least one other data-bearing member rejoins the replica set.
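As an illustration, here is what writes with explicit write concerns look like from a driver (a pymongo sketch; the connection string, database and collection names, and the 5-second wtimeout are placeholders, not part of the original question):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://mongodb0-0:27017,mongodb0-1:27017,mongodb0-2:27017/?replicaSet=rs0")
    db = client["appdb"]

    # w=2: acknowledged by the primary plus one secondary, so it still succeeds
    # while one data-bearing member is down for maintenance.
    orders_w2 = db.get_collection("orders", write_concern=WriteConcern(w=2, wtimeout=5000))
    orders_w2.insert_one({"sku": "abc", "qty": 1})

    # w="majority": with 5 voting members (3 data-bearing + 2 arbiters) the majority
    # is 3, so every data-bearing member must be up for this to be acknowledged.
    orders_majority = db.get_collection("orders", write_concern=WriteConcern(w="majority", wtimeout=5000))
    orders_majority.insert_one({"sku": "abc", "qty": 1})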

Network connectivity

You may have no issues with your actual replica set members, but if connectivity between DC1 and DC2 is lost, only DC2 has enough voting members to elect a primary. The danger here is that DC2 only has a single data-bearing secondary in your configuration, so at that point you technically have no members down, yet a single member failure (of the only data-bearing member in DC2) can cause data loss.

Suggested configuration

Since the arbiters aren't under any write load, the data-bearing members are the ones most likely to fail or require maintenance. Adding two arbiters gives you the appearance of more fault tolerance, but with the caveats noted above.

A more robust recommendation would be to have:

DC1: primary, secondary (both priority 2)
DC2: secondary, secondary (default priority)
DC3: arbiter

In this configuration any two data-bearing members can fail, or you can lose connectivity to a whole data centre, and you still have a primary as well as ongoing replication. The higher priority on the DC1 members prefers those as candidates for the replica set primary (assuming they are available and up to date).
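As a sketch, that 2 + 2 + 1 layout could be initiated like this (pymongo again; the hostnames are hypothetical, and only the roles and priorities come from the recommendation above):

    from pymongo import MongoClient

    client = MongoClient("mongodb0-0.dc1.example.net", 27017, directConnection=True)

    client.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "mongodb0-0.dc1.example.net:27017", "priority": 2},        # DC1
            {"_id": 1, "host": "mongodb0-1.dc1.example.net:27017", "priority": 2},        # DC1
            {"_id": 2, "host": "mongodb0-2.dc2.example.net:27017"},                       # DC2, default priority
            {"_id": 3, "host": "mongodb0-3.dc2.example.net:27017"},                       # DC2, default priority
            {"_id": 4, "host": "mongodb0-4.dc3.example.net:27017", "arbiterOnly": True},  # DC3, arbiter
        ],
    })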

If you don't want to have three secondaries, a more minimal configuration which gives you similar failover benefits (although only a single node of fault tolerance) would be:

DC1: primary (priority 3)
DC2: secondary (priority 2)
DC3: secondary (default priority)

This allows failover to either DC1 or DC2 with continued replication, and it keeps the semantics of the majority write concern aligned with a majority of the replica set members being available.
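A corresponding sketch of this minimal three-member variant, under the same assumptions about hostnames:

    from pymongo import MongoClient

    client = MongoClient("mongodb0-0.dc1.example.net", 27017, directConnection=True)

    client.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "mongodb0-0.dc1.example.net:27017", "priority": 3},  # DC1
            {"_id": 1, "host": "mongodb0-1.dc2.example.net:27017", "priority": 2},  # DC2
            {"_id": 2, "host": "mongodb0-2.dc3.example.net:27017"},                 # DC3, default priority
        ],
    })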

OTHER TIPS

Yes, it is.

A fault tolerance of 2 means that two of your servers can go offline without making the election of a new primary impossible.

Since you have:

  • 1 primary
  • 2 secondaries
  • 2 arbiters

it means that if 2 servers go offline, there will be three remaining to elect a primary, and at least one of them will not be an arbiter.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow