Question

I have an application for which I am tasked with designing a mongo backed data storage.

The application's goals are to serve the latest data (no stale data) with the fastest possible load times.

The data size is on the order of a few million, and the application is write heavy.

While choosing a read strategy for a 3-node replica set (1 primary, 1 secondary, 1 arbiter), I came across two different strategies for deciding where to source the reads from:

  • Read from the secondary to reduce load on the primary. Set writeConcern = REPLICA_SAFE, thus ensuring the writes are done on both the primary and the secondary. Set the read preference to secondaryPreferred.

  • Always read from the primary, but ensure the data is in the primary before reading. So set writeConcern = SAFE. The read preference is left at the default (primary).

What are the things to be considered before choosing one of the options?


Solution

According to the documentation, REPLICA_SAFE is a deprecated term and should be replaced with REPLICA_ACKNOWLEDGED. The other problem is that this constant appears to set the w value to 2.

This is a problem for your configuration, as you have a primary and only one secondary, plus an arbiter. With w set to 2, every write must be acknowledged by two nodes. If one data-bearing node goes down or is otherwise unreachable, there will not be two nodes available to acknowledge, and write operations can be left hanging.

The better choice would be MAJORITY, because it adapts to the number of nodes: it requires acknowledgement from a "majority" of the replica set members, the primary included. But in your case any write concern that involves more than the primary will block all writes if one of your data-bearing nodes is down or unavailable, because the arbiter holds no data and cannot acknowledge a write. To avoid this you would need at least two more secondary nodes, so that a "majority" of nodes could still acknowledge the write, or you could drop the arbiter and run two secondaries.
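The majority arithmetic can be checked with a small sketch. This is a simplified model, not MongoDB's actual implementation: it assumes every member votes, that arbiters never acknowledge writes, and that exactly one data-bearing node is lost. The function names `majority` and `can_acknowledge` are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest member count that is more than half of the voting members."""
    return voting_members // 2 + 1


def can_acknowledge(w: int, data_bearing_up: int) -> bool:
    """A write needing w acknowledgements succeeds only if w data-bearing
    nodes are reachable. Arbiters store no data, so they never count here."""
    return data_bearing_up >= w


# (label, data-bearing nodes, arbiters)
topologies = [
    ("primary + secondary + arbiter", 2, 1),
    ("primary + two secondaries", 3, 0),
    ("primary + three secondaries + arbiter", 4, 1),
]

results = {}
for label, data_nodes, arbiters in topologies:
    w = majority(data_nodes + arbiters)
    # Simulate losing exactly one data-bearing node.
    results[label] = can_acknowledge(w, data_nodes - 1)
    print(f"{label}: majority={w}, writes survive one node down: {results[label]}")
```

Under these assumptions only the primary/secondary/arbiter layout fails the check, which matches the two ways out described above.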

So you will have to stick with the default w=1, where writes are acknowledged by the primary only, unless you can accept writes failing when your one secondary goes down.

You can set the read preference to secondaryPreferred as long as you accept that you can "possibly" be reading stale data rather than the latest representation of your data, since the only real guarantee is of a write to the primary node. The general replication considerations remain: the nodes should be roughly equal in processing capability, otherwise your query operations can lead to replication lag or general performance degradation.
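As a rough illustration, both read strategies can be expressed through the standard connection string options `replicaSet`, `w` and `readPreference`. The host names, the replica set name `rs0` and the helper `build_uri` below are hypothetical placeholders, and the sketch only builds the strings without connecting to anything.

```python
from urllib.parse import urlencode

# Hypothetical host names and replica set name; substitute your own.
HOSTS = "db1.example.net:27017,db2.example.net:27017"
REPLICA_SET = "rs0"


def build_uri(read_preference: str, w: int) -> str:
    """Build a MongoDB connection string with the given read preference and w value."""
    options = urlencode({
        "replicaSet": REPLICA_SET,
        "w": w,
        "readPreference": read_preference,
    })
    return f"mongodb://{HOSTS}/?{options}"


# Reads always go to the primary: no stale data, but the primary carries all load.
fresh_reads_uri = build_uri("primary", 1)
# Reads go to the secondary when available: less primary load, possibly stale data.
offloaded_reads_uri = build_uri("secondaryPreferred", 1)

print(fresh_reads_uri)
print(offloaded_reads_uri)
```

Given the stated goal of never serving stale data, the first string is the safer starting point; the second only makes sense if occasional lag is acceptable.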

Remember that replication is implemented for redundancy and is not a system for improving performance. If you are looking for performance, then perhaps look into scaling up your system hardware or implementing sharding to distribute the load.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow