Question

I have a use case I'm trying to implement. Our primary production cluster is a 3-node replica set. I want to set up a historical cluster, another 3-node replica set, that replicates data from the production replica set. However, we periodically purge transaction data from production, and I don't want the purge to replicate, because the historical replica set should hold the entire data set for reporting.

What are the ways to handle this? Does MongoDB support selective replication, or should I replicate the oplog manually and skip deletes?

Thanks for any suggestions!


Solution

As of MongoDB 4.0, there is no supported feature or tool for selective replication or syncing between two distinct deployments.

I would recommend upgrading from 3.4 to a newer version of MongoDB (ideally 4.0) and building a sync solution using the Change Streams API. You can choose which events to replicate to a historical archive or deployment based on the event type, namespace, or other criteria. Change streams are available for replica sets and sharded clusters using the WiredTiger storage engine and replication protocol version 1. MongoDB 3.6 is the first version to add the Change Streams API and includes support for watching individual collections. In MongoDB 4.0 the change stream support was extended to enable watching all non-system collection changes at a database or deployment scope.
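As an illustration of the change-stream approach, here is a minimal sketch using the Python driver (pymongo), assuming MongoDB 3.6+ and a single watched collection. It copies inserts, replacements and updates to the historical deployment and never forwards deletes, so purges on production do not reach the archive. The function names (`plan_write`, `sync`) and the example connection strings, database and collection names in the comments are hypothetical. A production tool would also need to persist the resume token and handle errors and restarts.

```python
"""Sketch: copy changes from a production replica set to a historical
replica set via a change stream, ignoring deletes (MongoDB 3.6+)."""

# Event types that are copied; "delete" is deliberately absent so that
# purges on the production cluster never reach the historical cluster.
COPIED_TYPES = ["insert", "replace", "update"]


def plan_write(change):
    """Translate one change event into (filter, doc_or_update, is_replacement),
    or return None if the event should not be copied."""
    op = change.get("operationType")
    if op not in COPIED_TYPES:
        return None  # skips "delete", "invalidate", "drop", ...
    key = change["documentKey"]
    if op in ("insert", "replace"):
        return key, change["fullDocument"], True
    # "update": reapply only the changed fields from updateDescription.
    desc = change["updateDescription"]
    update = {}
    if desc.get("updatedFields"):
        update["$set"] = desc["updatedFields"]
    if desc.get("removedFields"):
        update["$unset"] = {field: "" for field in desc["removedFields"]}
    return (key, update, False) if update else None


def sync(source, target):
    """Watch `source` (a pymongo Collection) and mirror copied events into
    `target` (another Collection, e.g. on the historical replica set)."""
    # Filter on the server so delete events are never sent to this client.
    pipeline = [{"$match": {"operationType": {"$in": COPIED_TYPES}}}]
    with source.watch(pipeline) as stream:
        for change in stream:
            plan = plan_write(change)
            if plan is None:
                continue
            key, body, is_replacement = plan
            if is_replacement:
                target.replace_one(key, body, upsert=True)
            else:
                target.update_one(key, body)
            # change["_id"] is the resume token; storing it lets a restarted
            # process continue with source.watch(pipeline, resume_after=token).

# Hypothetical usage:
#   from pymongo import MongoClient
#   prod = MongoClient("mongodb://prod-host/?replicaSet=rs0")["app"]["transactions"]
#   hist = MongoClient("mongodb://hist-host/?replicaSet=rs1")["app"]["transactions"]
#   sync(prod, hist)
```

Writes to the archive use `upsert=True` for full documents so that replaying an event after a restart is harmless. Updates are applied with `update_one` and no upsert, so a missing document is not recreated from a partial update.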

If you are unable to upgrade from MongoDB 3.4 in the near future, you could instead tail the replication oplog directly. This would be a less robust solution than the Change Streams API, but some third-party tools, such as mongo-connector, may be helpful. The Change Streams API also uses the oplog, but it adds a supported API, only returns majority-committed operations (which won't be rolled back), and scales to sharded clusters. The oplog format is internal and subject to change between MongoDB releases.
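For comparison, here is a rough sketch of tailing the oplog with pymongo on a 3.4 replica set, skipping delete (`"d"`) entries. It assumes the 3.4-era oplog layout: `op` is `"i"`/`"u"`/`"d"`/`"c"`/`"n"`, `o` holds the document or update, and `o2` selects the updated document. Because that format is internal, this may not work unchanged on other versions. The names `oplog_to_write` and `tail` are hypothetical. Unlike change streams, this can also copy writes that are later rolled back.

```python
"""Sketch: tail a MongoDB 3.4 replica set oplog and copy inserts/updates for
one namespace, skipping deletes. Assumes the 3.4 oplog entry layout."""


def oplog_to_write(entry, namespace):
    """Return (filter, doc_or_update, is_replacement) for one oplog entry,
    or None if it should not be copied to the historical cluster."""
    if entry.get("ns") != namespace:
        return None
    op = entry.get("op")
    if op == "i":  # insert: "o" is the full document
        doc = entry["o"]
        return {"_id": doc["_id"]}, doc, True
    if op == "u":  # update: "o2" selects the document, "o" is the change
        change = entry["o"]
        is_modifier = any(key.startswith("$") for key in change)
        return entry["o2"], change, not is_modifier
    return None  # "d" (delete), "c" (command) and "n" (no-op) are ignored


def tail(client, target, namespace, last_ts):
    """Follow local.oplog.rs after last_ts and mirror copied entries into
    `target`; returns the last timestamp seen so the caller can persist it."""
    from pymongo import CursorType  # deferred so oplog_to_write needs no driver

    oplog = client.local["oplog.rs"]
    cursor = oplog.find({"ts": {"$gt": last_ts}, "ns": namespace},
                        cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for entry in cursor:  # blocks briefly waiting for new entries
            plan = oplog_to_write(entry, namespace)
            if plan is not None:
                key, body, is_replacement = plan
                if is_replacement:
                    target.replace_one(key, body, upsert=True)
                else:
                    target.update_one(key, body)
            last_ts = entry["ts"]
    return last_ts
```

If the cursor dies (for example, if `last_ts` has already fallen off the capped oplog), `tail` returns and the caller must decide whether to resume or resync from scratch.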

Licensed under: CC-BY-SA with attribution