Question

Q: Which is the best architecture for live copies for testing and development?

Current setup:

We have two amazon/EC2 mongod servers like this:

Machine A:
    A production database (on an amazon/EC2 server) (name it ‘PROD’)
    Other databases (‘OTHER’)

Machine B:
    a pre-production database (name it ‘PRE’)
    a copy for developer 1 own tests (call it ‘DEVEL-1’)
    a copy for developer 2 (DEVEL-2)
    …DEVEL-n

The PRE database is for integration-tests before deploying into production.

Each DEVEL-n database lets a developer trash their own data without disturbing the other developers.

From time to time we want to “restore” fresh data from PROD into the PRE and DEVEL-n bases.

Currently we copy PROD to PRE with the .copyDatabase() command. Then we issue .copyDatabase() “n” times to make copies from PRE into each DEVEL-n.

The trouble:

A copy takes very long (1 hour per copy, with a database size over 10 GB), and it usually saturates mongod so badly that we have to restart the service.

We have found out about:

  • Dump/restore system (saturates as .copyDatabase() does)
  • Replica sets
  • Master/Slave (seems deprecated)

Replica-sets seem the winners, but we have serious doubts:

Suppose we want a replica-set to sync live A/PROD into B/PRE (and have A likely as a primary and B likely as secondary):

a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?

b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?

c) Can I “stop replicating” so that we can deploy to PRE, test the software with fresh data, trash the data during testing, and then, once the tests are complete, “re-link” the replica so that changes in PRE are discarded and changes in PROD are carried over into PRE?

d) Is there any other better way than replica-sets suitable for this case?

Thanks. Marina and Xavi.

Solution

Replica-sets seem the winners, but we have serious doubts:

Suppose we want a replica-set to sync live A/PROD into B/PRE (and have A likely as a primary and B likely as secondary):

a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?

As of MongoDB 2.4, replication always includes all databases. The design intent is for all nodes to be eventually consistent replicas, so that you can fail over to another non-hidden secondary in the same replica set.

b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?

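No. There is only a single primary in a replica set, and secondaries do not accept writes, so B cannot hold extra databases that are absent from the primary.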

c) Can I “stop replicating” so that we can deploy to PRE, test the software with fresh data, trash the data during testing, and then, once the tests are complete, “re-link” the replica so that changes in PRE are discarded and changes in PROD are carried over into PRE?

Since there can only be a single primary, mixing production and test roles in the same replica set is not possible in the way you've envisioned.

Best practice would be to isolate your production and dev/staging environments so there can be no unexpected interaction.

d) Is there any other better way than replica-sets suitable for this case?

There are some approaches you can take to limit the amount of data that needs to be transferred, so that you are not copying the full database (10 GB) across from production each time. Replica sets are suitable as part of the solution, but you will need a separate standalone server or replica set for your PRE environment.

Some suggestions:

  • Use a replica set and add a hidden secondary in your development environment. You can take backups from this node without affecting your production application, and since the secondary replicates changes as they occur you should be doing a comparatively faster local network copy of the backup.

  • Implement your own scheme for partial replication based on a tailable cursor of MongoDB's oplog. The local oplog.rs capped collection is the same mechanism used to relay changes to members of a replica set and includes details for inserts, deletes, and updates. You could match on the relevant database namespaces and relay matching changes from your production replica set into your isolated PRE environment.
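
As a rough illustration of the oplog idea, here is a minimal Python sketch, assuming the pymongo driver (not mentioned above). The helper names is_relevant and relay_oplog, and the apply_entry callback that would write into PRE, are hypothetical. It only shows the namespace matching and the tailable cursor; applying each entry safely on the PRE side is left out.

```python
import time


def is_relevant(entry, db_names):
    """Return True if an oplog entry touches one of the wanted databases."""
    # 'ns' has the form "<database>.<collection>"; commands use "<database>.$cmd".
    db_name = entry.get("ns", "").split(".", 1)[0]
    return db_name in db_names


def relay_oplog(source, apply_entry, db_names, start_ts):
    """Tail the oplog on `source` and pass matching entries to `apply_entry`.

    Assumptions: `source` is a pymongo.MongoClient connected to a replica set
    member, `start_ts` is a bson Timestamp to resume from, and `apply_entry`
    is a hypothetical callback that replays the change on PRE.
    """
    from pymongo import CursorType  # imported lazily; requires pymongo

    oplog = source.local["oplog.rs"]
    last_ts = start_ts
    while True:
        cursor = oplog.find(
            {"ts": {"$gt": last_ts}},
            cursor_type=CursorType.TAILABLE_AWAIT,
        )
        while cursor.alive:
            for entry in cursor:
                last_ts = entry["ts"]  # remember where to resume from
                if is_relevant(entry, db_names):
                    apply_entry(entry)
        time.sleep(1)  # cursor died (for example, nothing matched yet); retry
```

Calling relay_oplog(client, apply_entry, {"PROD"}, start_ts) would relay only changes to PROD and skip OTHER. Storing the last timestamp somewhere durable would let the relay stop before a test run and resume afterwards.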

Either of these approaches would allow you control over when the backup is transferred from PROD to PRE, as well as restarting from a previous point after testing.
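
For the first suggestion, a hidden secondary is a replica set member configured with priority 0 and hidden set to true. The sketch below shows one way to build such a configuration in Python, assuming the pymongo driver and a server version that supports the replSetGetConfig command. The function names and the host names in the example are hypothetical.

```python
import copy


def with_hidden_member(config, host):
    """Return a copy of a replica set config with one extra hidden member."""
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append(
        # Hidden members must have priority 0 so they can never become primary.
        {"_id": next_id, "host": host, "priority": 0, "hidden": True}
    )
    new_config["version"] += 1  # a reconfig needs a higher version number
    return new_config


def add_hidden_member(client, host):
    """Apply the change; `client` is assumed to be a pymongo.MongoClient
    connected to the current primary."""
    current = client.admin.command("replSetGetConfig")["config"]
    client.admin.command("replSetReconfig", with_hidden_member(current, host))
```

Note that a hidden member still votes in elections unless its votes field is set to 0, so adding one changes the voting majority of the set.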

Other suggestions

In our setup we use EBS snapshots to quickly replicate the production database to the staging environment. Snapshots are taken every few hours as part of the backup cycle. When a new DB server starts in staging, it looks for the most recent DB snapshot and uses it for its EBS drive. Taking a snapshot is almost instant, and recovery is also very fast. This approach also scales very well: we are actually using it in a huge sharded MongoDB installation. The only downside is that you need to rely on AWS services to implement it, which can be undesirable in some cases.
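
The "look for the most recent snapshot" step could be sketched roughly as follows in Python, assuming the boto3 library. This is not necessarily how the setup above is implemented. The tag name used for filtering and both function names are hypothetical, and the sketch ignores pagination of the snapshot listing.

```python
def latest_completed(snapshots):
    """Pick the most recent completed snapshot from a describe_snapshots list."""
    done = [s for s in snapshots if s["State"] == "completed"]
    if not done:
        return None
    return max(done, key=lambda s: s["StartTime"])


def volume_from_latest_snapshot(tag_value, zone, region):
    """Create an EBS volume from the newest snapshot tagged Name=<tag_value>.

    Assumption: backups are tagged with a 'Name' tag; adapt the filter to
    whatever identifies your snapshots.
    """
    import boto3  # imported lazily; requires boto3 and AWS credentials

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": [tag_value]}],
    )
    snapshot = latest_completed(response["Snapshots"])
    if snapshot is None:
        raise RuntimeError("no completed snapshot found")
    volume = ec2.create_volume(
        SnapshotId=snapshot["SnapshotId"], AvailabilityZone=zone
    )
    return volume["VolumeId"]
```

The returned volume would then be attached to the staging instance and mounted as the mongod data directory.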

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow