Live copies for the development team in MongoDB

Question 1

Replica sets seem to be the best option, but we have serious doubts:

Suppose we want a replica set to sync live data from A/PROD into B/PRE (with A normally acting as the primary and B normally acting as a secondary):

a) Can I select “a few” databases from A to replicate from PROD, and leave the OTHER databases alone?

As of MongoDB 2.4, replication always includes all databases (apart from the local database, which holds each member's own oplog and is never replicated). The design intent is for all nodes to be eventually consistent replicas, so that you can fail over to another non-hidden secondary in the same replica set.

b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?

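No. There is only a single primary in a replica set, and the secondaries are read-only copies of it: every write, including creating a database, has to go through the primary and is then replicated to all members, so B cannot hold databases that A does not have.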

c) Can I “stop replication” so that we can deploy to PRE, test the software with fresh data, and trash that data during testing, and then, once testing is complete, “re-link” the replica so that the changes made in PRE are discarded and the changes made in PROD are carried over into PRE?

Since there can only be a single primary, mixing production and test roles in the same replica set is not possible in the way you've envisioned.

Best practice would be to isolate your production and dev/staging environments so that there can be no unexpected interaction between them.

d) Is there any other better way than replica-sets suitable for this case?

There are some approaches you can take to limit the amount of data that needs to be transferred, so that you are not copying the full database (10 GB) across from production each time. Replica sets are suitable as part of the solution, but you will need a separate standalone server or replica set for your PRE environment.

Some suggestions:

- Use a replica set and add a hidden secondary located in your development environment. You can take backups from this node without affecting your production application, and since the secondary replicates changes as they occur, copying a backup from it into PRE is a comparatively fast local network transfer.

- Implement your own scheme for partial replication based on a tailable cursor on MongoDB's oplog. The oplog.rs capped collection in the local database is the same mechanism used to relay changes to the members of a replica set, and it records the details of inserts, updates, and deletes. You could match on the namespaces of the relevant databases and relay the matching changes from your production replica set into your isolated PRE environment.
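
For the first suggestion, adding the hidden member is a replica set reconfiguration run against the PROD primary. The following is a minimal sketch using the Python driver (pymongo), assuming a server recent enough to support the replSetGetConfig command (3.0+); on 2.4 you would use rs.reconfig() in the mongo shell instead. The function names and host names are hypothetical.

```python
import copy


def add_hidden_member(config, host):
    """Return a copy of a replica set config with `host` added as a hidden member.

    `config` is the document returned by the replSetGetConfig command;
    the original dict is left untouched.
    """
    new_config = copy.deepcopy(config)
    members = new_config["members"]
    # Pick the next free member _id.
    next_id = max((m["_id"] for m in members), default=-1) + 1
    members.append({
        "_id": next_id,
        "host": host,
        "priority": 0,   # hidden members must have priority 0, so they never become primary
        "hidden": True,  # not advertised to clients, so application reads never land here
    })
    # Bump the config version, as the mongo shell's rs.reconfig() helper does.
    new_config["version"] = new_config["version"] + 1
    return new_config


def add_hidden_secondary(prod_uri, host):
    """Add `host` (e.g. a machine on the development network) to the PROD replica set."""
    # Imported here so the pure helper above can be used without the driver installed.
    from pymongo import MongoClient

    client = MongoClient(prod_uri)
    current = client.admin.command("replSetGetConfig")["config"]
    client.admin.command("replSetReconfig", add_hidden_member(current, host))
```

For example, add_hidden_secondary("mongodb://prod-a:27017/?replicaSet=rs0", "dev-backup:27017") would add a hypothetical dev-backup host. Keeping the config change in a pure function makes it easy to inspect the new configuration before applying it.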

Either of these approaches would give you control over when the data is transferred from PROD to PRE, and would let you restart from a previous point after testing.
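
The oplog-based approach can be sketched roughly as follows, assuming pymongo and a user allowed to read local.oplog.rs on PROD. Only inserts, updates, and deletes in the listed databases are relayed; commands (drops, index builds) and multi-document transactions, which are logged as a single applyOps command entry, are not handled. Updates are applied by re-reading the current document from PROD rather than parsing the update format, which varies between server versions. The function names and the save_position callback are hypothetical; persisting the last applied ts is what lets you resume, or restart from an earlier point, after testing.

```python
import re
import time


def oplog_filter(databases, after_ts=None):
    """Query matching insert/update/delete oplog entries for the given databases."""
    pattern = "^(" + "|".join(re.escape(db) for db in databases) + r")\."
    query = {"ns": {"$regex": pattern}, "op": {"$in": ["i", "u", "d"]}}
    if after_ts is not None:
        query["ts"] = {"$gt": after_ts}  # resume strictly after the last applied entry
    return query


def split_ns(ns):
    """Split "db.collection" into its parts; collection names may contain dots."""
    db_name, coll_name = ns.split(".", 1)
    return db_name, coll_name


def apply_entry(prod, pre, entry):
    """Replay one oplog entry from PROD onto PRE, keyed by _id so replays are idempotent."""
    db_name, coll_name = split_ns(entry["ns"])
    target = pre[db_name][coll_name]
    if entry["op"] == "i":
        doc = entry["o"]
        target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    elif entry["op"] == "d":
        target.delete_one({"_id": entry["o"]["_id"]})
    elif entry["op"] == "u":
        doc_id = entry["o2"]["_id"]
        current = prod[db_name][coll_name].find_one({"_id": doc_id})
        if current is None:
            target.delete_one({"_id": doc_id})  # deleted on PROD since this update
        else:
            target.replace_one({"_id": doc_id}, current, upsert=True)


def relay(prod_uri, pre_uri, databases, after_ts=None, save_position=None):
    """Tail PROD's oplog forever, relaying changes for `databases` into PRE."""
    from pymongo import CursorType, MongoClient

    prod = MongoClient(prod_uri)
    pre = MongoClient(pre_uri)
    oplog = prod.local["oplog.rs"]
    last_ts = after_ts
    if last_ts is None:
        # No saved position: start at the newest entry instead of replaying the whole oplog.
        newest = oplog.find_one(sort=[("$natural", -1)])
        last_ts = newest["ts"] if newest else None
    while True:
        cursor = oplog.find(oplog_filter(databases, last_ts),
                            cursor_type=CursorType.TAILABLE_AWAIT)
        while cursor.alive:
            for entry in cursor:
                apply_entry(prod, pre, entry)
                last_ts = entry["ts"]
                if save_position is not None:
                    save_position(last_ts)  # e.g. store it durably to resume later
            time.sleep(1)
        time.sleep(1)  # cursor died (e.g. no matches yet); re-query from last_ts
```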

Question 2

In our setup we use EBS snapshots to quickly replicate the production database into the staging environment. Snapshots are taken every few hours as part of the backup cycle. When a new DB server starts in staging, it looks for the most recent DB snapshot and uses it to create its EBS volume. Taking a snapshot is almost instant, and recovery is also very fast. This approach also scales up very well; we actually use it in a huge sharded MongoDB installation. The only downside is that you need to rely on AWS services to implement it, which can be undesirable in some cases.
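
The "find the most recent snapshot and use it for the volume" step might look something like this with boto3. This is only a sketch under several assumptions: the snapshots are tagged with a Name tag (the tag value, device name, and gp3 volume type are placeholders), the snapshots are consistent (for example, with the MongoDB journal on the same volume), and mounting the volume and starting mongod are left to the instance's own startup scripts. The function names are hypothetical.

```python
def latest_snapshot(snapshots):
    """Return the most recent completed snapshot dict, or None if there is none."""
    completed = [s for s in snapshots if s.get("State") == "completed"]
    return max(completed, key=lambda s: s["StartTime"], default=None)


def restore_latest(tag_value, availability_zone, instance_id, device="/dev/xvdf"):
    """Create a volume from the newest matching snapshot and attach it to a staging instance."""
    import boto3  # imported here so latest_snapshot() works without boto3 installed

    ec2 = boto3.client("ec2")
    snapshots = []
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": [tag_value]}],
    ):
        snapshots.extend(page["Snapshots"])

    snap = latest_snapshot(snapshots)
    if snap is None:
        raise RuntimeError(f"no completed snapshot tagged Name={tag_value}")

    volume = ec2.create_volume(
        SnapshotId=snap["SnapshotId"],
        AvailabilityZone=availability_zone,  # must match the staging instance's AZ
        VolumeType="gp3",
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume["VolumeId"])
    return volume["VolumeId"]
```

Pending snapshots are skipped because a volume created from an incomplete snapshot would not be usable yet.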