Question

I need to test the restore with Ops Manager. For this I "clone" the production sharded cluster: I create VMs of the same size as production and do a mongodump/mongorestore (Ops Manager deployment). My test restore doesn't need to be a consistent copy; it's no problem for me if around 5 GB is missing.

DATA SIZE: 573.6 GB

  • shard0: 142.6 GB
  • shard1: 145.94 GB
  • shard2: 142.55 GB
  • shard3: 142.52 GB

For simplicity, I would like to take a mongodump and pipe it to mongorestore on the mongos.
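Something like the following sketch is what I have in mind (hostnames are placeholders; the `--archive` and `--gzip` options require MongoDB 3.2+ tools):

```shell
# Hypothetical hostnames; adjust to your environment.
# Stream a full dump through the production mongos directly
# into the test cluster's mongos, without an intermediate file.
mongodump --host prod-mongos.example.net --port 27017 --archive --gzip \
| mongorestore --host test-mongos.example.net --port 27017 --archive --gzip
```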

I found an old doc (v3.0), Backup a Small Sharded Cluster with mongodump. This documentation no longer exists for newer MongoDB versions.

If your sharded cluster holds a small data set, you can connect to a mongos using mongodump.

What is a small data set in GB? See above for my deployment.

If you use mongodump without specifying a database or collection, mongodump will capture collection data and the cluster meta-data from the config servers.

Does this mean I don't need to explicitly back up the config replica set?

When restoring data to sharded cluster, you must deploy and configure sharding before restoring data from the backup. See Deploy a Sharded Cluster for more information.

In plain English, does this mean I need to define the shard key (and enable sharding) before the restore?
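As I understand that quote, on the test cluster it would mean running something like this before the restore (the database, collection, and shard key names below are made up):

```shell
# Recreate the sharding setup on the test cluster's mongos first.
# "mydb" and "mydb.orders" are illustrative names only.
mongosh "mongodb://test-mongos.example.net:27017" --eval '
  sh.enableSharding("mydb");
  sh.shardCollection("mydb.orders", { customerId: 1 });
'
```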

Am I missing any steps or important things?


Solution

I found an old doc (v3.0), Backup a Small Sharded Cluster with mongodump. This documentation no longer exists for newer MongoDB versions.

This procedure is only for backing up the data from a small sharded cluster and does not cover recreating the sharded environment or capturing a point-in-time backup. As you've noticed, there is no mention of backing up the config server data or other essential steps that would be required for a sharded environment (for example, stopping the balancer). This procedure might be suitable for backing up data from a development or staging environment, but it is not recommended for a typical production environment.
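For illustration, the fuller procedures have you stop the balancer before the dump and re-enable it afterwards; a minimal sketch (the hostname is a placeholder):

```shell
# Disable the balancer so chunks don't migrate during the backup.
mongosh "mongodb://prod-mongos.example.net:27017" --eval 'sh.stopBalancer()'
# ... run the backup here ...
# Re-enable the balancer once the backup completes.
mongosh "mongodb://prod-mongos.example.net:27017" --eval 'sh.startBalancer()'
```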

For a more complete sharded backup procedure using mongodump, see: Back Up a Sharded Cluster with Database Dumps. Please ensure the version of the documentation you are referencing matches your MongoDB release series as there may be notable differences.

However, you mentioned using MongoDB Ops Manager which includes a specific feature for backing up sharded clusters. If you choose the option for a manual restore, Ops Manager will provide archive files to restore the config servers and shards. Since Ops Manager licensing is part of a MongoDB Enterprise subscription, I would recommend raising a commercial support case with MongoDB if you need recommendations or clarification on any of the procedures or your requirements.

What is a small data set in GB?

There isn't an absolute number. General factors include resource challenges such as the size of your data relative to RAM, available network bandwidth, and how quickly your data changes. Typically if you have sufficient data or workload to warrant sharding, you've also outgrown mongodump as a backup approach.

mongodump is going to read all data into memory, which will have a significant impact on the shards' working sets if your data is much larger than available RAM. You also need enough disk space to save a complete backup (or a compressed backup on MongoDB 3.2+) of the data dumped via a single mongos, enough network bandwidth to cope with the increased traffic, and so on.
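If you do go the mongodump route despite this, a compressed dump (MongoDB 3.2+) at least reduces the disk-space requirement; a sketch with a placeholder host and output path:

```shell
# --gzip compresses each dumped file; the output path is illustrative.
mongodump --host prod-mongos.example.net --port 27017 \
          --gzip --out /backup/cluster-dump
```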

For your specific use case, mongodump definitely isn't a recommended strategy, for several strong reasons:

  • this is a production environment
  • you want to clone/recreate the sharded cluster in another environment
  • you have access to MongoDB Ops Manager for backup

Other tips

This is not a small sharded cluster.

mongodump will take around 5 hours for 600 GB, and the restore would take more than 5 hours depending on the indexes in place on your collections.

My best suggestion is:

  1. If you already have a backup in place taken by MongoDB Ops Manager, use that and restore it in the new environment.

  2. If time is not really an issue in your case, use the dump-and-restore method.

  3. The mongodump/mongorestore method can handle huge databases; if you only want to restore a few collections, use the export and import option instead.

Note: a small hint I can give here to make the process faster:

mongodump writes two files for each collection: a .bson file containing the data and a .metadata.json file containing the index definitions (and collection options).

mongorestore will first insert the data from the .bson file and then build the indexes described in the .metadata.json file.

To make this process faster, first take the mongodump, read the index definitions for all databases and collections, create those indexes on the target manually, and then restore the data with mongorestore while skipping its index builds; this can save at least 30-40% of your time.
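A sketch of that workflow (hosts, paths, and the database name are placeholders; `--noIndexRestore` makes mongorestore skip its own index builds so only your pre-created indexes are used):

```shell
# 1. Read the index definitions from the source so you can
#    re-create them manually on the target ("mydb" is illustrative).
mongosh "mongodb://prod-mongos.example.net:27017" --eval '
  var d = db.getSiblingDB("mydb");
  d.getCollectionNames().forEach(function (c) {
    printjson(d[c].getIndexes());
  });
'
# 2. After creating those indexes on the target, restore the data only.
mongorestore --host test-mongos.example.net --port 27017 \
             --noIndexRestore /backup/cluster-dump
```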

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange