Question

I have a sharded MongoDB cluster and I want to migrate its data to a single-node MongoDB instance. Is it possible to use dump/restore for this?

I'm using pymongo, Django, Python 3, and MongoDB 3.4.

If the answer is no, what is the best way to migrate data from a MongoDB cluster to a single-node MongoDB instance?

I tried to write a script to migrate the data:

# fetch the data from the old cluster
data = MongoCollection._get_collection().find(query)

# insert the data into the new single-node mongo instance
NewMongoCollection.insert(data)

This approach had a number of issues, such as:

  • It fetches the whole result set and holds it in RAM
  • If something goes wrong, I have to start over from the beginning
  • ...

Solution

It's definitely possible to mongodump from a sharded cluster and mongorestore to another deployment (standalone, replica set, or sharded cluster).

With this approach there are some general considerations to be aware of:

  • You should always mongodump data via mongos for a sharded cluster.
  • mongodump needs to read all requested data through the memory of the mongod process(es), so this can have a significant resource and performance impact for the deployment being backed up (particularly if the uncompressed data set is much larger than available RAM).
  • mongorestore will rebuild all indexes for collections being restored, which can have a significant performance impact on the target cluster. If you are using a version of MongoDB server older than 4.2, the default foreground index builds will block other reads and writes to the destination database. MongoDB 4.2 has a less impactful index build process (see: Index Builds on Populated Collections).
  • mongodump does not take a point-in-time backup, so if the source cluster is being actively updated your backup may not be consistent.
  • If your deployment is using MongoDB 4.2 or newer and distributed transactions, mongodump/mongorestore should not be used (the lack of point-in-time consistency is particularly problematic in this case).

Your original approach of writing a script to migrate the data would also be viable if you iterate rather than trying to store the whole result set in RAM. You could make this approach resumable by scanning a collection in a predictable order (such as sorting by _id, assuming this is monotonically increasing if new documents continue to be inserted).
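
For illustration, a minimal sketch of such a resumable copy with pymongo might look like the following. The connection strings, database and collection names, and batch size are placeholder assumptions (not from your question), and it relies on _id values sorting consistently (for example, ObjectIds):

from pymongo import ASCENDING, MongoClient

BATCH_SIZE = 1000  # arbitrary; tune for document size and available memory

# Placeholder connection details: the mongos router of the source cluster
# and the target standalone instance.
source = MongoClient("mongodb://mongos.example.net:27017")["mydb"]["mycoll"]
target = MongoClient("mongodb://standalone.example.net:27017")["mydb"]["mycoll"]

# Resume from the largest _id already present on the target (None if empty).
last_copied = target.find_one(sort=[("_id", -1)])
last_id = last_copied["_id"] if last_copied else None

while True:
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    batch = list(source.find(query).sort("_id", ASCENDING).limit(BATCH_SIZE))
    if not batch:
        break  # nothing left beyond last_id
    # Ordered inserts keep the copy sequential, so a failed run can simply be
    # re-executed and will continue after the last document that was inserted.
    target.insert_many(batch, ordered=True)
    last_id = batch[-1]["_id"]

Only one batch is held in memory at a time, and re-running the script after a failure continues from the last document that reached the target rather than starting over.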

Unless you have very custom requirements for data export or throttling, I would be inclined to use mongodump and mongorestore with appropriate options for compression and concurrency (for example parallel collections to dump and number of insertion workers per collection).
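
As a rough sketch of what that could look like when driven from Python (the hostnames, ports, dump directory, and worker counts below are placeholders, not values from your deployment):

import subprocess

DUMP_DIR = "/data/migration-dump"  # assumed scratch directory with enough disk space

# Dump through the mongos router of the sharded cluster, compressed on disk.
subprocess.run(
    [
        "mongodump",
        "--host", "mongos.example.net", "--port", "27017",
        "--gzip",
        "--numParallelCollections", "4",  # collections dumped concurrently
        "--out", DUMP_DIR,
    ],
    check=True,
)

# Restore into the single-node instance with several insertion workers per collection.
subprocess.run(
    [
        "mongorestore",
        "--host", "standalone.example.net", "--port", "27017",
        "--gzip",
        "--numInsertionWorkersPerCollection", "4",
        DUMP_DIR,
    ],
    check=True,
)

The same options apply if you run mongodump and mongorestore directly from a shell: --gzip compresses the dump files, while --numParallelCollections and --numInsertionWorkersPerCollection control the concurrency mentioned above.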
