Question

I currently have a MySQL database that I will be migrating to MongoDB. I have started writing a migration tool, and I am curious whether there is a more efficient way to reach my end goal.

The actual structure of the data is fairly complicated, but I will use a simplified version to ask my question. Say I have the following MySQL tables:

Surveys
   Survey_id
   Title
   CreateUser (fk)

Users
   User_id
   Fname
   Lname

My plan is to migrate these to MongoDB collections that basically match this structure; the foreign key becomes a reference to the Users collection. I am currently planning to take the following steps:

  1. SELECT Survey_id, Title FROM Surveys
  2. Insert this data into the Surveys collection in MongoDB
  3. SELECT User_id, Fname, Lname FROM Users
  4. Insert into the Users collection in MongoDB
  5. SELECT CreateUser, Survey_id FROM Surveys
  6. Find each corresponding document in the Users collection based on the CreateUser id and write a reference to that user document into the existing Surveys document (sketched in code after this list)
  7. Remove the Survey_id field from every Surveys document
  8. Remove the User_id field from every Users document
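To make step 6 concrete, here is roughly what I have in mind, using Python with PyMongo and mysql-connector-python for illustration (connection details are placeholders):

    import mysql.connector
    from pymongo import MongoClient

    mysql_conn = mysql.connector.connect(user="...", password="...", database="mydb")
    mongo = MongoClient()["mydb"]

    cur = mysql_conn.cursor()
    cur.execute("SELECT CreateUser, Survey_id FROM Surveys")
    for create_user, survey_id in cur:
        # Look up the already-migrated user by its old SQL id...
        user = mongo.users.find_one({"User_id": create_user})
        # ...and write a reference to it into the matching survey.
        mongo.surveys.update_one({"Survey_id": survey_id},
                                 {"$set": {"CreateUser": user["_id"]}})

This costs one round-trip to MongoDB per survey for the lookup and another for the update, which is part of why I suspect there is a more efficient way.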

Is this a reasonable approach to take, or am I making things more complicated than they need to be?


Solution

If the data isn't huge, it makes sense to change the order of operations and do the mapping in RAM, i.e.:

  1. SELECT * FROM Users
  2. Insert the users into MongoDB
  3. Add pairs (SQL id, MongoDB id) to a hash table
  4. SELECT * FROM Surveys
  5. For each survey, replace CreateUser with Hashtable[CreateUser]
  6. Insert the surveys into MongoDB

Typically, this will be quite a bit faster because you don't need to update documents in MongoDB after the fact, and you won't have to query your SQL data twice.
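A minimal sketch of that order of operations, assuming Python with PyMongo and mysql-connector-python (connection details are placeholders):

    import mysql.connector
    from pymongo import MongoClient

    mysql_conn = mysql.connector.connect(user="...", password="...", database="mydb")
    mongo = MongoClient()["mydb"]

    # Steps 1-3: migrate users and record the (SQL id, MongoDB id) pairs.
    cur = mysql_conn.cursor()
    cur.execute("SELECT User_id, Fname, Lname FROM Users")
    rows = cur.fetchall()
    user_docs = [{"Fname": fname, "Lname": lname} for _, fname, lname in rows]
    result = mongo.users.insert_many(user_docs)  # one batch insert
    user_id_map = {row[0]: oid for row, oid in zip(rows, result.inserted_ids)}

    # Steps 4-6: migrate surveys, rewriting the foreign key on the way in.
    # The SQL ids are simply not copied over, so there is nothing to delete later.
    cur.execute("SELECT Title, CreateUser FROM Surveys")
    survey_docs = [{"Title": title, "CreateUser": user_id_map[create_user]}
                   for title, create_user in cur]
    mongo.surveys.insert_many(survey_docs)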

You should use batch inserts for MongoDB instead of inserting documents one by one. And instead of reading the newly created documents' ids back from the database, you can assign the MongoDB primary key yourself; otherwise the driver will generate it anyway, client-side rather than in the database, so there's no real advantage in leaving it out.
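With PyMongo, for example, you can generate the ObjectIds up front, so the id map is complete before anything is sent over the network (a sketch reusing `rows` from above):

    from bson import ObjectId

    # Assign the primary key client-side instead of letting the driver do it.
    user_id_map = {user_id: ObjectId() for user_id, _, _ in rows}
    user_docs = [{"_id": user_id_map[user_id], "Fname": fname, "Lname": lname}
                 for user_id, fname, lname in rows]
    mongo.users.insert_many(user_docs)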

If the amount of data is huge (such that you can't keep the lookup tables in RAM), I'd still stick with the lookup approach, but process the data subset by subset. That will be tricky if you have many foreign keys, though.
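One way to sketch that, under the same assumptions as above: persist the mapping in a temporary MongoDB collection (here called `idmap`, a name invented for this example) while migrating users, then rewrite surveys chunk by chunk, loading only the mappings each chunk needs:

    CHUNK = 10_000

    cur = mysql_conn.cursor()
    cur.execute("SELECT Title, CreateUser FROM Surveys")
    while True:
        rows = cur.fetchmany(CHUNK)
        if not rows:
            break
        # idmap holds {_id: <SQL User_id>, mongo_id: <ObjectId>} documents,
        # written out batch-by-batch during the user migration.
        needed = list({create_user for _, create_user in rows})
        id_map = {doc["_id"]: doc["mongo_id"]
                  for doc in mongo.idmap.find({"_id": {"$in": needed}})}
        mongo.surveys.insert_many(
            [{"Title": title, "CreateUser": id_map[cu]} for title, cu in rows])

With more foreign keys this turns into one such lookup collection per referenced table, which is where it gets tricky.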
