Question

I currently have a MySQL database that I will be migrating to MongoDB. I have started writing a migration tool, and I am curious whether there is a more efficient way to reach my end goal.

The actual structure of the data is fairly complicated, but I will use a simplified version to ask my question. Say I have the following MySQL tables:

Surveys
   Survey_id
   Title
   CreateUser (fk)

Users
   User_id
   Fname
   Lname

My plan is to migrate these to MongoDB collections that basically match this structure; the foreign key becomes a reference to the Users collection. I am currently planning to take the following steps:

  1. SELECT Survey_id, Title FROM Surveys
  2. Insert this data into the Surveys collection in MongoDB
  3. SELECT User_id, Fname, Lname FROM Users
  4. Insert into the Users collection in MongoDB
  5. SELECT CreateUser, Survey_id FROM Surveys
  6. Find each corresponding document in the Users collection based on the CreateUser id and write a reference to that user document into the existing Surveys document (sketched in code after this list)
  7. Remove the Survey_id field from every Surveys document
  8. Remove the User_id field from every Users document
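To make step 6 concrete, here is roughly what I have in mind, using Python with PyMongo and mysql-connector-python for illustration (connection details are placeholders):

    import mysql.connector
    from pymongo import MongoClient

    mysql_conn = mysql.connector.connect(user="...", password="...", database="mydb")
    mongo = MongoClient()["mydb"]

    cur = mysql_conn.cursor()
    cur.execute("SELECT CreateUser, Survey_id FROM Surveys")
    for create_user, survey_id in cur:
        # Look up the already-migrated user by its old SQL id...
        user = mongo.users.find_one({"User_id": create_user})
        # ...and write a reference to it into the matching survey.
        mongo.surveys.update_one({"Survey_id": survey_id},
                                 {"$set": {"CreateUser": user["_id"]}})

This costs one round-trip to MongoDB per survey for the lookup and another for the update, which is part of why I suspect there is a more efficient way.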

Is this a reasonable approach to take, or am I making things more complicated than they need to be?


Solution

If the data isn't huge, it makes sense to change the order of operations and do the mapping in RAM, i.e.:

  1. SELECT * FROM Users
  2. Insert the users into MongoDB
  3. Add pairs (SQL id, MongoDB id) to a hash table
  4. SELECT * FROM Surveys
  5. For each survey, replace CreateUser with Hashtable[CreateUser]
  6. Insert the surveys into MongoDB

Typically, this will be quite a bit faster because you don't need to update documents in MongoDB after the fact, and you won't have to query your SQL data twice.
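A minimal sketch of that order of operations, assuming Python with PyMongo and mysql-connector-python (connection details are placeholders):

    import mysql.connector
    from pymongo import MongoClient

    mysql_conn = mysql.connector.connect(user="...", password="...", database="mydb")
    mongo = MongoClient()["mydb"]

    # Steps 1-3: migrate users and record the (SQL id, MongoDB id) pairs.
    cur = mysql_conn.cursor()
    cur.execute("SELECT User_id, Fname, Lname FROM Users")
    rows = cur.fetchall()
    user_docs = [{"Fname": fname, "Lname": lname} for _, fname, lname in rows]
    result = mongo.users.insert_many(user_docs)  # one batch insert
    user_id_map = {row[0]: oid for row, oid in zip(rows, result.inserted_ids)}

    # Steps 4-6: migrate surveys, rewriting the foreign key on the way in.
    # The SQL ids are simply not copied over, so there is nothing to delete later.
    cur.execute("SELECT Title, CreateUser FROM Surveys")
    survey_docs = [{"Title": title, "CreateUser": user_id_map[create_user]}
                   for title, create_user in cur]
    mongo.surveys.insert_many(survey_docs)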

You should use batch inserts for MongoDB instead of inserting documents one by one. And instead of reading the newly created documents' ids back from the database, you can assign the MongoDB primary key yourself; otherwise the driver will generate it anyway, client-side rather than in the database, so there's no real advantage in leaving it out.
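With PyMongo, for example, you can generate the ObjectIds up front, so the id map is complete before anything is sent over the network (a sketch reusing `rows` from above):

    from bson import ObjectId

    # Assign the primary key client-side instead of letting the driver do it.
    user_id_map = {user_id: ObjectId() for user_id, _, _ in rows}
    user_docs = [{"_id": user_id_map[user_id], "Fname": fname, "Lname": lname}
                 for user_id, fname, lname in rows]
    mongo.users.insert_many(user_docs)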

If the amount of data is huge (such that you can't keep the lookup tables in RAM), I'd still stick with the lookup approach, but process the data subset by subset. That will be tricky if you have many foreign keys, though.
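One way to sketch that, under the same assumptions as above: persist the mapping in a temporary MongoDB collection (here called `idmap`, a name invented for this example) while migrating users, then rewrite surveys chunk by chunk, loading only the mappings each chunk needs:

    CHUNK = 10_000

    cur = mysql_conn.cursor()
    cur.execute("SELECT Title, CreateUser FROM Surveys")
    while True:
        rows = cur.fetchmany(CHUNK)
        if not rows:
            break
        # idmap holds {_id: <SQL User_id>, mongo_id: <ObjectId>} documents,
        # written out batch-by-batch during the user migration.
        needed = list({create_user for _, create_user in rows})
        id_map = {doc["_id"]: doc["mongo_id"]
                  for doc in mongo.idmap.find({"_id": {"$in": needed}})}
        mongo.surveys.insert_many(
            [{"Title": title, "CreateUser": id_map[cu]} for title, cu in rows])

With more foreign keys this turns into one such lookup collection per referenced table, which is where it gets tricky.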
