Hybrid SQL and MongoDB solutions [closed]

Question 1

I can share some details about how I've done it at my current job.

Keep in mind that for us, the relational stores are considered legacy systems. They hold the authoritative entities, and we're moving that data to MongoDB for reads. Going forward we want writes to go to MongoDB as well instead of the relational systems.

Moving data from MySQL to MongoDB (specifically WordPress)

TL;DR - Use triggers and stored procedures

For this approach we set up triggers on tables in MySQL that watched for inserts, updates and deletes. When one of those operations happened we would fire a stored procedure that would generate an entity and place it into a table that acted as a queue.
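To make the trigger-to-queue idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere, and the table names (posts, sync_queue) and trigger names are made up for illustration. It also skips the stored procedure that builds the entity and just records which row changed and how, so treat it as the shape of the idea rather than what we actually deployed.

```python
import sqlite3

# SQLite stands in for MySQL here; trigger syntax differs slightly between the two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT);

-- Hypothetical queue table: one row per change that still has to be synced.
CREATE TABLE sync_queue (
    queue_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    post_id   INTEGER NOT NULL,
    operation TEXT NOT NULL
);

CREATE TRIGGER posts_after_insert AFTER INSERT ON posts
BEGIN
    INSERT INTO sync_queue (post_id, operation) VALUES (NEW.id, 'insert');
END;

CREATE TRIGGER posts_after_update AFTER UPDATE ON posts
BEGIN
    INSERT INTO sync_queue (post_id, operation) VALUES (NEW.id, 'update');
END;

CREATE TRIGGER posts_after_delete AFTER DELETE ON posts
BEGIN
    INSERT INTO sync_queue (post_id, operation) VALUES (OLD.id, 'delete');
END;
""")

# Ordinary writes to the source table; the triggers fill the queue as a side effect.
conn.execute("INSERT INTO posts (id, title, body) VALUES (1, 'Hello', 'First post')")
conn.execute("UPDATE posts SET title = 'Hello again' WHERE id = 1")
conn.execute("DELETE FROM posts WHERE id = 1")

queued = conn.execute(
    "SELECT post_id, operation FROM sync_queue ORDER BY queue_id"
).fetchall()
print(queued)  # [(1, 'insert'), (1, 'update'), (1, 'delete')]
```

The application code doing the writes never has to know the queue exists, which is what makes this workable against a legacy system you can't easily change.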

Then we used an external process to poll that table every few seconds. We wrote a stored procedure to ensure that the same queued item was never processed more than once.

The external process (a Mule ESB flow) would apply a small transformation to the rows returned by the stored procedure that reads the queue table, and then pass the result to MongoDB.
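The polling side can be sketched roughly like this. Again this is Python with sqlite3 standing in for MySQL, not the Mule flow we ran; the claimed_by column, the claim_batch and poll_once functions, and the dict used in place of a MongoDB collection are all assumptions made for the example. The point is the "claim first, then read" pattern that keeps an item from being handed out twice.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sync_queue (
    queue_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    payload    TEXT NOT NULL,   -- entity generated on the database side
    claimed_by TEXT             -- NULL until a poller claims the row
)""")
conn.executemany(
    "INSERT INTO sync_queue (payload) VALUES (?)",
    [(json.dumps({"post_id": 1, "title": "Hello"}),),
     (json.dumps({"post_id": 2, "title": "World"}),)],
)
conn.commit()

def claim_batch(conn, limit=10):
    """Mark unclaimed rows with a unique token, then return only those rows."""
    token = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE sync_queue SET claimed_by = ? WHERE queue_id IN ("
            " SELECT queue_id FROM sync_queue"
            " WHERE claimed_by IS NULL ORDER BY queue_id LIMIT ?)",
            (token, limit),
        )
    return conn.execute(
        "SELECT queue_id, payload FROM sync_queue"
        " WHERE claimed_by = ? ORDER BY queue_id",
        (token,),
    ).fetchall()

def to_document(payload):
    """The 'tiny bit of transformation': rename the key to MongoDB's _id."""
    row = json.loads(payload)
    return {"_id": row["post_id"], "title": row["title"]}

# A plain dict stands in for the MongoDB collection so the sketch has no dependencies.
mongo_stand_in = {}

def poll_once(conn):
    claimed = claim_batch(conn)
    for _queue_id, payload in claimed:
        doc = to_document(payload)
        mongo_stand_in[doc["_id"]] = doc  # upsert keyed on _id
    return len(claimed)

first = poll_once(conn)
second = poll_once(conn)
print(first, second)  # 2 0
```

The second poll comes back empty because every row was already claimed by the first one. In a real deployment you would run poll_once on a timer and write to an actual MongoDB collection instead of a dict.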

It takes about 2 seconds for a WordPress post to show up as a JSON document in MongoDB after it is published from the WordPress back end.

Moving data from SQL Server to MongoDB (Approach 1)

The first approach I've used is to write stored procedures that produce XML. The XML syntax is a little awkward in T-SQL, but it gets the job done. The nice thing is that you can make the XML structures arbitrarily deep.

Once you produce XML, it's a hop, skip and a jump to write a tool that transforms the XML into JSON. You can then have an external process (again we've used Mule) pick up the XML results, transform them into JSON, and then write them to MongoDB.
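For a feel of how small the XML-to-JSON step can be, here is a rough Python sketch using only the standard library. The sample XML and the element_to_value helper are invented for the example, and the conversion rules are one possible choice: attributes are ignored, every leaf comes out as a string, and repeated sibling tags are collected into a list.

```python
import json
import xml.etree.ElementTree as ET

def element_to_value(elem):
    """Convert an element to a dict (if it has children) or its text (if it is a leaf)."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Repeated tag: turn the existing entry into a list and append to it.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

# Hypothetical output of a stored procedure that emits nested XML.
xml_text = """
<order>
  <id>42</id>
  <customer><name>Ada</name></customer>
  <items>
    <item><sku>A1</sku></item>
    <item><sku>B2</sku></item>
  </items>
</order>
"""

root = ET.fromstring(xml_text)
document = {root.tag: element_to_value(root)}
print(json.dumps(document, indent=2))
```

One thing to watch for with this naive rule: a tag that appears only once becomes a single value rather than a one-element list, so a collection with a single item changes shape. If your consumers care, force known collection tags to always be lists.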

Moving data from SQL Server to MongoDB (Approach 2)

This approach involved writing a C# project in Visual Studio that was deployed as a CLR function/stored procedure in SQL Server.

This is by far my favorite approach because the code is easy to write and it runs very fast on the server. You can even serialize the results in the CLR function and then store them in a temporary table - or just return them from a stored procedure.

It might be tough to convince a DBA to deploy your code (including potentially a JSON serializer) onto a production SQL Server box, but the benefits are outstanding. It also performs much better than the XML approach.

Question 2

A PHP or Java application will be built around objects (entities). Depending on the role and usage model of an entity (how frequently it is persisted and retrieved), you may need to choose an appropriate data store for it. For example, you might persist user IDs and passwords in an RDBMS, your actor objects (documents, music, blog posts, etc.) in a document-oriented database like MongoDB, and frequently used small objects, such as a file system hierarchy, in Redis. It all depends on your architecture; there is no readily available framework for this.
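Since there is no framework for this, you usually end up writing a thin routing layer yourself. A minimal sketch in Python (the same idea carries over to PHP or Java) might look like the following. The entity classes, the InMemoryStore stand-ins and the persist function are all hypothetical; in a real application each store would wrap an actual RDBMS, MongoDB or Redis client.

```python
from dataclasses import dataclass, asdict

class InMemoryStore:
    """Stand-in for a real data store client; keeps records in a dict."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def save(self, key, record):
        self.data[key] = record

    def load(self, key):
        return self.data[key]

@dataclass
class Credential:          # small, relational, rarely read in bulk
    user_id: str
    password_hash: str

@dataclass
class BlogPost:            # nested, document-shaped content
    post_id: str
    title: str
    body: str

@dataclass
class DirectoryListing:    # small and frequently read
    path: str
    children: list

relational = InMemoryStore("rdbms")
documents = InMemoryStore("mongodb")
cache = InMemoryStore("redis")

# Each entity type maps to the store that suits its usage and to its key field.
ROUTES = {
    Credential: (relational, "user_id"),
    BlogPost: (documents, "post_id"),
    DirectoryListing: (cache, "path"),
}

def persist(entity):
    """Save the entity in the store registered for its type; return the store name."""
    store, key_field = ROUTES[type(entity)]
    record = asdict(entity)
    store.save(record[key_field], record)
    return store.name

print(persist(Credential("u1", "not-a-real-hash")))      # rdbms
print(persist(BlogPost("p1", "Hello", "First post")))    # mongodb
print(persist(DirectoryListing("/home", ["a", "b"])))    # redis
```

Keeping the mapping in one table means that moving an entity type to a different store later is a one-line change, at least on the application side.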