How to synchronize on-site in-memory/NoSQL datasources with a central database in real time

https://stackoverflow.com/questions/14611143

06-03-2022
Question

I have an interesting architecture problem.

My scenario is this: I need to centralize data that is currently stored in on-site SQL Server 2005 databases at 60 storefronts, soon to double to 120. There is a main SQL Server 2005 database at the central location. The reason for not relying solely on the central database is that if the WAN connection is severed (weather, a physically cut line, maintenance, etc.), the storefronts can continue to operate using their local SQL Server 2005 databases. I'm talking about mission-critical data.

Many logistical problems arise, though. The storefronts rely on .NET desktop applications built by an in-house team. That team uses SQL Server replication from the local databases to the centralized database. Installing new versions of this in-house software, and executing the related SQL Server scripts for each installation, at 60-something locations requires a lot of grunt work (lengthy installation checklists, logging into the on-site server via Remote Desktop, Dameware'ing into on-site workstations to verify that employees haven't left any of the desktop apps running, etc.). This grossly inefficient grunt work is mostly done on weekends by a team of 6-7 people who don't get paid overtime for it. I come from a different in-house group that works with Java EE 6, Java SE 7, and JavaFX, though I know their pain since I used to be in that group. I think there is a much simpler solution. There is talk of switching our whole .NET application architecture over to a Java EE 6 architecture with client applications.

My idea is this: implement an embedded/No-SQL local database that stays synchronized in real time with a centralized RDBMS (our company uses either Oracle 10g/11g or SQL Server 2005+). If our WAN goes down, the storefronts can continue to operate seamlessly using the local embedded/No-SQL database. Once connectivity is reestablished, the embedded/No-SQL database persists its state to the centralized database and resumes real-time synchronization. I want the connection transitions to be invisible to the users. I'm a big fan of technology like JPA 2, which simply reconnects after a connection has been severed.

Since we're considering switching to a Java EE 6-based solution, I want to consider all tech that can work with WebSphere v8.0.x and is open source. I don't want to deal with commercial licenses. That means considering all options like No-SQL databases, in-memory databases, Lucene, Apache Jackrabbit, CORBA/IIOP, JMS, EJB 3.1, CDI 1.0, JSF MyFaces 2.0.4, JPA 2, JAX-RS, and desktop clients possibly powered by JavaFX. The only remaining question is: what can persist data to an embedded/No-SQL database and then synchronize that data in real time to a centralized data source, so that the storefronts can remain operational?

Solution

I love .NET, so I would try hard to make the SQL Server replication solution work, but failing that ...

Here are two NoSQL solutions I would consider for replication:

CouchDB

http://couchdb.apache.org/

CouchDB has pretty good replication, and it will resume where it left off once connectivity is restored after an outage. There is a learning curve if you're coming from an RDBMS background, but it can work.
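As a rough sketch of what starting such a replication could look like from Java: CouchDB exposes replication over HTTP through `POST /_replicate`, with a JSON body naming a `source` and a `target`, and `"continuous": true` to keep it running. The host names, database names, and the class and method names below are made up for illustration, and authentication is left out even though a real installation would almost certainly need it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Builds (and optionally sends) a CouchDB replication request. Illustrative only. */
class CouchReplicationSketch {

    /** JSON body for POST /_replicate; "continuous" keeps the replication running. */
    static String replicationBody(String source, String target) {
        return "{\"source\":\"" + source + "\","
             + "\"target\":\"" + target + "\","
             + "\"continuous\":true}";
    }

    /** POSTs the body to a CouchDB node and returns the HTTP status code. */
    static int startReplication(String couchBaseUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
            URI.create(couchBaseUrl + "/_replicate").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical databases: the storefront's local copy pushed to a central host.
        String body = replicationBody(
            "http://localhost:5984/store_sales",
            "http://central.example.com:5984/sales");
        System.out.println(body);
        if (args.length > 0) {   // only touch the network when a CouchDB URL is passed in
            System.out.println(startReplication(args[0], body));
        }
    }
}
```

Run without arguments, this only prints the request body; passing something like `http://localhost:5984` as the first argument would send it to that node.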

MongoDB

http://www.mongodb.org/

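You can use a replica set with Mongo in which secondary nodes sync from a central (primary) node. When a secondary goes offline, you can bring it back and it picks up where it left off. Keep in mind that only the primary accepts writes, so a storefront running as a secondary could serve reads but not record new data while it is cut off from the primary.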

The problem is that if the primary node gets a lot of writes, it might not be able to sync up again with a secondary node that has been down for a long period of time. The primary records recent writes in a fixed-size log (the oplog), and older entries are eventually overwritten. A secondary that has fallen behind the oldest remaining entry can no longer catch up incrementally and needs a full resync.
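To get a feel for how long a secondary can stay down, you can do some back-of-the-envelope arithmetic: the log of recent writes (the oplog) has a fixed size, so the time it covers is roughly its size divided by the rate at which writes fill it. The sketch below assumes a constant write rate, and the numbers in it are invented; the class and method names are mine, not anything from MongoDB.

```java
/** Rough estimate of how long a MongoDB secondary can stay offline and still catch up. */
class OplogWindowSketch {

    /** Hours of writes the oplog can hold before its oldest entries are overwritten. */
    static double oplogWindowHours(double oplogSizeMb, double writeRateMbPerHour) {
        return oplogSizeMb / writeRateMbPerHour;
    }

    /** True if a secondary that was offline this long can still resume from the oplog. */
    static boolean canResume(double outageHours, double oplogSizeMb, double writeRateMbPerHour) {
        return outageHours < oplogWindowHours(oplogSizeMb, writeRateMbPerHour);
    }

    public static void main(String[] args) {
        // Made-up numbers: a 1024 MB oplog filling at 64 MB of entries per hour.
        System.out.println(oplogWindowHours(1024, 64));   // 16.0
        System.out.println(canResume(8, 1024, 64));       // true
        System.out.println(canResume(24, 1024, 64));      // false
    }
}
```

With these numbers, an 8-hour outage is recoverable, but a store that is cut off for a full day would need a complete resync.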

General Advice

Both Mongo and Couch are document databases. If you want to move to NoSQL, you'll need to switch your code from the normalized structures you're probably used to in the relational world to denormalized ones.

It's a big shift, and I know I've had trouble nailing down models I feel good about in the document database world.
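To make the normalized-versus-denormalized point concrete, here is a minimal sketch that folds relational-style order lines into a single self-contained document, the shape you would store in Couch or Mongo. The order/line-item schema, the field names, and the class names are all invented for illustration; your real model will look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Turns normalized order rows into one self-contained document. Illustrative only. */
class DenormalizeSketch {

    /** One line item as it might sit in a relational ORDER_LINE table (hypothetical). */
    static class OrderLine {
        final int orderId;
        final String sku;
        final int quantity;

        OrderLine(int orderId, String sku, int quantity) {
            this.orderId = orderId;
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    /** Embeds the matching lines inside the order instead of joining at read time. */
    static Map<String, Object> toDocument(int orderId, String storeName, List<OrderLine> lines) {
        List<Map<String, Object>> items = new ArrayList<>();
        for (OrderLine line : lines) {
            if (line.orderId != orderId) {
                continue;   // skip lines that belong to other orders
            }
            Map<String, Object> item = new LinkedHashMap<>();
            item.put("sku", line.sku);
            item.put("quantity", line.quantity);
            items.add(item);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "order-" + orderId);
        doc.put("store", storeName);   // store name copied in, not referenced by key
        doc.put("items", items);
        return doc;
    }

    public static void main(String[] args) {
        List<OrderLine> rows = new ArrayList<>();
        rows.add(new OrderLine(1, "A-100", 2));
        rows.add(new OrderLine(1, "B-200", 1));
        rows.add(new OrderLine(2, "A-100", 5));
        // prints {_id=order-1, store=Store 17, items=[{sku=A-100, quantity=2}, {sku=B-200, quantity=1}]}
        System.out.println(toDocument(1, "Store 17", rows));
    }
}
```

The trade-off is the usual one: reading an order needs no join, but anything copied into the document (the store name here) has to be updated in every document that carries it.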

As far as moving data around in general, I've found Mule (http://www.mulesoft.org/) to be really good at connecting very different types of systems/databases.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow