Best approach to Architect the integration of two separate databases?

https://stackoverflow.com/questions/5103014

12-12-2020

Question

I've run into the following questions at work and don't have the experience or knowledge to answer them. I'm hoping that some of you wiser folk may be able to point me in the right direction; any answers will be greatly appreciated!

Scenario

We have two parts of the business using separate databases: Human Resources and Operational Areas (Homecare).
Human Resources keep track of the company’s employees, shift patterns, absence, pay etc. Homecare keeps track of client information, home visits, visit dates and the employee/s responsible for providing that visit.

These two systems are separate and we’re currently in the process of looking at ways to integrate them.

Additionally, we’re looking at how to organise the code that accesses these two databases into reusable, well-structured libraries.

We have three applications re-using a HumanResources.dll, responsible for communicating with an EF 4 object context, contained in the library. The object context is almost a mirror image of the database as it stands.

Questions

We’re about to add a fourth application that will use data in the HR database.

Do we:

Create a new EF data model, responsible for providing information that only the application needs, while duplicating some common entities such as the Employee.

Add the new entities/tables to the already large model and accept it’s going to get large.

Longer term, we need to join the Shift Pattern Information in the HR database to the Client Visits on the Operational Areas (Homecare) database in a 5th application.

We’ve got an idea on what we could do; we’ve come up with the following:

Create a layer that sits between the HumanResources object context and Homecare object context, responsible for joining the two sets of data together.

Are there any other approaches that would benefit us?

Solution

Implement the Facade Pattern

A facade is basically an adapter for a complex subsystem. Since you have two subsystems, I would recommend creating three classes with the following functionalities:

HumanResourcesFacade : A class that wraps all the "Human Resources" functionality. The job of this class is to expose methods that perform each Unit of Work that the Human Resources application is responsible for without exposing any information about the Human Resources application to the client.
HomecareFacade : A class that wraps all the "Homecare" functionality. The job of this class is to expose methods that perform each Unit of Work that the Homecare application is responsible for, without exposing any information about the Homecare database to the client.
ApplicationFacade : A class that wraps both HumanResourcesFacade and HomecareFacade and provides public methods to your clients that do not require knowledge of the inner workings of either of the two nested facades. The job of this class is to: (a) know which of the two nested facades is responsible for each client call, (b) execute the client's call by calling the appropriate method on that nested facade, and (c) translate the data received from the nested facade into a format that is usable by the client and not dependent on the data formats of either nested facade.
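As a rough illustration of how the three classes could fit together, here is a minimal sketch. It is written in Python purely for brevity (your real code would sit on top of the EF object contexts), and every name other than the three facade classes is an assumption made up for the example: the ShiftDto and VisitDto shapes, the method names, and the tuple rows that stand in for whatever each database returns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ShiftDto:
    """Client-facing shape (hypothetical); independent of the HR schema."""
    employee_id: int
    day: date
    pattern: str


@dataclass(frozen=True)
class VisitDto:
    """Client-facing shape (hypothetical); independent of the Homecare schema."""
    employee_id: int
    day: date
    client_name: str


class HumanResourcesFacade:
    """Wraps the HR subsystem; a list of tuples stands in for the HR context."""

    def __init__(self, shift_rows):
        self._shift_rows = shift_rows  # assumed rows: (employee_id, day, pattern)

    def shifts_for(self, employee_id):
        return [row for row in self._shift_rows if row[0] == employee_id]


class HomecareFacade:
    """Wraps the Homecare subsystem; same stand-in approach as above."""

    def __init__(self, visit_rows):
        self._visit_rows = visit_rows  # assumed rows: (employee_id, day, client)

    def visits_for(self, employee_id):
        return [row for row in self._visit_rows if row[0] == employee_id]


class ApplicationFacade:
    """The only class clients talk to: routes calls and translates results."""

    def __init__(self, hr, homecare):
        self._hr = hr
        self._homecare = homecare

    def shifts_for(self, employee_id):
        # (a) routed to HR, (b) executed there, (c) translated to a DTO
        return [ShiftDto(*row) for row in self._hr.shifts_for(employee_id)]

    def visits_for(self, employee_id):
        return [VisitDto(*row) for row in self._homecare.visits_for(employee_id)]

    def visits_outside_shift(self, employee_id):
        # Combines both subsystems without the client knowing there are two.
        shift_days = {shift.day for shift in self.shifts_for(employee_id)}
        return [v for v in self.visits_for(employee_id) if v.day not in shift_days]
```

A client would build one ApplicationFacade and call, say, visits_outside_shift(employee_id); it never sees a row from either database, only the DTOs.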

I would recommend using a POCO object model to create a common in-code representation of the data that is not dependent upon the actual persistence implementation. The domain model technique that Adrian K suggested is a good approach, but if you are not familiar with its patterns and methodology it could end up being very confusing and take much longer than more intuitive techniques. An alternative is to just use data objects and a Data Mapper. The data mapper basically takes the data from a data source and turns it into an object that is not dependent on the data source or the mapper object. I included a link below.
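To make the data object plus Data Mapper idea concrete, here is a small sketch, again in Python only for illustration. The Employee fields and the column names EmployeeID, Forename and Surname are assumptions, not your actual HR schema; a dict stands in for one row read from the data source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Plain data object: holds no reference to the data source or the mapper."""
    employee_id: int
    full_name: str


class EmployeeMapper:
    """Turns one HR row (a dict here) into an Employee.

    The column names are hypothetical; only this class would need to change
    if the underlying table were renamed or moved to another database.
    """

    def to_object(self, row):
        return Employee(
            employee_id=int(row["EmployeeID"]),
            full_name=f'{row["Forename"]} {row["Surname"]}'.strip(),
        )

    def to_objects(self, rows):
        return [self.to_object(row) for row in rows]
```

The rest of the application works with Employee objects only, so nothing outside the mapper knows which database or column layout they came from.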

One thing I would like to clarify is that while I said the ApplicationFacade has three jobs, I am not advising that you violate the Single Responsibility Principle. I do not mean that the class needs to do all three of those things by itself, but that it should encapsulate whatever mechanism you decide to use for carrying them out, and that no other part of the application should reach those concerns except through the ApplicationFacade. For example, your business objects should not know from which data source they were constructed - that information should not be accessible from anywhere other than what is encapsulated by the ApplicationFacade class.

Reference Articles

OTHER TIPS

Sounds like you need to do some serious data-modeling.

You definitely need it for the long term so that you don't get yourself into serious strife. (If there's one thing that will have a significant impact on your ability to support / extend systems and support business growth, it's data management.) The good thing about (business) data is that your business stakeholders will (or should) have a good understanding of it and be suitably motivated to support you. The value such an exercise will bring should be an easy sell. Having some of this in place in the short term will help as well.

Data sources which come with packaged products (Commercial Off The Shelf - COTS) will not be open to change without putting those systems at risk - but that doesn't mean you can't use ETL and other databases to create data marts that bring disparate data together. In this sort of approach the data modeling and the data mapping between systems will be important - but so will the timing of the loads.
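As a very small sketch of the ETL-into-a-data-mart idea, the following uses Python's built-in sqlite3 module with in-memory databases standing in for the two source systems. The table and column names (shift_pattern, client_visit and so on) are invented for the example, and it ignores the timing question entirely: a real job would need to decide when and how often to reload.

```python
import sqlite3


def build_mart(hr, homecare):
    """Copy selected columns from both sources into a separate in-memory mart.

    Both sources are only read from; neither is modified.
    """
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE shift (employee_id INTEGER, day TEXT, pattern TEXT)")
    mart.execute("CREATE TABLE visit (employee_id INTEGER, day TEXT, client TEXT)")
    # Extract from HR (hypothetical table shift_pattern) and load into the mart.
    mart.executemany(
        "INSERT INTO shift VALUES (?, ?, ?)",
        hr.execute("SELECT employee_id, day, pattern FROM shift_pattern"),
    )
    # Extract from Homecare (hypothetical table client_visit); the column
    # visit_date is mapped onto the mart's common name, day.
    mart.executemany(
        "INSERT INTO visit VALUES (?, ?, ?)",
        homecare.execute("SELECT employee_id, visit_date, client FROM client_visit"),
    )
    mart.commit()
    return mart


def visits_with_shift(mart):
    """Join the two data sets inside the mart, where a join is possible."""
    return mart.execute(
        "SELECT v.client, v.day, s.pattern FROM visit v "
        "JOIN shift s ON s.employee_id = v.employee_id AND s.day = v.day "
        "ORDER BY v.day, v.client"
    ).fetchall()
```

The mapping step (visit_date becoming day) is where the data model you agree on with the business gets applied.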

You will have more flexibility with in-house apps - but you might want to resist tactical changes unless you have a very compelling reason, otherwise you'll probably have to re-work them anyway.

As part of this exercise you'll want to consider the System of Record of each piece of data - where does it come from? Who owns it? You can start at a high level by drawing up a conceptual data model; this will probably deal more with logical datasets than specific "columns".

Use this information to guide further decisions.

In terms of your immediate approach (and your question): in general terms I'd think about putting a layer of abstraction between your systems and the data, so that the applications are cushioned from change when it happens.

Create a new EF data model, responsible for providing information that only the application needs, while duplicating some common entities such as the Employee.

The big issue with duplication is getting the data into a muddy state, where it is no longer clear which copy is the "real" record. This can easily kill you. What are the benefits of this approach in your context? Would you be doing this from a supportability perspective? Ease of development?

It depends very much on what you mean by integration.

If you just want to combine the various tables for reporting purposes then you should look at some process to Extract and Load selected data from each system into a data warehouse. You will need to define a common data model for both systems. This data can then be used for reporting.
If you want one system to invoke the services or retrieve data from another system then I would recommend you use the classic SOA pattern. Expose the functions you want to make available to the other system as services via SOAP, REST messages or similar. And get the client systems to use these methods and only these methods to send or retrieve data.

Avoid, if at all possible, looking directly into the foreign system's database, replicating data from one system to the other, or making direct API calls to the source system. The guiding principle should be "if I replace system X with system SuperX, how easy would it be to keep the other systems working?"
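A small sketch of that principle, in Python for illustration only. ShiftService, its single method, and the two stand-in systems are all hypothetical; in practice the contract would be a SOAP or REST service rather than an in-process class, but the point is the same: the client depends on the published contract and nothing else.

```python
from typing import Protocol


class ShiftService(Protocol):
    """Hypothetical contract the HR system publishes to other systems."""

    def shift_pattern(self, employee_id: int, day: str) -> str: ...


class SystemX:
    """Current HR system; a dict keyed by (employee, day) stands in for it."""

    def __init__(self, shifts):
        self._shifts = shifts

    def shift_pattern(self, employee_id, day):
        return self._shifts.get((employee_id, day), "off")


class SystemSuperX:
    """Replacement system: different internal layout, same contract."""

    def __init__(self, by_employee):
        self._by_employee = by_employee

    def shift_pattern(self, employee_id, day):
        return self._by_employee.get(employee_id, {}).get(day, "off")


def can_visit(service: ShiftService, employee_id: int, day: str) -> bool:
    """Client code in the other system: uses only the contract, never HR tables."""
    return service.shift_pattern(employee_id, day) != "off"
```

Because can_visit never touches either system's storage, swapping SystemX for SystemSuperX requires no change on the client side.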

Since you are looking for a long-term solution, and it concerns the business's infrastructure, I recommend you migrate to LDAP. Have a read.

Licensed under: CC-BY-SA with attribution
