Database agnostic DAO (NoSQL + SQL)

https://softwareengineering.stackexchange.com/questions/389231

22-02-2021

Question

Background

While writing a new component, I am in the middle of choosing between a SQL and a NoSQL database (MySQL vs. MongoDB) for my storage layer. As of today, MySQL seems to be a perfect fit for my use case (6-7 domain entities, closely related to each other). Still, I want to keep my integration with the data layer abstract enough to switch to a NoSQL store (MongoDB) in the future.

While trying to build this abstract data access layer, I feel I am giving up much of what an RDBMS offers. Since NoSQL stores don't support joins as a first-class construct, I cannot afford to expose joins and other prominent RDBMS features as part of this abstraction.

Question:

Is it overkill to try to build such a level of abstraction in the first place? Is it even possible to build it without compromising on what the RDBMS offers? If so, what are the recommended patterns?

Solution

The best way to guarantee that you stay reasonably decoupled from the database, but at the same time remain free to use any feature of it, is to not create an abstraction layer for the database. (Unless, that is, you have an explicit requirement right now to support multiple databases. Otherwise, YAGNI.)

The worst thing one can do is to try to stay "database agnostic". This will almost automatically result in "common denominator" interfaces, usually trivial CRUD operations. Then you either can't use any specific feature of your storage backend (which is a waste considering what awesome features databases have nowadays, not to mention completely different paradigms), or you have to constantly introduce new methods for specific features or queries. Even worse, because you don't want this abstraction to "explode", you will be more or less forced to reuse existing methods for new requirements, where they will be ill-fitting and painful.

The alternative is to model your domain, and provide database specific implementations where it makes sense. One example I came across: We had the requirement to freeze all credit cards of a customer (bank domain). This was initially implemented with an ORM, which had multiple connected entities (data objects with the usual 1-1/1-n relations). We had to issue a query for accounts, then cards, set flags on cards and let the ORM deal with persisting.

Instead of all that, I introduced a method Customer.freezeCreditCards(), which issued an UPDATE statement directly against the database. While that's not a particularly exciting operation, it shows that if the business method lives where it makes sense (where the data for it is), it is trivial to use any optimization or extra feature you require. And you don't have to abstract or generalize features.
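A minimal sketch of what such a method might look like with plain JDBC. The answer does not show its schema or how the connection was obtained, so the table and column names (credit_card, customer_id, frozen) and the constructor-injected Connection are assumptions made for illustration only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical domain class; the schema below is assumed, not taken from the answer.
class Customer {
    // Assumed table and columns: credit_card(customer_id, frozen).
    static final String FREEZE_CARDS_SQL =
        "UPDATE credit_card SET frozen = TRUE WHERE customer_id = ?";

    private final String id;
    private final Connection connection;

    Customer(String id, Connection connection) {
        this.id = id;
        this.connection = connection;
    }

    // One statement replaces loading accounts and cards through the ORM,
    // flagging each card, and persisting them again.
    // Returns the number of cards that were updated.
    int freezeCreditCards() throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FREEZE_CARDS_SQL)) {
            statement.setString(1, id);
            return statement.executeUpdate();
        }
    }
}
```

Callers only see customer.freezeCreditCards(); whether it runs one UPDATE or something entirely different on another storage backend stays an implementation detail of that method.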

OTHER TIPS

Why would you wish to create such a degree of abstraction?

The very reason that different storage technologies exist is that each has its own mix of advantages and disadvantages that cannot be generalized away, not even by the best minds of the industry and the economic resources of massive corporations.

If any storage technology would seem to be sufficient for a particular application, then just pick one and run with it.

If you design and write code for many storage technologies at once, accounting for the limitations of all and the special benefits of none, then the conclusion of this process may be an application that is slow and has little effective functionality.

Or at the very least, it will be an application in which the design stage is treacle-slow, the implementation longer and more convoluted than necessary, verification is less easily done, and further maintenance and modification becomes a fearsome challenge.

It may be completely sufficient to design and write code for one well-integrated storage method. If the storage method later needs to be changed, a different application can be written (to either supplement or replace the original).

The best way to ensure code is flexible is to flex it.

The mistake many people make when coding with a DB in mind is that they delve into the DB's manual, learn every crazy little feature it offers, and use them all. Soon you are locked into that DB.

This is exactly the same problem we have with CPUs. Every CPU has a set of op codes. Some are basic and popular. Some are spiffy and different. The spiffy and different ones are the ones that get you into trouble because now your code only works with exactly that CPU.

This is one of the reasons Java has the JVM. It only offers the basic and popular op codes and figures out how to do them on many different CPUs for you. More importantly it keeps you from using the spiffy and different ones casually.

You can do the same thing with databases if you want to be DB agnostic. Let your application's needs drive the design of the DB abstraction, not the database's capabilities. And since the best way to ensure code is flexible is to flex it, I recommend you test your abstraction by trying to make it work with at least two databases before you pronounce it agnostic.

You can provide an individual adapter for each database, but intimate knowledge of which DB you're using should stop there. Every adapter must work under the DB abstraction that hides which DB you have.
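As a rough sketch of "flexing" the abstraction: one interface shaped by what the application needs, two adapters, and one contract check that is run against both. All names here (CustomerNameStore, MapCustomerNameStore, RowCustomerNameStore, CustomerNameStoreContract) are hypothetical, and both adapters are in-memory stand-ins rather than real database clients.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction: only what the application needs, nothing DB-specific.
interface CustomerNameStore {
    void save(String id, String name);
    Optional<String> find(String id);
}

// Stand-in for a key-value style adapter.
class MapCustomerNameStore implements CustomerNameStore {
    private final Map<String, String> names = new HashMap<>();

    public void save(String id, String name) {
        names.put(id, name);
    }

    public Optional<String> find(String id) {
        return Optional.ofNullable(names.get(id));
    }
}

// Stand-in for a row-oriented adapter: rows are scanned, an existing row is replaced.
class RowCustomerNameStore implements CustomerNameStore {
    private final List<String[]> rows = new ArrayList<>();

    public void save(String id, String name) {
        rows.removeIf(row -> row[0].equals(id));
        rows.add(new String[] {id, name});
    }

    public Optional<String> find(String id) {
        return rows.stream()
            .filter(row -> row[0].equals(id))
            .map(row -> row[1])
            .findFirst();
    }
}

// The same contract check runs against every adapter.
class CustomerNameStoreContract {
    static boolean holdsFor(CustomerNameStore store) {
        store.save("1", "first");
        store.save("1", "second"); // saving again must replace, not duplicate
        return store.find("1").equals(Optional.of("second"))
            && !store.find("missing").isPresent();
    }
}
```

In a real project the two adapters would talk to the actual databases, and the contract check would live in the test suite; an adapter that cannot pass it is a sign the abstraction leaks.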

It might sound like a lot of work, but done this way, you don't have to take on every possible thing that the DB might do. You just have to take on what your app actually needs. You might even find you can get by without the DB.

First of all, consider what your application needs to do. If your requirements include joining data from multiple sources, and this is not an optional feature, then regardless of the technology you select, you will need to implement such logic one way or another. In particular, this may mean that certain databases will not be well suited to your task, and there's not much you can do about it. If there were a single database that was best for all tasks, we would probably not have dozens of different databases on the market.

Once you know what kind of logic you need, create a data access API within your application. It should work on domain objects, which should be separate from the objects you will be storing in your database, even if they may look very similar. So your API might be, for example, a Java interface:

interface CustomerRepository {
    Customer findCustomer(String id);
}

Now, the actual database code should be in a class implementing this interface, for example:

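class MySqlCustomerRepository implements CustomerRepository {
    @Override
    public Customer findCustomer(String id) {
        // Query MySQL here, using whatever DAO objects suit it,
        // then map the result to the domain object.
        throw new UnsupportedOperationException("not implemented yet");
    }
}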

If at a later time you decide to use another database, you will be able to change the implementation to e.g.:

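class MongoDbCustomerRepository implements CustomerRepository {
    @Override
    public Customer findCustomer(String id) {
        // Query MongoDB here, using whatever DAO objects suit it,
        // then map the result to the domain object.
        throw new UnsupportedOperationException("not implemented yet");
    }
}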

The code inside will probably be completely different, as will the DAO objects used with each database. For example, with MySQL you might store data such as a user's name and address in separate, normalized tables and use joins to combine them, while with MongoDB you might denormalize and store the address together with the name and other fields in one document. Correspondingly, the DAOs used within the repositories will be different. However, the domain object Customer you return in the end will be the same, which means that the application code using your CustomerRepository need not be aware of the implementation you use.
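To make the difference between storage objects and the domain object concrete, here is a small sketch. The field names and the classes CustomerRow, AddressRow, CustomerDocument and CustomerMapper are assumptions for illustration; no database driver is involved, only the mapping step each repository would perform after reading its data.

```java
// Domain object used by the application; the fields are assumed for illustration.
final class Customer {
    final String id;
    final String name;
    final String city;

    Customer(String id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

// Normalized shape: one object per table row, combined via a join in MySQL.
final class CustomerRow {
    final String id;
    final String name;

    CustomerRow(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class AddressRow {
    final String customerId;
    final String city;

    AddressRow(String customerId, String city) {
        this.customerId = customerId;
        this.city = city;
    }
}

// Denormalized shape: name and address live together in one MongoDB document.
final class CustomerDocument {
    final String id;
    final String name;
    final String city;

    CustomerDocument(String id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

// Each repository would use only the mapping that matches its storage shape.
final class CustomerMapper {
    private CustomerMapper() {}

    static Customer fromRows(CustomerRow customer, AddressRow address) {
        return new Customer(customer.id, customer.name, address.city);
    }

    static Customer fromDocument(CustomerDocument document) {
        return new Customer(document.id, document.name, document.city);
    }
}
```

Either path ends in the same Customer, so code that depends on CustomerRepository never sees CustomerRow, AddressRow or CustomerDocument.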

This is the power of abstraction. It will make testing your application easier as well since for some tests, you will be able to mock the repository instead of using a real one.
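For instance, a test could swap in a hand-written in-memory repository instead of a real database. The sketch below assumes a Customer with just an id and a name, that findCustomer returns null for an unknown id (the original interface does not say), and a hypothetical GreetingService as the application code under test.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal domain object; the fields are assumed for illustration.
final class Customer {
    final String id;
    final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

interface CustomerRepository {
    Customer findCustomer(String id);
}

// Test double: behaves like a repository but needs no database.
class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Customer> customers = new HashMap<>();

    void add(Customer customer) {
        customers.put(customer.id, customer);
    }

    @Override
    public Customer findCustomer(String id) {
        // Assumption: an unknown id yields null.
        return customers.get(id);
    }
}

// Hypothetical application code that only knows the interface.
class GreetingService {
    private final CustomerRepository repository;

    GreetingService(CustomerRepository repository) {
        this.repository = repository;
    }

    String greet(String customerId) {
        Customer customer = repository.findCustomer(customerId);
        return customer == null ? "Hello, guest" : "Hello, " + customer.name;
    }
}
```

A test then builds an InMemoryCustomerRepository, adds a customer or two, and passes it to GreetingService; a mocking library would serve the same purpose.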

Note, however, that changing a database implementation may still not be easy. It is possible, and I've done it myself, but it's a serious decision, and depending on your access patterns, the kinds of queries you use, performance requirements, etc., your choice may be quite limited (e.g. only between different SQL databases or between different key-value stores). I recommend the repository abstraction nonetheless, since it clearly separates concerns (application business logic and data storage) and makes testing easier, both of which may be very helpful even if you end up never changing the DB implementation underneath.

Licensed under: CC-BY-SA with attribution
