Is there any recognized pattern supporting or discouraging the access to the same database from multiple applications?

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/419543

Question

I don't have a formal education regarding application architecture, what I know I mostly "absorbed" on the job from enterprise architectures of the companies I worked for and/or from senior colleagues.

One principle I absorbed from a huge company is that each server-side enterprise application should only have access to its own database, and if other applications need to read/write data on that database they should do so through a layer of services exposed by the application "owning" the database.

I like this principle because I find it neat and clean, but I suppose that more consistent arguments exist in favor and against it.

I tried googling this concept but I mostly find Q&A/guides about how to connect to multiple databases from technology X. One tip I came across is that connecting to multiple databases from a single application will probably be faster, since we're skipping a step. This is relevant and interesting but hardly complete.

Hence my question: is there any recognized pattern supporting or discouraging access to multiple databases from a single application?

Edit: comments and answers made me realize that the actual question I had in mind was not so much about accessing multiple databases from a single application, but rather about sharing the same database between multiple applications (which in turn might connect to multiple databases, but again, that's not quite the point). I modified the title of this question accordingly, from "access to multiple databases from a single application" to "access to the same database from multiple applications". The answers below are still relevant to the topic.

Was it helpful?

Solution

Connecting multiple applications to the same database is often a bad idea. It's an integration anti-pattern you might call the Shared Database, and unfortunately a very easy one to fall into.

It's common for a database to be created as part of building an application. That application needs to store its data somewhere, right? You need a database so you create one. But then, after some time, other applications get created that might need access to some data that already exists (the usual suspects are users, customers, products, orders, etc). How do they get the data? Well, nothing simpler: just connect to the same database, right?

And now you have some problems:

  • a database is an implementation detail. You have now exposed it to other applications;
  • an implementation detail can be changed precisely because it's just that: an implementation detail. Want to change the implementation of your database, change table structures, schemas, etc.? You can't do it anymore without affecting other applications (anyone with the database connection credentials can basically reach into your database and fetch whatever data they want; you might have applications connecting to your DB without even knowing they exist);
  • if your database structure isn't exactly how others need it, you might need to change your DB for someone else's use case. Or they will do so themselves :) and break your application with the help of your own database;
  • hello tight coupling, bye bye loose coupling. That's bad however you put it;
  • you only store data, you don't store behavior. The behavior is in your application, so how can others reuse it? Maybe by creating a crapload of stored procedures to share the behavior. And now you get more and more entrenched in that database vendor's implementation. Want to switch from Oracle DB to MySQL? You now need approval from others;
  • etc.
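The "implementation detail" problem above can be made concrete with a small sketch. All names here (the `orders` table, the `cust_id` column) are hypothetical, and an in-memory SQLite database stands in for a real shared server, but the failure mode is the same: the owning application refactors its schema, and a consumer that queried the tables directly breaks at runtime.

```python
import sqlite3

# Hypothetical shared database owned by an "orders" application.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER)")
db.execute("INSERT INTO orders (cust_id) VALUES (42)")

# Another application reaches directly into the shared database:
rows = db.execute("SELECT cust_id FROM orders").fetchall()

# The owning team later renames the column as an internal refactor
# (requires SQLite >= 3.25 for RENAME COLUMN):
db.execute("ALTER TABLE orders RENAME COLUMN cust_id TO customer_id")

# The other application's query now fails, because the schema was
# never a published contract:
try:
    db.execute("SELECT cust_id FROM orders")
except sqlite3.OperationalError as e:
    print("broken consumer:", e)
```

Nothing warned the consumer: the schema change was valid from the owner's point of view, and the breakage only surfaces when the other application next runs its query.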

A shared database is like having neighbors paying you a visit and then refusing to leave. You now have to live with them.

So exposing "your" database to "others" through an API, a gateway, a facade, a service, or whatever, is a way to share the data and behavior without having to live with your neighbors. It's a good practice that hides the implementation details. Of course it's not without its disadvantages: it's more complex than SELECT * FROM, more verbose, and it is still an integration point (which needs to evolve, maybe not at the same pace for everyone using it, so you now need API versioning), etc.

When it comes to one application connecting to multiple DBs (as your title asks), things are simpler and less problematic with respect to what I said above. But this is also a trade-off. Transactions might suck across multiple databases, and so will performing joins. Keeping things separated makes sense but will cost some performance, as you now need to open connections to more databases. You might also cause problems somewhere else if other applications connect to the same databases (i.e. not just "your" application to "your" own multiple databases), etc.
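The "transactions might suck" point can be illustrated with two in-memory SQLite databases standing in for two separate servers. Without a distributed-transaction coordinator (XA/two-phase commit, which many setups don't have), a write spanning both stores is really two independent transactions, and the first commit cannot be undone if the second fails. The table names here are hypothetical, and the second insert deliberately targets a misspelled table to force a failure.

```python
import sqlite3

# Two separate databases, i.e. two independent transaction scopes.
users_db = sqlite3.connect(":memory:")
billing_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
billing_db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, user_id INTEGER)")

# A "transaction" spanning both stores is really two transactions:
users_db.execute("INSERT INTO users (name) VALUES ('alice')")
users_db.commit()  # commit #1 succeeds and is now permanent...
try:
    # ...but commit #2 fails (misspelled table simulates any mid-flight error):
    billing_db.execute("INSERT INTO invoice (user_id) VALUES (1)")
    billing_db.commit()
except sqlite3.OperationalError:
    # Commit #1 can no longer be rolled back: the two stores are
    # now inconsistent, and reconciliation is on you.
    pass
```

This is why cross-database writes usually end up needing sagas, outbox tables, or other compensation machinery rather than a plain transaction.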

At the end of the day, it's not about patterns or anti-patterns, but about thinking carefully about one design decision or another (i.e. do you need one DB or more and how you will access them), trade-offs, and above all, a good dose of common sense.

Other tips

The other way around - having multiple applications (services) access the same database - would be an anti-pattern that implies lots of problems: high coupling between those applications, unwanted side effects, and potential performance problems, because independent scaling per application at the database level is not possible, to name a few.

Having a single application access multiple data stores, on the other hand, can be totally fine for the right reasons, as long as those data stores are owned by that application only. Highly scalable cloud services such as CDNs, shared for cost reasons, might be an exception here, but in general you get the idea.

Think of some catalog microservice that is part of a web shop. It might have a SQL database for its product information, but other, more suitable data stores for storing product images. That is totally fine in this case, even preferable from my point of view.

This Polyglot persistence pattern is, for instance, a natural fit for microservices architectures.
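A sketch of that catalog service, under stated assumptions: an in-memory SQLite database stands in for the relational store, and a plain dict stands in for an object store such as S3 or a CDN. The point is only the shape: both stores are owned by this one service, each holding the kind of data it suits best.

```python
import sqlite3

# Relational store for queryable product metadata (hypothetical schema).
catalog_db = sqlite3.connect(":memory:")
catalog_db.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

# Blob store for images; a dict stands in for S3 or a CDN here.
image_store: dict[str, bytes] = {}

def add_product(sku: str, name: str, price: float, image: bytes) -> None:
    """Each kind of data goes to the store that suits it best."""
    catalog_db.execute("INSERT INTO products VALUES (?, ?, ?)", (sku, name, price))
    image_store[sku] = image  # binary payload goes to the blob store

add_product("sku-1", "Teapot", 19.99, b"\x89PNG...")
name, = catalog_db.execute(
    "SELECT name FROM products WHERE sku = 'sku-1'"
).fetchone()
```

Both stores sit behind the same service boundary, so the polyglot choice stays an implementation detail invisible to other applications.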

The only thing I would look into: if you access several data stores from the same application, make sure all those data-access use cases belong to the same problem domain. Single applications (services) should have focused responsibilities. Multiple data stores could be a signal that a service has too many responsibilities; it could be, but of course it does not have to be.

One principle I absorbed from a huge company is that each server-side enterprise application should only have access to its own database, and if other applications need to read/write data on that database they should do so through a layer of services exposed by the application "owning" the database.

This is the philosophy adopted by microservices. Note that this philosophy comes with its own set of problems: see the CAP theorem and eventual consistency.

The idea behind microservices is that, in a large corporation, each team is responsible for its own application and nothing else. This isn't a bad idea in and of itself, but it must be tempered with a bit of pragmatism. If the microservices are too small/granular, you will be spending a great deal of time communicating between microservices using less than ideal connection mechanisms. The tradeoff is that microservices can allow a more scalable architecture.

The opposite is also true, of course. Building a monolith can constrain you in ways that microservices would not (although Stack Exchange seems to have done pretty well with their monolith), but your company would have to be very large for that to matter.

Further Reading
The macro problem with microservices

Licensed under: CC-BY-SA with attribution