Question

I've recently been taking a course on software design, and there was a recent discussion/recommendation about using a 'microservices' model, where components of a service are separated into microservice sub-components that are as independent as possible.

One part that was mentioned was instead of following the very often seen model of having a single database that all the microservices talk to, you would have a separate database running for each of the microservices.

A better worded and more detailed explanation of this can be found here: http://martinfowler.com/articles/microservices.html under the section Decentralized Data Management

The most salient part says:

Microservices prefer letting each service manage its own database, either different instances of the same database technology, or entirely different database systems - an approach called Polyglot Persistence. You can use polyglot persistence in a monolith, but it appears more frequently with microservices.

Figure 4 (image in the linked article)

I like this concept and, among many other things, see it as a strong improvement on maintenance and on projects with multiple people working on them. That said, I am by no means an experienced software architect. Has anyone ever tried to implement it? What benefits and hurdles did you run into?

Solution

Let's talk positives and negatives of the microservice approach.

First negatives. When you create microservices, you're adding inherent complexity in your code. You're adding overhead. You're making it harder to replicate the environment (eg for developers). You're making debugging intermittent problems harder.

Let me illustrate a real downside. Consider hypothetically the case where you have 100 microservices called while generating a page, each of which does the right thing 99.9% of the time. But 0.05% of the time they produce wrong results. And 0.05% of the time there is a slow connection request where, say, a TCP/IP timeout is needed to connect and that takes 5 seconds. About 90.5% of the time your request works perfectly. But around 5% of the time you have wrong results and about 5% of time your page is slow. And every non-reproducible failure has a different cause.
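The arithmetic above follows from assuming the calls fail independently. A minimal sketch (not from the original answer, numbers as stated in it) that reproduces those figures:

```python
# Compound failure odds for a page that fans out to n independent
# service calls, each correct 99.9% of the time, wrong 0.05% of the
# time, and slow 0.05% of the time.
def page_outcome_probabilities(n_calls=100, p_wrong=0.0005, p_slow=0.0005):
    """Return (all_ok, at_least_one_wrong, at_least_one_slow)."""
    p_ok = 1.0 - p_wrong - p_slow              # 99.9% per call
    all_ok = p_ok ** n_calls                   # every call behaves
    some_wrong = 1.0 - (1.0 - p_wrong) ** n_calls
    some_slow = 1.0 - (1.0 - p_slow) ** n_calls
    return all_ok, some_wrong, some_slow

ok, wrong, slow = page_outcome_probabilities()
print(f"all fine: {ok:.1%}, wrong result: {wrong:.1%}, slow: {slow:.1%}")
# roughly 90.5%, 4.9%, 4.9% -- matching the figures above
```

Note how per-call reliability that sounds excellent (99.9%) compounds into a page that misbehaves roughly one request in ten.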

Unless you put a lot of thought around tooling for monitoring, reproducing, and so on, this is going to turn into a mess. Particularly when one microservice calls another that calls another a few layers deep. And once you have problems, it will only get worse over time.

OK, this sounds like a nightmare (and more than one company has created huge problems for themselves by going down this path). Success is only possible if you are clearly aware of the potential downsides and consistently work to address them.

So what about that monolithic approach?

It turns out that a monolithic application is just as easy to modularize as microservices. And a function call is both cheaper and more reliable in practice than an RPC call. So you can develop the same thing except that it is more reliable, runs faster, and involves less code.

OK, then why do companies go to the microservices approach?

The answer is because as you scale, there is a limit to what you can do with a monolithic application. After so many users, so many requests, and so on, you reach a point where databases do not scale, webservers can't keep your code in memory, and so on. Furthermore microservice approaches allow for independent and incremental upgrades of your application. Therefore a microservice architecture is a solution to scaling your application.

My personal rule of thumb is that going from code in a scripting language (eg Python) to optimized C++ generally can improve 1-2 orders of magnitude on both performance and memory usage. Going the other way to a distributed architecture adds a magnitude to resource requirements but lets you scale indefinitely. You can make a distributed architecture work, but doing so is harder.

Therefore I would say that if you are starting a personal project, go monolithic. Learn how to do that well. Don't be distributed because (Google|eBay|Amazon|etc) are. If you land in a large company that is distributed, pay close attention to how they make it work and don't screw it up. And if you wind up having to do the transition, be very, very careful because you're doing something hard that is easy to get very, very wrong.

Disclosure: I have close to 20 years of experience in companies of all sizes. And yes, I've seen both monolithic and distributed architectures up close and personal. It is based on that experience that I am telling you that a distributed microservice architecture really is something that you do because you need to, and not because it is somehow cleaner and better.

OTHER TIPS

I wholeheartedly agree with btilly's answer, but just wanted to add another positive for Microservices, that I think is an original inspiration behind it.

In a Microservices world, services are aligned to domains, and are managed by separate teams (one team may manage multiple services). This means that each team can release services entirely separately and independently of any other services (assuming correct versioning etc).

While that may seem like a trivial benefit, consider the opposite in a monolithic world. There, when one part of the application needs to be updated frequently, it impacts the entire project and any other teams working on it. You then need to introduce scheduling, reviews, and so on, and the whole process slows down.

So, as to your choice: as well as considering your scaling requirements, also consider the team structures required. I agree with btilly's recommendation that you start monolithic and then identify later where microservices might become beneficial, but be aware that scalability is not the only benefit.

I worked at a place that had a fair amount of independent data sources. They did put them all into a single database, but into different schemas that were accessed by webservices. The idea was that each service could only access the minimum amount of data they required to perform their work.

It wasn't much overhead compared to a monolithic database, but I suspect that was mostly because the data already fell into isolated groups.
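A rough sketch of that least-privilege, schema-per-service idea, assuming a PostgreSQL-style shared database; the service names and schema layout here are purely illustrative:

```python
# Generate PostgreSQL-style statements that confine each service's
# role to its own schema on a shared database. Hypothetical example;
# service names are made up.
SERVICES = ["billing", "inventory", "shipping"]

def grants_for(service: str) -> list[str]:
    """SQL that limits one service role to the matching schema."""
    role = f"{service}_svc"
    return [
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT USAGE ON SCHEMA {service} TO {role};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE "
        f"ON ALL TABLES IN SCHEMA {service} TO {role};",
        f"REVOKE ALL ON SCHEMA public FROM {role};",
    ]

for service in SERVICES:
    for stmt in grants_for(service):
        print(stmt)
```

The point is the same one made above: even on one physical database, a compromised or buggy service can only touch the minimum data it needs.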

The webservices were called from the web server code that generated a page, so this is a lot like your microservices architecture, though possibly not as micro as the word suggests and not distributed, though they could have been (note that one WS did call out to a 3rd-party service for data, so there was one instance of a distributed data service). The company that did this was more interested in security than scale, however: these services and their isolated data reduced the attack surface, in that an exploitable flaw in one would not give full access to the entire system.

Roger Sessions in his excellent Objectwatch newsletters described something similar with his Software Fortresses concept (unfortunately the newsletters are no longer online, but you can buy his book).

Licensed under: CC-BY-SA with attribution