Question

I have recently been studying up on microservices, and an associated idea that I've seen is that of polyglot persistence and microservices working with their own databases, or whatever storage they may be using. My question is how does this model handle relational data that may span more than one service?

For example, say you have a microservice for dealing with Customer information, and one dealing with Orders. I would assume an Order would need a way to reference a given Customer in its storage model, which I believe a relational database handles well, but if these microservices truly have independent databases, how is this handled?

As a side note, I've worked with one large relational database in my organization for the last three years, so I could understand if my thinking has been clouded by repetition, but I am very eager to understand this concept. Thanks!

Solution

"Polyglot persistence" is just a fancy name for "heterogenous data stores" (which I suppose is also a fancy name). You can solve the problem in a number of ways:

  1. Have "One Table to Rule Them All:" a specific Primary Key in a specific database is the master identifier for a particular entity,

  2. Have mapping tables that map the keys of one data store to the keys of another,

  3. Use keys that are globally-unique, like GUIDs.
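The third option can be sketched as follows: using UUIDs lets each service mint identifiers without coordinating with any central authority. A minimal Python sketch (the `new_entity_key` helper and the `service:uuid` key format are illustrative, not part of any standard):

```python
import uuid

def new_entity_key(service: str) -> str:
    """Mint a globally unique key. Any service can generate one
    independently; collisions are practically impossible.
    (Hypothetical helper for illustration.)"""
    return f"{service}:{uuid.uuid4()}"

customer_key = new_entity_key("customer")
order_key = new_entity_key("order")

# Keys minted by different services can never collide, so an Order
# record can safely store customer_key as an opaque reference.
assert customer_key != order_key
assert customer_key.startswith("customer:")
```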

OTHER TIPS

(...) microservices working with their own databases

As always with architecture decisions, there is trade-off between various objectives. One of the key objectives of microservices is to have independently deployable components. At one extreme, this means separating not only the code bases, but also the respective data persistence.

Although viable in some scenarios, I would contest this approach if the microservices pertain to the same backend system. In such a case it seems perfectly valid that several microservices share their data in the same database, as long as they have well-defined scopes and responsibilities - and still achieve the objective of independent deployability.

I would assume an Order would need a way to reference a given Customer in its storage model, which I believe a relational database handles well, but if these microservices truly have independent databases, how is this handled?

Each microservice needs to offer a way to identify its related objects -- some reference key, e.g. an external_ref field. The reference key should be independent of the actual model or primary key of the (foreign) microservice's database, so as to keep the services decoupled from internal changes, data migrations, etc. Typically, the keys employed are either surrogate keys or natural/business keys.

Example

Order microservice references a Customer object

  • the /order/<key> resource will contain a customer_ref, e.g. of the form <foreign service>:<foreign key>, specifically customer:123456

  • the /customer/<key> resource may contain a collection of orders it knows of, say as a relation customerdb.Orders(customer_id, order_ref), where order_ref is of the same form as above, order:abc3566 (I'm using abc3566 to indicate the key schemes need not be the same).

Using this approach, a service broker in your architecture can resolve any such _ref into an actual service call and return the respective foreign object. Note that I'm blatantly assuming a REST-style set of microservices here, but the principle applies to any other kind of services.
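A minimal sketch of such a broker, assuming the `<foreign service>:<foreign key>` reference format above. The registry and the fetcher functions are hypothetical stand-ins for real HTTP calls to the respective services:

```python
# Stand-ins for GET /customer/<key> and GET /order/<key>.
def fetch_customer(key):
    return {"id": key, "name": "Alice"}

def fetch_order(key):
    return {"id": key, "customer_ref": "customer:123456"}

# Maps a service name to the call that resolves its keys.
REGISTRY = {"customer": fetch_customer, "order": fetch_order}

def resolve(ref: str):
    """Split a reference like 'customer:123456' and dispatch to the
    owning service's fetcher."""
    service, _, key = ref.partition(":")
    if service not in REGISTRY:
        raise ValueError(f"unknown service in ref: {ref!r}")
    return REGISTRY[service](key)

order = fetch_order("abc3566")
customer = resolve(order["customer_ref"])  # follows the foreign reference
```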

Issues

Typically there are several issues that arise from this:

  • How to join data efficiently?

In a relational model, joins are done by the database and are thus relatively efficient. To achieve the same here, you need to narrow each service's result to a minimal subset and then join the data client-side. Typical strategies also involve caching to avoid querying the same (potentially rather static) data over and over again.

  • How to keep a set of microservices data free of redundancies?

Essentially, the same concepts as with any relational database design apply. However, due to caching strategies, data redundancies are sometimes introduced and need to be dealt with. One approach is to introduce webhooks (i.e. callbacks) so that a service can notify a dependent service of changes.

  • How to limit the impact of such an architecture on complexity? (in particular dependency management, deployment, testing)

    As always with any architecture decisions, there are trade-offs in microservices. It is up to the architect(s) to evaluate the pro & cons of such an approach. Over time, we will find abstractions and frameworks that deal with the most common complexities in a way that will essentially remove the downsides while keeping the advantages.
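The client-side join with caching described under the first issue might look like the following sketch, where in-memory dictionaries stand in for the two services and `functools.lru_cache` plays the role of the cache (all names and data are illustrative):

```python
from functools import lru_cache

# Stand-ins for the order service's and customer service's data.
ORDERS = [
    {"id": "o1", "customer_ref": "c1", "total": 10},
    {"id": "o2", "customer_ref": "c1", "total": 5},
    {"id": "o3", "customer_ref": "c2", "total": 7},
]
CUSTOMERS = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}

calls = 0  # counts simulated service round-trips

@lru_cache(maxsize=None)
def get_customer_name(ref):
    """Simulated call to the customer service, memoized so that
    rather-static customer data is fetched only once per key."""
    global calls
    calls += 1
    return CUSTOMERS[ref]["name"]

# Client-side "join": resolve each order's customer reference.
joined = [
    {"order": o["id"], "customer": get_customer_name(o["customer_ref"])}
    for o in ORDERS
]

# Three orders, but only two customer lookups thanks to the cache.
assert calls == 2
```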
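The webhook idea from the second issue can be sketched in-process as follows. `CustomerService` and the callback are illustrative; a real implementation would register HTTP callback URLs rather than Python callables:

```python
class CustomerService:
    """Toy customer service that notifies subscribers of changes."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # In a real system this would register an HTTP callback URL.
        self._subscribers.append(callback)

    def update_customer(self, customer_id, changes):
        # ... persist changes in the customer database, then notify ...
        for notify in self._subscribers:
            notify(customer_id, changes)

# The order service keeps a cached copy of customer data and
# invalidates it when notified, removing the redundancy hazard.
order_cache = {"123456": {"name": "Alice"}}

def on_customer_changed(customer_id, changes):
    order_cache.pop(customer_id, None)

svc = CustomerService()
svc.subscribe(on_customer_changed)
svc.update_customer("123456", {"name": "Alicia"})
assert "123456" not in order_cache
```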

If you're designing the services to be truly independent, then they're going to have to serve identity functions: ways to uniquely identify entities, so that they can be externally referred to. After all, the service is the boundary, and you're not supposed to see through it to the underlying store. That identity really, really shouldn't change in the face of entity mutation - so it shouldn't be made up of mutable data, such as a customer's name or an employee's name.

A particular application that uses the identity may map it or may use it as provided. Each application developed would be free to map or use the identity in a manner consistent with its purpose. There is no reason to assume some kind of globalized/centralized mapping. If a centralized mapping feels right, then a centralized, relational store for all the microservices has got to feel even more right.

It gets complicated if the services need to support guarantees...like "I won't destroy this entity until you release it" - which could be a long- or short-lived promise. This is kinda like declaratively externalizing the idea of a foreign key. Applications would have to be able to release the protection on entities when all references are removed; and that can get complicated.

For example, imagine service s serving entity e, and application a wants to refer to e and wants a guarantee that e won't perish; then a needs a unique way to refer to the contextual protection of e. This way, a can inform s when e is no longer needed. Reference counting (in s or in a) isn't good enough.

Possibly, s can simply deprecate e and not destroy it - maybe mark it as logically deleted. This obviates the complexity of externalizing the foreign key relationship but makes data live longer than it might have to otherwise.
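The "deprecate, don't destroy" strategy might look like the following sketch, where a `deleted` flag stands in for logical deletion (all names are illustrative):

```python
# Toy customer store; a real service would hold this in its database.
customers = {
    "123456": {"name": "Alice", "deleted": False},
}

def delete_customer(key):
    """Soft-delete: flip a flag instead of removing the record, so
    stale foreign references from other services still resolve."""
    customers[key]["deleted"] = True

def get_customer(key):
    entity = customers.get(key)
    if entity is None:
        raise KeyError(key)  # never existed
    if entity["deleted"]:
        # Callers can distinguish "deprecated" from "never existed".
        return {"key": key, "deleted": True}
    return {"key": key, "deleted": False, "name": entity["name"]}

delete_customer("123456")
assert get_customer("123456")["deleted"] is True
```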

Another strategy a might employ is just getting over the fact that a referred-to entity is no longer there. That's not so simple either.

In short - applications relying on a multiplicity of microservices have their work cut out for them - the services may or may not play well in such an environment. One wonders if the joy of picking one's own data store when building a service is outweighed by the complexity visited upon the service's consumers. The strategies chosen are going to be dictated by one's ability to tailor the service's behavior to the needs of applications. If we're talking about microservices from "out there", then the applications are going to have to be that much more robust, and may even decide to replicate substantial portions of the data provided by the service.

Licensed under: CC-BY-SA with attribution