How to agree on message schema in a Publish–subscribe pattern

https://softwareengineering.stackexchange.com/questions/409772

10-03-2021

Question

I'm working on a project that uses Pub/Sub (GCP). My question is not specific to GCP; it is more about the architectural pattern. (I'm used to statically typed languages, and I have a hard time figuring out how to do this the right way.)

The services that I'm working on are written in Go, and what I would like (at least to me this seems the right way) is to force the consumers and producers to use the same message format, that is, to agree on the schema at compile time. Right now the two parts are totally independent, so we have the message format specified in two places (this really bugs me).

In the beginning, I thought that the consumer should own the message format (don't judge, I'm new to this kind of architecture). After a discussion with a coworker and some reading, I agree that this would kind of break the pattern, since the producer would then know about the consumer. A further issue appears when you have multiple consumers.

My next thought was to extract the message format into a separate package and have both consumers and producers use the format from there, but this again would increase the coupling. I tried to read up on this, but I can't find a more detailed explanation or diagram of the pattern that would answer my question, and surely I'm not the only one who has thought about this problem.

Am I on the right track, or what would be the right way to solve this? Or am I just making my life more complicated than it has to be?

Solution

Your publisher and consumers are already coupled: they share a schema, which is a form of coupling called external coupling. So referring to a common module that defines that schema doesn't really increase the coupling; it just makes it more explicit.

There are a gazillion different schema formats, ranging from custom source code in a specific language to language-agnostic formats like YANG. The language-agnostic ones are more general, but require some sort of translation or code generation step to use. If you define your schema as Go source code, it's easier to use, as long as you don't need to produce or consume a message using a different programming language.
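As a rough illustration of the "schema as Go source code" option, the sketch below uses a single struct type on both sides, so a field rename or type change fails to compile for the producer and the consumer alike. The `OrderCreated` type, its fields, and the `publish`/`consume` helpers are hypothetical names invented for this example, and JSON is only an assumed wire format; the actual Pub/Sub client calls are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderCreated is a hypothetical message type; the name and fields are
// illustrative and do not come from the original question.
type OrderCreated struct {
	OrderID    string `json:"order_id"`
	Customer   string `json:"customer"`
	TotalCents int64  `json:"total_cents"`
}

// publish stands in for the producer: it serializes the shared type.
// A real producer would hand the bytes to its Pub/Sub client.
func publish(msg OrderCreated) ([]byte, error) {
	return json.Marshal(msg)
}

// consume stands in for the consumer: it decodes into the same type.
func consume(payload []byte) (OrderCreated, error) {
	var msg OrderCreated
	err := json.Unmarshal(payload, &msg)
	return msg, err
}

func main() {
	payload, err := publish(OrderCreated{OrderID: "o-1", Customer: "alice", TotalCents: 1250})
	if err != nil {
		panic(err)
	}
	msg, err := consume(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", msg)
}
```

Because both helpers refer to the same type, the message format is written down exactly once, which is the property the question asks for.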

Other tips

I see approximately four different general classes of scenarios (I am talking about software packages, not instances of running binaries).

1. Single producer, single consumer - at this point, it doesn't matter (much) where the schema lives, but there should be only one. The onus is on the "schema owner" to ensure that the "schema user" knows where it is and if/when version changes to the schema have happened.
2. Single producer, multiple consumers - at this point, the best place for the schema is definitely on the producing side, since there's only one producer.
3. Multiple producers, single consumer - at this point, it's probably best for the schema to live in the consumer, since there's only one consumer.
4. Multiple producers, multiple consumers - at this point, a dedicated "schema repository" is definitely the best choice.

Note that the solution for scenario 4 would also work for scenarios 1, 2, and 3. This could be done by simply having a "schema" Go module that exposes the message(s) and possibly some common convenience functions for the code.
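A minimal sketch of what such a "schema" module might expose is shown below. It is kept in one file so it runs as-is; in a real layout the type and functions would sit in their own module imported by every producer and consumer. The `UserSignedUp` message, its fields, and the `Validate`/`Encode`/`Decode` helpers are assumptions made for illustration. Rejecting unknown fields on decode is one possible policy, not a requirement of the pattern.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// UserSignedUp is a hypothetical message; the fields are illustrative.
type UserSignedUp struct {
	UserID string `json:"user_id"`
	Email  string `json:"email"`
}

// Validate checks the invariants that both sides rely on.
func (m UserSignedUp) Validate() error {
	if m.UserID == "" {
		return errors.New("user_id is required")
	}
	if m.Email == "" {
		return errors.New("email is required")
	}
	return nil
}

// Encode validates a message and serializes it for publishing.
func Encode(m UserSignedUp) ([]byte, error) {
	if err := m.Validate(); err != nil {
		return nil, fmt.Errorf("encode: %w", err)
	}
	return json.Marshal(m)
}

// Decode parses a payload strictly (unknown fields are an error)
// and then validates the result.
func Decode(payload []byte) (UserSignedUp, error) {
	var m UserSignedUp
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&m); err != nil {
		return UserSignedUp{}, fmt.Errorf("decode: %w", err)
	}
	if err := m.Validate(); err != nil {
		return UserSignedUp{}, fmt.Errorf("decode: %w", err)
	}
	return m, nil
}

func main() {
	payload, err := Encode(UserSignedUp{UserID: "u-1", Email: "a@example.com"})
	if err != nil {
		panic(err)
	}
	msg, err := Decode(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", msg)
}
```

Strict decoding surfaces schema drift early, but it also turns every added field into a breaking change for consumers built against an older version of the module; a more lenient decoder is a reasonable alternative if additive changes are expected.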

Putting it in a dedicated "neither the producer nor the consumer" module means that you'd only need to bump its version when the schema changes, rather than always having to wonder whether a producer/consumer version bump is also a format bump.

You probably do not (for anything other than the most trivial cases) want to define the relevant struct types, marshalling, and what-have-you on both the producer side and the consumer side.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange