Where should I store stream-wide metadata in stream processing?

https://softwareengineering.stackexchange.com/questions/334564

31-12-2020

Question

I am building a stream processing architecture and am wondering what to do with metadata about the stream. For instance, every message coming from a given source has the same attribution. As a message passes from the source through processing and enrichment and eventually to a sink, that attribution data does not change, but it is relevant to every message.

I'm planning on using Apache Kafka for the message queue, just for reference.

What do I do with this metadata? Do I store the full attribution in a database and just pass along the ID of that database entry in every message? Or is it better to put the attribution into every single message as it is passed along the message queue? Or is there a better or more standard option?

Solution

If metadata is used in the processing of a message, then it should be part of that message. That keeps data and metadata from getting out of sync, and it means the metadata cannot go missing.
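As a rough illustration of that advice, here is a minimal sketch in Python of a message that carries its attribution with it. The names `Attribution`, `Message`, `encode`, `decode` and `enrich`, the field names, and the choice of JSON as the wire format are all assumptions made for the example; none of them come from the question or from Kafka itself.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Attribution:
    # Hypothetical stream-wide metadata; the field names are illustrative.
    source: str
    license: str


@dataclass(frozen=True)
class Message:
    payload: dict
    attribution: Attribution


def encode(message: Message) -> bytes:
    """Serialize payload and attribution together into one message value."""
    return json.dumps(asdict(message), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> Message:
    """Rebuild a Message, including its attribution, from the encoded bytes."""
    data = json.loads(raw.decode("utf-8"))
    return Message(
        payload=data["payload"],
        attribution=Attribution(**data["attribution"]),
    )


def enrich(message: Message) -> Message:
    """Example processing stage: adds a field and carries attribution unchanged."""
    reading_f = message.payload["reading_c"] * 9 / 5 + 32
    enriched = dict(message.payload, reading_f=reading_f)
    return Message(payload=enriched, attribution=message.attribution)
```

The bytes returned by `encode` could be used as the value of a message on the queue. Each stage decodes, does its work, and re-encodes, so the attribution arrives at the sink with the data and needs no separate lookup.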

If the metadata isn't used, is there a good reason to retain it?

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange