Event sourcing, replaying and versioning

https://softwareengineering.stackexchange.com/questions/310176

12-12-2020

Question

I am designing a system that uses Event Sourcing, CQRS and microservices. I am led to understand this isn't an uncommon pattern. A key feature of the service needs to be the ability to rehydrate/restore from a system of record. Microservices will produce commands and queries on a MQ (Kafka). Other microservices will respond (events). Commands and queries will be persisted on S3 for the purpose of auditing and restoring.

The current thought process was that, for the purposes of restoring the system, we could extract the event log from S3 and simply feed it back into Kafka.

However, this fails to acknowledge changes in both producers and consumers over time. Versioning at the command/query level seems to go some way toward solving the problem, but I can't wrap my head around versioning consumers. What I want to enforce is that when a command is received and processed during a restore, it is processed by the exact same [version of the] code that processed it the first time the command was received.

Are there any patterns I can use to solve this? Is anyone aware of other systems that advertise this feature?

EDIT: Adding an example.

A 'buyer' sends a 'question' to a 'seller' on my auction site. The flow looks as follows:

UI -> Web App: POST /question {:text text :to seller-id :from user-id}
Web App -> MQ: SEND {:command send-question :args [text seller-id user-id]}
MQ -< Audit: <command + args appended to log in S3>
MQ -< Questions service:
  - Record question in DB
  - Email seller 'You have a question'

Now, as a result of a new business requirement, I adjust the 'Questions service' consumer, to persist a count of all unread questions. The DB schema is changed. We have had no notion of whether or not a question was read by the seller, until now. The last line becomes:

MQ -< Questions service:
  - Record question in DB
  - Email seller 'You have a question'
  - Increment 'unread questions count'

Two commands are issued, one before the change and one after the change. The 'unread questions count' equals 1.

The system crashes. We restore by replaying the commands through the new code. At the end of the restore, our 'unread questions count' equals 2. Even though, in this contrived example, the result is not a catastrophe, the state that has been restored is not what it previously was.
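The drift described above can be reproduced in a few lines. This is only a minimal sketch: `State`, `handle_v1` and `handle_v2` are hypothetical names standing in for the two versions of the Questions service, and the database and email steps are left out.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    questions: list = field(default_factory=list)
    unread_count: int = 0


def handle_v1(state, command):
    # Original consumer (hypothetical): records the question only.
    state.questions.append(command["text"])


def handle_v2(state, command):
    # Changed consumer (hypothetical): also maintains the new unread counter.
    state.questions.append(command["text"])
    state.unread_count += 1


commands = [
    {"command": "send-question", "text": "Is it new?"},    # issued before the change
    {"command": "send-question", "text": "Do you ship?"},  # issued after the change
]

# What happened live: each command was handled by the code deployed at the time.
live = State()
handle_v1(live, commands[0])
handle_v2(live, commands[1])

# What the restore does: every command goes through the current code.
restored = State()
for command in commands:
    handle_v2(restored, command)
```

Both states hold the same two questions, but `live.unread_count` and `restored.unread_count` differ, which is exactly the mismatch the restore is supposed to avoid.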

Solution

First, it is important to understand and be able to leverage the difference between Commands and Events.

As this question succinctly points out, Commands are things we would like to happen, and Events are things that have already happened. A command does not necessarily result in a significant event in the system, but it usually does. For example, a send message command may be rejected, in which case no event happens (typically an error would not be considered an event in this sense, though we may still choose to log it in a diagnostic log). Now, if the send message command is accepted, the message sent event occurs, and event details could describe the sender, the receiver, and the content.
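One way to picture the distinction is a handler that takes a command and returns zero or more events. The sketch below is illustrative only: the class names and the rejection rules (a blocked sender, an empty message) are assumptions made for the example, not something taken from the question.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class SendMessage:   # command: a request that may be refused
    sender: str
    receiver: str
    content: str


@dataclass(frozen=True)
class MessageSent:   # event: a fact that has already happened
    sender: str
    receiver: str
    content: str


def handle(command: SendMessage, blocked_senders: Set[str]) -> List[MessageSent]:
    """Decide whether the command is accepted and return the resulting events."""
    # Hypothetical rejection rules, chosen only to show a command with no event.
    if command.sender in blocked_senders or not command.content.strip():
        return []    # rejected: nothing happened, so nothing is recorded
    return [MessageSent(command.sender, command.receiver, command.content)]
```

Only the returned events would be appended to the event log. A rejected command leaves no trace there, although it could still be written to a diagnostic log.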

When we talk about the system state, we are actually discussing not a culmination of commands, but of events. Only events reflect changes of state in the system. To draw on a real-life example, suppose I go to the local Publix supermarket and buy a Florida lottery ticket. The command was "Buy Ticket" and the event was "Ticket issued." My next command is to the lottery, to draw my numbers for the PowerBall. The lottery is going to ignore my command (though I have no way of knowing that), and the event "PowerBall numbers chosen" takes place irrespective of my wishes. If my numbers match, the event "Jackpot won" happens to me (and I think my command was heard). If not, I realize my command was ignored.

From a historical perspective, the lottery is only interested in a subset of events. The lottery only cares that (a) a ticket was issued, (b) the numbers were chosen, and (c) the jackpot was won. Those are the items of interest. The act of purchasing the ticket, wanting to win, etc. are all irrelevant, as is what I do with my ticket after I lose. While the real world does change for mundane events, we only need to record those events which are significant to our system.

In theory, under an event-sourcing technique, a stream of events may be replayed from the beginning of time to arrive at the current state. This relies upon the assumption that the underlying system conditions are constant and deterministic. However, these assumptions are not valid in many systems. The data associated with an event, as well as the types of events we are interested in, may change as our software evolves. In addition, it can be computationally expensive to re-compute the current state in response to every query. For this reason, snapshots of the system state are often taken to represent known points in time, and only the events recorded after the snapshot are then applied on top of it.
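The snapshot idea can be sketched as a fold that starts from a saved state instead of from an empty one. Everything here is an assumption made for illustration: the `Snapshot` shape, the event type strings, and the use of a list index as the event position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int   # number of events already folded into `state`
    state: dict


def apply(state: dict, event: dict) -> dict:
    """Pure fold step: return the state after one event."""
    new_state = dict(state)
    if event["type"] == "question-asked":
        new_state["unanswered"] = new_state.get("unanswered", 0) + 1
    elif event["type"] == "question-answered":
        new_state["unanswered"] = new_state.get("unanswered", 0) - 1
    return new_state


def rehydrate(snapshot: Snapshot, log: list) -> Snapshot:
    """Start from the snapshot and apply only the events recorded after it."""
    state = snapshot.state
    for event in log[snapshot.version:]:
        state = apply(state, event)
    return Snapshot(version=len(log), state=state)
```

Replaying the whole log from `Snapshot(0, {})` and resuming from an intermediate snapshot should give the same result, provided `apply` has not changed in between. When it has changed, the snapshot preserves the state as it was computed by the old code, which is the property the question is after.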

While it is still possible to replay an event stream across multiple versions, the amount of human effort involved in doing so is likely to be cost-prohibitive. Unless there is a justifiable reason to design that capability into the system, you are better off building your system to utilize snapshots.

Example in Question

In the example given in the question, the architecture is not truly event-based; it is command-based. Replaying commands creates the system state. This is an anti-pattern and should be fixed. Instead, the primary events are:

Buyer asks question
Seller responds to question

Each of these events can be "replayed" to give the current state. For example, in the act of asking a question, the system behavior might be to email the seller and increment the unanswered question counter. This behavior can be changed; however, the fact that the question was asked cannot. Similarly, the system might decrement the unanswered question counter when the seller responds. This behavior is changeable, but the fact that the seller responded is not.

Most event-sourcing systems would dynamically compute the count of unanswered questions by replaying the specific event stream in response to a query.
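As a rough sketch of computing the count at query time, the function below folds over the two event types named above. The event dictionaries and their field names (`type`, `seller_id`, `question_id`) are hypothetical; the question does not define an event schema.

```python
def unanswered_questions(events, seller_id):
    """Derive the unanswered-question count for one seller from the event stream."""
    open_questions = set()
    for event in events:
        if event["seller_id"] != seller_id:
            continue
        if event["type"] == "buyer-asked-question":
            open_questions.add(event["question_id"])
        elif event["type"] == "seller-responded-to-question":
            # discard() tolerates a duplicate or out-of-order response event.
            open_questions.discard(event["question_id"])
    return len(open_questions)
```

Because the count is derived rather than stored, changing how it is computed (or adding it long after the first questions were asked) requires no migration of past data: the same events simply yield the new view.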

OTHER TIPS

"Commands and queries will be persisted on S3 for purpose of auditing and restoring."

For auditing, sure. For restoring? That's weird, and likely to cause you headaches.

If you are going to be event sourcing, you want to be rehydrating state from events (things that happened in the past) not commands. This saves you from most of the problems associated with changes to command implementation -- you only need to deal with the persisted state changes.

Versioning is still a concern. In particular, you want to make sure that your persisted events are as supple as possible (DTO representations, rather than direct serializations of the concepts in your domain). When reading events from the store, you have an opportunity to update them as necessary prior to applying them to the rehydrating state.
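Updating events as they are read is sometimes called upcasting. The following is a minimal sketch under stated assumptions: events are stored as JSON objects, a missing `version` field means version 1, and version 2 adds a hypothetical `read` flag. None of these details come from the question.

```python
import json

CURRENT_VERSION = 2


def upcast_v1_to_v2(payload: dict) -> dict:
    # Hypothetical schema change: v2 events carry a "read" flag.
    upgraded = dict(payload)
    upgraded["read"] = False   # assumed default for events stored before the flag existed
    upgraded["version"] = 2
    return upgraded


# One upcaster per stored version; each lifts an event exactly one version.
UPCASTERS = {1: upcast_v1_to_v2}


def load_event(raw: str) -> dict:
    """Deserialize a stored event and bring it up to the current schema version."""
    payload = json.loads(raw)
    version = payload.get("version", 1)
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version = payload["version"]
    return payload
```

The stored events are never rewritten. Only the in-memory copy handed to the rehydration code is upgraded, so the rehydration logic needs to understand just the current schema.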

Licensed under: CC-BY-SA with attribution
