Question

I'm still struggling with what must be basic (and resolved) issues related to CQRS-style architecture:

How do we implement business rules that rely on a set of Aggregate Roots?

Take, as an example, a booking application. It may enable you to book tickets for a concert, seats for a movie or a table at a restaurant. In all cases, there's only going to be a limited number of 'items' for sale.

Let's imagine that the event or place is very popular. When sales open for a new event or time slot, reservations start to arrive very quickly - perhaps many per second.

On the query side we can scale massively, and reservations are put on a queue to be handled asynchronously by an autonomous component. At first, when we pull Reservation Commands off the queue we will accept them, but at a certain point we will have to start rejecting the rest.

How do we know when we reach the limit?

For each Reservation Command we would have to query some sort of store to figure out if we can accommodate the request. This means that we will need to know how many reservations we have already received at that time.

However, if the Domain Store is a non-relational data store such as Windows Azure Table Storage, we can't very well do a SELECT COUNT(*) FROM ...

One option would be to keep a separate Aggregate Root that simply keeps track of the current count, like this:

  • AR: Reservation (who? how many?)
  • AR: Event/Time slot/Date (aggregate count)

The second Aggregate Root would be a denormalized aggregation of the first one, but when the underlying data store doesn't support transactions, then it's very likely that these can get out of sync in high-volume scenarios (which is what we are trying to address in the first place).
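
For what it's worth, a minimal sketch of that two-Aggregate-Root approach might look like the following. The store, its load/save calls, and the command fields are hypothetical placeholders rather than real Azure Table Storage APIs; the comments mark where the two writes can drift apart:

```python
# Hypothetical store and command objects; the point is the shape of the
# two writes, not a concrete Azure API.

def handle_reservation(store, command):
    # AR: Event/Time slot/Date (aggregate count)
    slot = store.load("EventSlot", command.event_id)
    if slot["reserved"] + command.quantity > slot["capacity"]:
        return "rejected: sold out"

    # Write 1: AR Reservation (who? how many?)
    store.save("Reservation", command.reservation_id, {
        "event_id": command.event_id,
        "customer": command.customer,
        "quantity": command.quantity,
    })

    # Write 2: the denormalized count. Without a transaction spanning both
    # writes, a crash or a concurrent handler between Write 1 and Write 2
    # leaves the two Aggregate Roots out of sync.
    slot["reserved"] += command.quantity
    store.save("EventSlot", command.event_id, slot)

    return "accepted"
```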

One possible solution is to serialize handling of the Reservation Commands so that only one at a time is handled, but this goes against our goals of scalability (and redundancy).

Such scenarios remind me of standard "out of stock" scenarios, but the difference is that we can't very well put the reservation on back order. Once an event is sold out, it's sold out, so I can't see what a compensating action would be.

How do we handle such scenarios?


Solution

After thinking about this for some time, it finally dawned on me that the underlying problem is less related to CQRS than to the non-transactional nature of disparate REST services.

Really it boils down to this problem: if you need to update several resources, how do you ensure consistency if the second write operation fails?

Let's imagine that we want to write updates to Resource A and Resource B in sequence.

  1. Resource A is successfully updated
  2. The attempt to update Resource B fails

The first write operation can't easily be rolled back in the face of an exception, so what can we do? Catching and suppressing the exception to perform a compensating action against Resource A is not a viable option. First of all, it's complex to implement; secondly, it's not safe: what happens if the original exception was caused by a failed network connection? In that scenario, we can't write a compensating action against Resource A either.

The key lies in explicit idempotency. While Windows Azure Queues don't guarantee exactly-once semantics, they do guarantee at-least-once semantics. This means that in the face of intermittent exceptions, the message will later be replayed.

In the previous scenario, this is what happens instead:

  1. An update to Resource A is attempted. The replay is detected, so the state of A isn't affected; however, the 'write' operation still succeeds.
  2. Resource B is successfully updated.

When all write operations are idempotent, eventual consistency can be achieved with message replays.
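
As a rough sketch of what such replay-aware, idempotent writes could look like (every store and queue call here is a hypothetical stand-in, not the real Azure Queue or Table API):

```python
# Hypothetical store/queue interfaces; only the shape of the logic matters.

def apply_update(store, resource, message_id, update):
    """Idempotent write: a replayed message leaves the resource untouched,
    but the operation still reports success so processing can continue."""
    if store.has_applied(resource, message_id):      # replay detected
        return
    # Assumes the update and the 'applied' marker for a single resource can
    # be written together (single-entity transactions are usually available
    # even in non-relational stores).
    store.update(resource, update)
    store.mark_applied(resource, message_id)

def handle(store, queue, message):
    apply_update(store, "ResourceA", message.id, message.update_a)
    apply_update(store, "ResourceB", message.id, message.update_b)
    # At-least-once delivery: only delete the message once both writes have
    # succeeded. If anything throws before this line, the message is
    # redelivered and the replays above are harmless.
    queue.delete(message)
```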

OTHER TIPS

Interesting question, and with this one you are nailing one of the pain points in CQRS.

The way Amazon handles this is by having the business scenario cope with an error state if the requested items are sold out. The error state is simply to notify the customer by email that the requested items are not currently in stock, along with an estimated shipping date.

However - this does not fully answer your question.

Thinking of a ticket-selling scenario, I would make sure to tell the customer that what they submitted was a reservation request, that the reservation request would get processed as soon as possible, and that they'll receive the final answer in an email later. By allowing this, some customers might get an email with a rejection of their request.

Now, could we make this rejection less painful? Certainly: by inserting a key in our distributed cache with the percentage or number of items in stock and decrementing this counter whenever an item is sold. This way we could warn the user before the reservation request is submitted, say if only 10% of the initial number of items is left, that the customer might not be able to get the item in question. If the counter is at zero we would simply refuse to accept any more reservation requests.
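
A minimal sketch of that counter, assuming a hypothetical distributed cache with atomic get/decrement operations (the key names and the 10% threshold are just illustrative):

```python
LOW_STOCK_FRACTION = 0.10   # warn when 10% or less of the initial stock remains

def stock_key(event_id):
    return f"stock-remaining:{event_id}"

def check_reservation_request(cache, event_id, initial_stock):
    # Assumes the counter was set to initial_stock when sales opened.
    remaining = cache.get(stock_key(event_id))        # hypothetical cache.get()
    if remaining <= 0:
        return "refuse"                               # sold out: stop taking requests
    if remaining <= initial_stock * LOW_STOCK_FRACTION:
        return "warn"                                 # low stock: warn before they submit
    return "accept"

def on_item_sold(cache, event_id):
    cache.decrement(stock_key(event_id))              # hypothetical atomic decrement
```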

My point being:

  1. Let the user know that it's a request they are making, and that it might get refused.
  2. Inform the user that the chances of successfully getting the item in question are low.

Not exactly a precise answer to your question, but this is how I would handle a scenario like this when dealing with CQRS.

The eTag enables optimistic concurrency, which you can use in place of transactional locking to update a document and safely handle potential race conditions. See the remarks at http://msdn.microsoft.com/en-us/library/dd179427.aspx for more info.

The story might go something like this: User A creates an event E with a maximum of 2 tickets; the eTag is 123. Due to high demand, 3 users attempt to purchase tickets at nearly the same time. User B creates reservation request B. User C creates reservation request C. User D creates reservation request D.

System S receives reservation request B, reads the event with eTag 123 and changes it to have 1 remaining ticket. S submits the update including eTag 123, which matches the original eTag, so the update succeeds. The eTag is now 456. The reservation request is approved and the user is notified that it was successful.

Another system, S2, receives reservation request C at the same time as System S was processing request B, so it also reads the event with eTag 123, changes it to 1 remaining ticket, and attempts to update the document. This time, however, eTag 123 does not match, so the update fails with an exception. System S2 retries the operation by re-reading the document, which now has eTag 456 and a count of 1, so it decrements this to 0 and resubmits with eTag 456.

Unfortunately for user C, System S started processing user D's request immediately after user B's and also read the document with eTag 456. Because System S is faster than System S2, it was able to update the event with eTag 456 before System S2, so user D also successfully reserved his ticket. The eTag is now 789.

So System S2 fails again and gives it one more try, but this time, when it reads the event with eTag 789, it sees that there are no tickets available and thus denies user C's reservation request.
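
Expressed as code, the loop that each system runs could look roughly like this; the store, its read/update signatures, and the exception type are hypothetical stand-ins for whichever eTag-aware client you actually use:

```python
class PreconditionFailed(Exception):
    """Raised by the hypothetical store when the supplied eTag no longer matches."""

def reserve_ticket(store, event_id, max_attempts=5):
    for _ in range(max_attempts):
        event, etag = store.read(event_id)            # e.g. ({"remaining": 1}, "456")
        if event["remaining"] <= 0:
            return "denied: sold out"                 # user C ends up here
        event["remaining"] -= 1
        try:
            # Conditional update: only succeeds if the document still has `etag`.
            store.update(event_id, event, if_match=etag)
            return "approved"
        except PreconditionFailed:
            continue                                  # lost the race: re-read and retry
    return "denied: too much contention"
```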

How you notify users that their reservation requests were successful or not is up to you. You could just poll the server every few seconds and wait for the reservation status to be updated.

Let's look at the business perspective (I deal in similar things - booking appointments on free slots) ...

The first thing in your analysis that strikes me as being off is that there's no notion of a reservable ticket/seat/table. These are the resources being booked.

In the transactional case, you can use some form of uniqueness to ensure that a double booking does not happen for the same ticket/seat/table (more info at http://seabites.wordpress.com/2010/11/11/consistent-indexes-constraints). This scenario demands synchronous (but still concurrent) command processing.
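
One way to picture that uniqueness constraint (the store and its insert_unique call are hypothetical; in a transactional store this would typically be a unique index or key):

```python
class DuplicateKey(Exception):
    """Raised by the hypothetical store when the unique key already exists."""

def book_seat(store, event_id, seat_id, customer_id):
    # One reservable ticket/seat/table corresponds to one unique key.
    key = f"{event_id}/{seat_id}"
    try:
        store.insert_unique("SeatReservation", key, {"customer": customer_id})
        return "booked"
    except DuplicateKey:
        return "rejected: seat already taken"
```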

In the non-transactional case, you can retroactively monitor the event stream and compensate the command. You can even give the end user the experience of waiting for the booking confirmation until the system knows for sure - i.e. after the event stream analysis - that the command completed and was or was not compensated (which boils down to "was the booking made? yes or no?"). In other words, compensation could become part of the confirmation cycle.
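
Sketched under the assumption of an ordered event stream (the event shapes and the confirm/compensate callbacks are illustrative, not an existing API), that retroactive check could look like this:

```python
def settle_reservations(events, capacity, confirm, compensate):
    """Replay the event stream in order; confirm bookings that fit within
    capacity and compensate the ones that overshoot it."""
    reserved = 0
    for event in events:
        if event["type"] != "SeatReserved":
            continue
        reserved += event["quantity"]
        if reserved <= capacity:
            confirm(event["booking_id"])      # booking stands: confirm to the user
        else:
            compensate(event["booking_id"])   # overbooked: compensate the command
```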

Let's step back some more ...

When billing is involved as well (e.g. online ticket sales), I think this whole scenario evolves into a saga anyway (reserve ticket + bill ticket). Even without billing, you'd have a saga (reserve table + confirm reservation) to make the experience credible. So even though you're only zooming in on one aspect of booking a ticket/table/seat (i.e. is it still available?), the "long-running" transaction isn't complete until I've paid for it or confirmed it. Compensation will happen anyway, freeing up the ticket again when I abort the transaction for whatever reason. The interesting part now becomes how the business wants to deal with this: maybe some other customer would have completed the transaction had we given him/her the same ticket. In that case, refunding might become more interesting when a ticket/seat/table is double-booked - even offering a discount on a next/similar event to compensate for the inconvenience. The answer lies in the business model, not the technical model.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow