Solving Event Dependence in Event Driven Systems

https://softwareengineering.stackexchange.com/questions/394995

28-02-2021

Question

There are 60 million shipments per day. Each shipment has about 50 metrics to be calculated. Each metric is calculated from one type of event (say event_1 carries the information needed to calculate metric_1, event_2 for metric_2, and so on). The events are independent of each other apart from one dependency: a single event (say event_1) carries vital information required to process each of the other events.

The current design:

(In order) Scenario 1: event_1 arrives first. We calculate metric_1 and store the vital information required to process the other events in DynamoDB. The other events (event_4, event_2, ...) arrive and are processed using the information read from DynamoDB.

(Out of order) Scenario 2: event_3 arrives first. The system checks DynamoDB for the required information and does not find it, so it places the event in the dead letter queue to be retried after a period of time. Once event_1 arrives and is processed, the other events go through.
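
The two scenarios above can be sketched roughly as follows. This is only an illustration: the dictionaries and the deque are in-memory stand-ins for DynamoDB, the dead letter queue, and Redshift, and the event shape (`shipment_id`, `type`, `payload`) and the function names are assumptions, not part of the actual system.

```python
from collections import deque

base_info = {}         # stand-in for DynamoDB: shipment_id -> vital info from event_1
retry_queue = deque()  # stand-in for the dead letter queue
metrics = []           # stand-in for Redshift: (shipment_id, metric_name)


def handle(event):
    """Process one event; return True if processed, False if parked for retry."""
    shipment_id, kind = event["shipment_id"], event["type"]
    if kind == "event_1":
        # Scenario 1: the base event stores the shared information.
        base_info[shipment_id] = event["payload"]
        metrics.append((shipment_id, "metric_1"))
        return True
    if shipment_id not in base_info:
        # Scenario 2: event_1 has not arrived yet, so park the event.
        retry_queue.append(event)
        return False
    metrics.append((shipment_id, kind.replace("event_", "metric_")))
    return True


def drain_retries():
    """Retry each parked event once; events still missing event_1 are parked again."""
    for _ in range(len(retry_queue)):
        handle(retry_queue.popleft())
```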

Is using a data store and a retry mechanism the right approach to resolve the dependency on the base event (event_1)?

Are there better approaches/patterns to solve the event dependency problem?

Additional context: Although I believe this information is irrelevant, I am giving it anyway in case it helps. Source of events: SNS topics. Event processing: SNS->SQS->Lambda. Data store: DynamoDB. Metrics are stored in Redshift.

Solution

Disconnect Event Processing from Metric Processing.

The events inform a Model. The Model is responsible for performing the appropriate calculations either on demand, or at the relevant point in time when the data is available.

Eg:

event_3 is witnessed by the system.
The system informs the appropriate model of the shipment.
- The model determines that it has transitioned into a state where more work is required. It performs or schedules this work somehow. This additional work may change the model state again (perhaps several times).
- The model determines that it has transitioned into a state where data has to be stored for later use. It updates the datastore with the updated model.
- If processing of the event fails, a retry system might be appropriate.
The system moves on to the next event.
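
As a minimal sketch of the flow above, assuming one model per shipment: the names `ShipmentModel`, `compute_metric`, `on_event`, and the in-memory `models` dictionary are all hypothetical, and `compute_metric` is a placeholder for the real per-metric calculation. Events that arrive before event_1 are kept inside the model rather than being retried, and their metrics are computed once the base information is available.

```python
from dataclasses import dataclass, field
from typing import Optional


def compute_metric(kind, data, base_info):
    """Placeholder calculation (hypothetical); the real metric logic goes here."""
    return {"source_event": kind, "data": data, "base": base_info}


@dataclass
class ShipmentModel:
    shipment_id: str
    base_info: Optional[dict] = None             # set when event_1 arrives
    pending: dict = field(default_factory=dict)  # events waiting for base_info
    metrics: dict = field(default_factory=dict)  # computed metrics by name

    def apply(self, event_type, payload):
        """Record an event, then compute every metric whose inputs are available."""
        if event_type == "event_1":
            self.base_info = payload
        self.pending[event_type] = payload
        if self.base_info is None:
            return  # incomplete state: keep the event, nothing to compute yet
        for kind, data in list(self.pending.items()):
            name = kind.replace("event_", "metric_")
            self.metrics[name] = compute_metric(kind, data, self.base_info)
            del self.pending[kind]

    @property
    def complete(self):
        return self.base_info is not None and not self.pending


models = {}  # stand-in for the datastore: shipment_id -> ShipmentModel


def on_event(event):
    """Inform the shipment's model of the event and return the updated model."""
    sid = event["shipment_id"]
    model = models.setdefault(sid, ShipmentModel(sid))
    model.apply(event["type"], event["payload"])
    return model
```

The sketch ignores concurrent updates to the same shipment; with a real datastore the load-modify-store step would need some form of protection against lost writes.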

You may want to have a batch reconciliation process for events that get missed, dropped, or fail to be processed. Perhaps even an alerting system for models that have stayed in an incomplete state for too long.
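
A reconciliation or alerting pass could be as simple as scanning for models that have been incomplete for too long. The sketch below is illustrative only: the record fields `complete` and `first_seen`, the function name `find_stale`, and the six-hour threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def find_stale(models, now, max_age=timedelta(hours=6)):
    """Return the ids of incomplete models whose first event is older than max_age."""
    return sorted(
        shipment_id
        for shipment_id, record in models.items()
        if not record["complete"] and now - record["first_seen"] > max_age
    )


# Example scan: only the model still incomplete after more than six hours is flagged.
now = datetime(2021, 2, 28, 12, 0, tzinfo=timezone.utc)
example = {
    "s1": {"complete": False, "first_seen": now - timedelta(hours=7)},
    "s2": {"complete": False, "first_seen": now - timedelta(hours=1)},
    "s3": {"complete": True, "first_seen": now - timedelta(days=2)},
}
stale_ids = find_stale(example, now)
```

The returned ids could feed a batch job that replays missing events or an alert for manual follow-up.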

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange