Question

Context

We have a distributed system. One of its subsystems emits events that another subsystem reads to generate reports.

Logical order is guaranteed: even though the emitter system has N nodes, an underlying finite state machine makes it impossible for two events to be emitted concurrently for the same aggregate. Each event is marked with a timestamp, but the N nodes' clocks are not always in sync.

We care so much about the timestamp because the downstream reporting system almost always needs one: the "reporting people" rely on this kind of data to check that things are going the right way.

The problem

The fact that two nodes could have a small clock discrepancy makes us think. Consider the following example.

The logical order of the events is this:

Event 1 => Event 2 => Event 3

But in the database we could end up with this situation:

-------------------------------------------
|  Name   |  TimeStamp  |  Logical Order  |
-------------------------------------------
| Event 1 |      2      |        1        |
| Event 2 |      1      |        2        |
| Event 3 |      3      |        3        |
-------------------------------------------

As you can see, Event 2 logically happened after Event 1, but their timestamps say otherwise.

OK, this is not going to happen every two seconds, but it can happen because the timestamps come from different nodes. And from a reporting point of view this is an anomaly.

Possible solutions

  1. Make the reporting people aware of the possible problem. We cannot rely on one global source of time (NTP is not a viable solution for us, for good reasons), so a discrepancy of a very small amount of time is not a problem: it just means "this event happened around that time".
  2. Enforce timestamp consistency by checking that the next event in the logical flow never carries a timestamp lower than the previous event's, and clamping the two to be equal when it does (see the sketch after this list). This is not the truth, but it keeps the flow consistent even from a non-developer point of view.
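
A minimal sketch of option 2 in Python, assuming each event carries both the node timestamp and the logical order (the Event fields and function name here are illustrative):

    from dataclasses import dataclass

    @dataclass
    class Event:
        name: str
        timestamp: int      # wall-clock time taken on the emitting node
        logical_order: int  # position in the aggregate's state machine

    def clamp_timestamps(events):
        """Return events sorted by logical order, with timestamps made
        monotonically non-decreasing by clamping to the previous value."""
        ordered = sorted(events, key=lambda e: e.logical_order)
        last = None
        for e in ordered:
            if last is not None and e.timestamp < last:
                e.timestamp = last  # "happened no earlier than the previous event"
            last = e.timestamp
        return ordered

    events = [
        Event("Event 1", timestamp=2, logical_order=1),
        Event("Event 2", timestamp=1, logical_order=2),
        Event("Event 3", timestamp=3, logical_order=3),
    ]
    for e in clamp_timestamps(events):
        print(e)  # Event 2's timestamp is clamped from 1 to 2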

Do you have any experience with this topic?


Solution

If you can ensure the causality relationship and have a partial order, I don't see much of a problem in presenting a "useful business representation" with modified timestamps. The underlying distributed architecture is out of scope for the business domain.

They probably understand the system as a whole, and forcing a shift in their mental model may cause some friction.

On the other hand, I would not normalize the timestamps in the log itself; you can use the raw values to track clock drift between subsystems.
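
In fact, every inversion between the logical order and the timestamps gives you a lower bound on the clock offset between the two emitting nodes. A rough sketch of that idea, assuming each raw log entry also records which node emitted it (the field names are hypothetical):

    def observed_skews(entries):
        """Yield (node_a, node_b, skew) for consecutive events whose
        timestamps contradict their logical order; the size of the
        inversion is a lower bound on the clock offset between nodes."""
        ordered = sorted(entries, key=lambda e: e["logical_order"])
        for prev, curr in zip(ordered, ordered[1:]):
            delta = curr["timestamp"] - prev["timestamp"]
            if delta < 0 and prev["node"] != curr["node"]:
                yield prev["node"], curr["node"], -delta

    log = [
        {"name": "Event 1", "timestamp": 2, "logical_order": 1, "node": "A"},
        {"name": "Event 2", "timestamp": 1, "logical_order": 2, "node": "B"},
        {"name": "Event 3", "timestamp": 3, "logical_order": 3, "node": "A"},
    ]
    for a, b, skew in observed_skews(log):
        print(f"clock of node {b} is at least {skew} behind node {a}")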

OTHER TIPS

Based on your question, I assume the timestamp is generated before the event is read by the finite state machine. I'd suggest sorting your events by timestamp instead of by logical order. When working on distributed systems, it's recommended to have one, and only one, way to sort events.

With regard to distributed, sequential id generation, I recommend taking a look at this answer and at Snowflake, which is mentioned in the previous link. The latter provides a distributed service that you can use as a centralized marker generator. The ids generated by Snowflake are a composition of a timestamp, a worker number and a sequence number.
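
As a rough illustration of that composition, here is a sketch of a Snowflake-style generator following the published bit layout (41 bits of milliseconds since a custom epoch, 10 bits of worker id, 12 bits of per-millisecond sequence). It is not Twitter's actual implementation and ignores details such as clock rollback:

    import time

    EPOCH_MS = 1288834974657  # Twitter's custom epoch (2010-11-04)

    class SnowflakeGenerator:
        def __init__(self, worker_id):
            assert 0 <= worker_id < 1024  # worker id must fit in 10 bits
            self.worker_id = worker_id
            self.last_ms = -1
            self.sequence = 0

        def next_id(self):
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:  # sequence exhausted: wait for next ms
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return ((now_ms - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

    gen = SnowflakeGenerator(worker_id=1)
    print(gen.next_id())  # ids from a single worker are strictly increasing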

TL;DR

If the timestamp is reliable enough to guarantee event order, I'd suggest using it instead of the logical order, which I'm assuming is generated after the timestamp.

Hope this helps.
