Question

Folks, I am currently learning about distributed data systems via the book "Designing Data-Intensive Applications".

I think I have a pretty strong understanding of how version numbers in a single-replica system allow the server to detect concurrent writes*. The author starts with this example because once you understand the single-replica system, expanding that understanding to a multi-leader or leaderless replicated system is supposed to be obvious, but it is not obvious to me at all.

How do version numbers work in a system where multiple replicas can handle write requests? In other words, what are version vectors?

* In a single-replica system, each write is accompanied by a version number. This version number is the version of the data that the write is based on. If a write is based on Version 1 of the data for that key, and Version 2 already exists, we know that the incoming write is concurrent with Version 2. The incoming write can only overwrite data that was in Version 1, since it does not know about the data in Version 2. For example, Version 1 is [eggs], and Version 2 is [eggs] and [milk]. The incoming write wants to update this key to [eggs, bacon]. Version 3 of this key will become [eggs, bacon] and [milk]. The incoming write cannot overwrite [milk] since it didn't even know that [milk] was a value in the key.
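
For concreteness, the single-replica rule described in this footnote could be sketched roughly as below. This is only a minimal illustration: the class name `VersionedStore`, its methods, and the `based_on` parameter are made up for the example and do not come from the book.

```python
class VersionedStore:
    """Hypothetical single-replica store that keeps concurrent values per key."""

    def __init__(self):
        self.version = {}   # key -> latest version number handed out
        self.siblings = {}  # key -> list of (version, value) pairs still alive

    def write(self, key, value, based_on):
        """Store `value`, which the client derived from version `based_on`."""
        new_version = self.version.get(key, 0) + 1
        # Values at or below `based_on` were seen by the client: overwrite them.
        # Values above it are concurrent with this write: keep them.
        kept = [(v, val) for v, val in self.siblings.get(key, []) if v > based_on]
        self.siblings[key] = kept + [(new_version, value)]
        self.version[key] = new_version
        return new_version, [val for _, val in self.siblings[key]]


store = VersionedStore()
v1, values1 = store.write("cart", ["eggs"], based_on=0)           # Version 1
v2, values2 = store.write("cart", ["milk"], based_on=0)           # concurrent with Version 1
v3, values3 = store.write("cart", ["eggs", "bacon"], based_on=1)  # based on Version 1 only
```

After the third write the store holds both [milk] and [eggs, bacon]: the write based on Version 1 replaced [eggs] but could not touch [milk], which it had never seen.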

Solution

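Version vectors are a way for each node in a cluster to communicate its local version number to all other nodes in the cluster. They are closely related to vector clocks, and the two terms are often used interchangeably, although they are not strictly the same thing.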

In essence, when a node A passes a message to another node B, A includes in that message what it knows about the version numbers of all nodes in the cluster. Because it contains a value for every node, it is an array, or vector, of version numbers. Node B uses this information to update what it knows about the version numbers across the cluster. It can then work out which events across the cluster happened before which, and which are concurrent. This gives a partial (causal) ordering of events rather than a single global one.

In turn node B includes this updated information in messages it sends to other nodes, including to node A.
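
The exchange described above can be sketched roughly as follows. This is an illustrative sketch, not any particular database's implementation: the function names `increment`, `merge`, and `compare`, and the choice of a dict keyed by node name, are assumptions made for the example.

```python
def increment(vector, node):
    """Return a copy of `vector` with `node`'s own counter advanced by one."""
    updated = dict(vector)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: what a node knows after receiving another's vector."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify vector `a` relative to `b`."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # everything a knows, b also knows
    if b_le_a:
        return "after"
    return "concurrent"      # each side has seen something the other has not


a1 = increment({}, "A")             # node A accepts a write
b1 = increment({}, "B")             # node B accepts a write independently
first = compare(a1, b1)             # neither knew about the other

b2 = increment(merge(b1, a1), "B")  # B receives A's vector, then writes again
second = compare(a1, b2)            # A's write is now in B's causal past
```

Here `first` is "concurrent" and `second` is "before". A missing entry is treated as 0, so a node that has not yet heard from a peer simply has no counter for it.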

Search for "vector clock". There are a great many explanations, both academic and practical.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange