Question

Dynamo-like databases (e.g. Cassandra) can enforce consistency by means of quorum, i.e. a number of synchronously written replicas (W) and a number of replicas to read (R) should be chosen in such a way that W+R>N where N is a replication factor. On the other hand, PAXOS-based systems like Zookeeper are also used as a consistent fault-tolerant storage.

What is the difference between these two approaches? Does PAXOS provide guarantees that are not provided by W+R>N schema?

Was it helpful?

Solution

Paxos is non-trivial to implement, and expensive enough that many systems using it use hints as well, or use it only for leader election, or something. However, it does provide guaranteed consistency in the presence of failures - subject of course to the limits of its particular failure model.

The first quorum based systems I saw assumed some sort of leader or transaction infrastructure that would ensure enough consistency that you could trust that the quorum mechanism worked. This infrastructure might well be Paxos-based.

Looking at descriptions such as https://cloudant.com/blog/dynamo-and-couchdb-clusters/, it would appear that Dynamo is not based on an infrastructure that guarantees consistency for its quorum system - so is it being very clever or cutting corners? According to http://muratbuffalo.blogspot.co.uk/2010/11/dynamo-amazons-highly-available-key.html, "The Dynamo system emphasizes availability to the extent of sacrificing consistency. The abstract reads "Dynamo sacrifices consistency under certain failure scenarios". Actually, later it becomes clear that Dynamo sacrifices consistency even in the absence of failures: Dynamo may become inconsistent in the presence of multiple concurrent write requests since the replicas may diverge due to multiple coordinators." (end quote)

So, it would appear that in the case of quorums as implemented in Dynamo, Paxos provides stronger reliability guarantees.

OTHER TIPS

Yes, Paxos provides guarantees that are not provided by the Dynamo-like systems and their read-write quorums. The difference is how failures are handled and what happens during a write. After a successful write, both kind of systems behave similarly. The data will be saved and available for reading afterwards (until overwritten or deleted) and so on.

The difference appears during a write and after failures. Until you get a successful answer from W nodes when writing something to the eventually consistent systems, then the data may have been written to some nodes and not to others and there is no guarantee that the whole system agrees on the current value. If you try to read the data back at this point, some clients may get the new data back and some the old data back. In other words, the system is not immediately consistent. This is because writes aren't atomic across nodes in these systems. There are usually mechanisms to "heal" an inconsistency like this and "eventually" the system will become consistent again (i.e. reads will once again always return the same value, until something new is written). This is the reason why they are often called "eventually consistent". Inconsistencies can (and will) appear, but they will always be dealt with and reconciled eventually.

With Paxos, writes can be made atomic across nodes and inconsistencies between nodes are therefore possible to avoid. The Paxos algorithm makes it possible to guarantee that non-faulty nodes never disagree on the outcome of a write, at any point in time. Either the write succeeded everywhere or nowhere. There will never be any inconsistent reads at any point (if it's correctly implemented and if all the assumptions hold, of course). This comes at a cost, however. Mainly, the system may need to delay some requests and be unavailable when for example too many nodes (or the communication between them) aren't working. This is necessary to assure that no inconsistent replies are given.

To summarize: the main difference is that the Dynamo-like systems can return inconsistent results during writes or after failures for some time (but will eventually recover from it), whereas Paxos based systems can guarantee that there are never any such inconsistencies by sometimes being unavailable and delaying requests instead.

Paxos and the W+R>N quorum try to solve slightly different problems. Paxos is usually described as a way to replicate a state machine, but in fact it is more of a distributed log: each item written to the log gets an index, and the different servers eventually hold the same log items + their index. (Replicated state machine can be achieved by writing to the log the inputs to the state machine and each server replays the state machine on the agreed inputs according to their index). You can read more about Paxos in a blog post I wrote here.

The W+R>N quorum solves the problem of sharing a single value among multiple servers. In the academia it is called "shared register". A shared register has two operations: read and write, where we expect the read to return the value of the previous write.

So, Paxos and the W+R>N quorum live in different domains, and have different properties (e.g., Paxos saves an ordered list of items). However, Paxos can be used to implement a shared register, and a W+R>N quorum can be used to implement a distributed log (although, very inefficiently).

Saying all the above, sometimes the W+R>N quorums aren't implemented in their "fully robust" way, as it will require more than one communication round. Thus, in systems that want low latency, it is possible that their implementation of W+R>N quorums provide weaker properties (e.g., conflicting values can co exist).

To sum up, theoretically, Paxos and the W+R>N can achieve the same goals. Practically, it would be very inefficient, and each one is better for something slightly different. Even more practically, W+R>N isn't always implemented fully, thus scarifying some consistency properties for speed.

Update: Paxos supports a very general failure model: messages can be dropped, nodes can crash and restart. The W+R>N quorum scheme has dfferent implementations, many of which assume less general failures. So, the difference between the two also depends on the assumption on the possible failures that are supported.

There is no difference. The definition of a quorum says that any two quorums' intersection is not empty. Simple majority quorum is an example NOT a definition. Take a look at Dr. Lamport's later paper "Vertical Paxos", where he gave some other possible configuration of quorums.

Multi-decree paxos protocol (AKA Multi-Paxos), in steady state it's just two phase commit. Ballot number changes are only needed when the leader fails.

Zookeeper's replication protocol (ZAB) , and RAFT are all based on Paxos. The differences are in fault-detection and transition after a leader fails.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top