Understanding the "P" in "CAP" with DDBMSes

Question 1

The CAP theorem says that a distributed system cannot guarantee correctness (the "C", usually called consistency), availability, and partition tolerance all at once while a failure is in progress. Correctness means that a value read from any node does not conflict with the value held at any other node. Availability means that every healthy node can still be used by clients. Partition tolerance means that the system keeps functioning when it is split into subsets that cannot communicate with each other.

Say you have 3 machines. One of them is unable to contact the others, or in other words, the cluster is split into 2 partitions. If the system can handle this scenario, then it is partition tolerant. However, you must either give up total correctness or total availability:

Drop correctness: All nodes remain up, but the split-off node and the remaining cluster nodes may end up holding conflicting data, a situation sometimes known as split brain.

Drop availability: One of the partitions goes offline. This protects data integrity, since any successful read will not have a conflicting value anywhere else.
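The two choices above can be sketched with a toy in-memory cluster. This is only an illustration, not how any real database works: the `Cluster` class, the node names, and the `drop` parameter are all made up for the example, and the rule that the minority partition is the one that goes offline is an assumption (it is one common way to pick which side stops serving).

```python
class Cluster:
    """Toy cluster: each node holds its own copy of a key-value store."""

    def __init__(self, names, drop):
        self.stores = {name: {} for name in names}
        self.drop = drop  # hypothetical setting: "correctness" or "availability"
        self.partitions = [set(names)]  # a single group means fully connected

    def split(self, *groups):
        """Simulate a network partition into the given groups of nodes."""
        self.partitions = [set(g) for g in groups]

    def _group(self, name):
        return next(g for g in self.partitions if name in g)

    def _check_online(self, name):
        group = self._group(name)
        # Assumption: a partition without a strict majority goes offline.
        minority = 2 * len(group) <= len(self.stores)
        if self.drop == "availability" and minority:
            raise RuntimeError(f"{name} is offline: minority partition")
        return group

    def write(self, name, key, value):
        # A write only reaches the nodes the contacted node can talk to.
        for peer in self._check_online(name):
            self.stores[peer][key] = value

    def read(self, name, key):
        self._check_online(name)
        return self.stores[name].get(key)


# Drop correctness: both sides keep accepting writes and diverge.
ap = Cluster(["n1", "n2", "n3"], drop="correctness")
ap.split({"n1", "n2"}, {"n3"})
ap.write("n1", "x", "left")
ap.write("n3", "x", "right")
print(ap.read("n2", "x"), ap.read("n3", "x"))  # left right
```

With `drop="availability"`, the same write to `n3` raises an error instead, so no successful read can ever return a value that conflicts with another node.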

From a database system perspective, this means you must choose a strategy for dealing with failure. If a database cannot handle partition failures at all, its behavior is undefined whenever any node goes down. A database that sacrifices correctness during failures forces the application to deal with consistency issues once the failure is resolved, but more nodes remain available in the meantime. A database that gives up availability lets the application logic assume that the data is always consistent, but some otherwise healthy nodes will be inaccessible during the failure.

Question 2

My understanding of CAP is that you cannot reliably have all three desirable attributes all the time, and must choose your priorities. I agree it isn't the easiest thing to get your head around, given the terminology used, but this article from Eric Brewer himself does a good job of explaining it: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

To answer your question directly: if you choose to partition data, then at some level you will need to trade off consistency or availability. If you split data between A and B, and they lose connection to each other, then either you block updates (giving up availability) or let them update independently (giving up consistency).
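As a rough sketch of that A/B trade-off, the snippet below models two replicas that normally mirror every update. Everything here is hypothetical and invented for illustration (`Pair`, `Replica`, the `block_when_disconnected` flag); it assumes the simplest possible conflict check, namely comparing values key by key once the link comes back.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Pair:
    """Two replicas, A and B, that mirror updates while connected."""

    def __init__(self, block_when_disconnected):
        self.a = Replica("A")
        self.b = Replica("B")
        self.connected = True
        self.block_when_disconnected = block_when_disconnected

    def update(self, replica, key, value):
        """Return True if the update was accepted, False if it was blocked."""
        if self.connected:
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.block_when_disconnected:
            return False  # update refused: availability is given up
        replica.data[key] = value  # local-only write: consistency is given up
        return True

    def reconnect(self):
        """Restore the link and list the keys on which A and B disagree."""
        self.connected = True
        keys = self.a.data.keys() | self.b.data.keys()
        return sorted(k for k in keys if self.a.data.get(k) != self.b.data.get(k))


pair = Pair(block_when_disconnected=False)
pair.connected = False
pair.update(pair.a, "balance", 90)
pair.update(pair.b, "balance", 120)
print(pair.reconnect())  # ['balance']
```

With `block_when_disconnected=True`, both updates return `False` and `reconnect()` reports no conflicts, at the cost of rejecting writes for as long as the link is down.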