Question

There is a great talk here about simulating partition issues in Cassandra with Kingsbury's Jepsen library.

My question is: with Cassandra, are you mainly concerned with the Partition tolerance part of the CAP theorem, or is Consistency a factor you need to manage as well?


Solution

Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real-world systems rarely fall neatly into these categories, so it's more helpful to view CAP as a continuum. Most systems will make some effort to be consistent, available, and partition tolerant, and many (including Cassandra) can be tuned depending on what's most important. Turning knobs like replication factor and consistency level can have a dramatic impact on C, A, and P.

Even defining what the terms mean can be challenging, as various use cases have different requirements for each. So rather than classify a system as CP, AP, or whatever, it's more helpful to think in terms of the options it provides for tuning these properties as appropriate for the use case.
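As a rough illustration of how those two knobs interact, here is a minimal sketch of the commonly cited overlap rule: if the number of replicas that must acknowledge a write plus the number that must answer a read exceeds the replication factor, every read touches at least one replica that saw the latest acknowledged write. The function name and parameters are hypothetical, and this is a simplified model rather than anything taken from Cassandra itself.

```python
def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Return True when any read replica set must intersect any write replica set.

    Simplified model (an assumption, not Cassandra code): it only counts replicas
    and ignores timeouts, hinted handoff, and clock skew between nodes.
    """
    for acks in (write_acks, read_acks):
        if not 1 <= acks <= replication_factor:
            raise ValueError("acks must be between 1 and the replication factor")
    return write_acks + read_acks > replication_factor


rf = 3
print(reads_overlap_writes(rf, write_acks=1, read_acks=1))  # False: fast, but reads may be stale
print(reads_overlap_writes(rf, write_acks=2, read_acks=2))  # True: majority writes and reads
print(reads_overlap_writes(rf, write_acks=3, read_acks=1))  # True: every replica must take the write
```

The last two settings buy the same overlap guarantee in different ways: the first tolerates one replica being down for both reads and writes, while the second keeps reads cheap but makes writes fail as soon as any replica is unreachable.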

Here's an interesting discussion on how things have changed in the years since the CAP theorem was first introduced.

OTHER TIPS

CAP stands for Consistency, Availability and Partition Tolerance. In general, it's impossible for a distributed system to guarantee all three at the same time.

Apache Cassandra is classified as an AP system, meaning it guarantees Availability and Partition Tolerance but not Consistency. However, this can be further tuned via the replication factor (how many copies of the data are kept) and the consistency level (for reads and writes).

For more info: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html

Interestingly, it depends on your Cassandra configuration. At most, Cassandra can be an AP system. But if you configure it to read or write at QUORUM, it is no longer CAP-available (available as per the definition in the CAP theorem) and is only a P system.
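To see why QUORUM breaks CAP-availability, consider a partition that splits the replicas of one piece of data into groups. The sketch below uses hypothetical helper names and assumes a quorum is a strict majority of the replication factor; it checks which side of the partition can still gather a quorum.

```python
from typing import List


def quorum_size(replication_factor: int) -> int:
    """Strict majority of the replicas."""
    return replication_factor // 2 + 1


def sides_that_can_serve_quorum(replicas_per_side: List[int]) -> List[bool]:
    """For each side of a partition, report whether it can reach a quorum.

    replicas_per_side lists how many replicas of the data each side can still reach;
    the sum is taken as the replication factor.
    """
    needed = quorum_size(sum(replicas_per_side))
    return [reachable >= needed for reachable in replicas_per_side]


print(sides_that_can_serve_quorum([2, 1]))  # [True, False]: RF=3, quorum is 2
print(sides_that_can_serve_quorum([3, 2]))  # [True, False]: RF=5, quorum is 3
print(sides_that_can_serve_quorum([2, 2]))  # [False, False]: RF=4, quorum is 3
```

In every case at least one side contains non-failing nodes that must return an error, which is exactly what the CAP definition of availability forbids.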

To explain in more detail, the terms in the CAP theorem mean:

  1. C (linearizability, or strong consistency) roughly means:

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never older state).

  2. A:

“every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability.

  3. P:

Partition Tolerance (terribly misnamed) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don’t really have any choice in this matter.

Source: Martin Kleppmann's awesome work

The CAP theorem states that a database can’t simultaneously guarantee consistency, availability, and partition tolerance.

Since network partitions are part of life, distributed databases tend to be either CP or AP.


Cassandra was designed for AP, but you can fine-tune consistency at the cost of availability.

Availability: this is ensured with replicas. Cassandra typically writes multiple copies (generally 3) to different cluster nodes. If one node is unavailable, data won't be lost.

Writing data to multiple nodes takes time because the nodes may be scattered across different locations. The replicas converge over time, so the data becomes eventually consistent.

So when high availability is preferred, consistency is compromised.

Tunable consistency:

For each read or write operation, you can specify a consistency level. The consistency level is the number of replicas that need to respond for a read or write operation to be considered complete.

For non-critical features, you can use a lower consistency level, say ONE. If consistency is important, you can raise the level to TWO, THREE, or QUORUM (a majority of replicas).

Assume that you set a high consistency level (QUORUM) for your critical features and a majority of the replica nodes are down. In this case, the write operation will fail.

Here Cassandra sacrifices availability for consistency.
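The trade-off can be sketched with a small model that maps a few consistency levels to the number of replica acknowledgements they need and compares that with the replicas still alive. The table and function names are hypothetical, and the model deliberately leaves out other levels (such as ANY or LOCAL_QUORUM) and mechanisms such as hinted handoff.

```python
# Acknowledgements required per consistency level, as a function of the replication factor.
# Simplified, assumed model: only a subset of levels is covered.
REQUIRED_ACKS = {
    "ONE": lambda rf: 1,
    "TWO": lambda rf: 2,
    "THREE": lambda rf: 3,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def write_succeeds(level: str, replication_factor: int, live_replicas: int) -> bool:
    """Return True if enough replicas are alive to acknowledge a write at this level."""
    required = REQUIRED_ACKS[level](replication_factor)
    return live_replicas >= required


rf = 3
live = 1  # two of the three replicas are down
for level in ("ONE", "QUORUM", "ALL"):
    print(level, write_succeeds(level, rf, live))
# ONE True
# QUORUM False
# ALL False
```

With only one replica left, a write at ONE still goes through (availability is kept, consistency is deferred), while the same write at QUORUM is rejected (consistency is kept, availability is lost).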

Have a look at this article for more details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow