Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

In this lecture, we'll define linearizability: a formal consistency model that's often used synonymously with strong consistency. We'll then present and prove the infamous CAP theorem which tells us that partition-tolerant distributed systems have to choose between strong consistency and availability. In other words, we can't have our strongly consistent cake and eat it with high availability too.


In the last lecture, we established (rather informally) that a distributed storage system is strongly consistent if it behaves indistinguishably from a storage system running on a single computer. In this section, we'll fortify our understanding of strong consistency by taking a look at linearizability: a formalism of strong consistency initially proposed by Maurice Herlihy and Jeannette Wing in 1990.

In the last lecture, the only storage system we looked at was a key-value store. In this lecture, we'll consider an even simpler storage system: a register. A register stores a single value. Clients can issue

An example interaction between a client (a) and a register (s) running on a single computer is given below. The client first writes the value 7 into the register and then reads it out.

From now on, let's assume that the register running on a single computer can instantaneously process a request and send a response the exact moment the request arrives. With that assumption, let's take a look at how two clients (a and b) might interact with such a register when the network starts to delay and quicken the delivery of messages.

The execution begins as client a sends a very slow w(9) request to the register. Before the w(9) request has a chance to make it to the register, client b sends a very speedy w(4) request. The w(4) request arrives at the register and the response arrives back at client b before client a's request even has a chance to reach the register. Finally, client a's write request arrives at the register, writes a value of 9, and returns back to client a.

To make things clearer, we've drawn the contents of the register at the moment each request arrives at the register and takes effect. When the w(4) request takes effect, the register has value 4; when the w(9) request takes effect, the register has value 9.

Note that from our god's eye view, we know exactly when each request arrives at the register. Unfortunately, clients don't enjoy such a luxury. A client only knows when it sends its request and when it receives the corresponding response. It doesn't know exactly when the request arrives at the register.

Let's take a closer look at an execution from the perspective of the clients and note something rather profound. We may not know when requests arrive at the register, but we can guess! That is, for each request, we can guess a time—between when the request is sent and when a response is received—that the request could have arrived at the register.

Play around with the following interactive visualization of an execution in which client a sends a w(3) request and then client b sends a r() request. You can click (and drag) on the thick colored bars to place your guess as to when each request arrived at the server.

In the previous execution, all of your guesses could have been correct. However, this is not always the case. Consider the following execution in which client a and client b concurrently issue write requests after which client a issues a read request. Note that some of your guesses cannot possibly be correct.

More specifically, any guess in which client b's w(8) request follows client a's w(2) request cannot possibly be correct because after both the writes, client a reads a value of 2. This means that even though we don't know exactly when the requests arrive at the register, we do know that the w(2) request must arrive after the w(8) request.

For read requests, note that we've shaded the value of the register red whenever the register's value is inconsistent with the value returned by the read request. For example, client a's read request above returns 2, so the final state of the register is shaded red whenever its value is anything other than 2 (e.g. 8).

If any of the registers in our execution are shaded red, then our guess can not possibly be correct. This is bad. If all of the registers are green, then our guess could be correct. This is good. Try to find a potentially correct guess for the following executions.

At this point, we've seen an execution in which all guesses were potentially correct. We've also seen a few executions in which some guesses were potentially correct and some were definitely incorrect. Now try to find a potentially correct guess for the following execution.

Alas, all guesses are incorrect; there is no potentially correct guess! That is, there does not exist a way for us to guess the moment that each request arrives at the register such that the guess is consistent with the responses of all the read requests. Thus, this execution could not have taken place with a register running on a single computer. Instead, we have deduced that the register must be replicated on multiple computers. In other words, we were able to distinguish the behavior of the register from the behavior of a register running on a single computer!

Surprise, this is linearizability!

For a given execution, if there exists a potentially correct guess, then we say the execution is linearizable. If all guesses are definitely incorrect, then we say the execution is not linearizable. Similarly, a linearizable register is one that only allows linearizable executions.

If a (potentially replicated) register is linearizable, then every interaction with it is indistinguishable from an interaction with a register running on a single computer in which the requests arrive at the single-computer register according to one of our potentially correct guesses. If a replicated register is not linearizable, then there are some interactions with it that could not possibly occur with a single-computer register.

Thus far, we've only discussed linearizability in the context of a register, but linearizability can be extended quite naturally to deal with many other types of objects (e.g. queues, sets). This generalization of linearizability, as well as a full formalization, can be found in Herlihy and Wing's 1990 paper: Linearizability: A Correctness Condition for Concurrent Objects.

Consistency, Availability, and Partition-Tolerance

The CAP theorem states that a partition-tolerant replicated register cannot be both consistent and available. In order to understand (and prove) the CAP theorem, we first have to define the words consistent, available, and partition-tolerant.

Consistent. The CAP theorem uses the word consistent as a synonym for linearizable. Thanks to the last section, we already understand what it means for a replicated register to be linearizable.

Available. A replicated register is available if every request sent to a non-failed replica eventually produces a response. In other words, whenever a non-failed replica receives a request (e.g. w(9)) from a client, it eventually has to send a response (e.g. ok) back to the client. The replica is allowed to take as long as it wants before responding to the client, but it is not allowed to ignore the request indefinitely.

Partition-tolerant. The CAP theorem assumes that, at any time, replicas can be temporarily partitioned from one another and that any messages sent between the two partitions are lost. Effectively, this means that the network can drop arbitrary messages sent from any replica to any other replica at any time. If a replicated register behaves correctly despite the possibility of arbitrary message loss, we say it is partition-tolerant.

Proving The CAP Theorem

To reiterate, the CAP theorem states that a partition-tolerant replicated register cannot be both consistent and available. Now that we've established the definitions of consistent, available, and partition-tolerant, we're ready to prove the theorem! Note that even though the CAP theorem is hugely far-reaching and influential, we think you'll find the proof remarkably simple.

Assume for contradiction that there exists a consistent, available, and fault-tolerant replicated register. Also assume the replicas are partitioned in two. For simplicity (and without loss of generality), consider a replicated register with exactly two replicas (s1 and s2). Assume the register has an initial value of 0 and consider the following execution.

Our client (a) begins by issuing a w(9) request to s1. When s1 receives the request, it repeatedly attempts to send a message to s2, but alas the partition drops all the messages. By our assumption of availability, s1 eventually writes 9 to the register and returns a response to the client.

Then, the client sends a r() request to s2: the other replica. As before, s2 tries repeatedly to send a message to s1, but the partition prevents it. Again by the assumption of availability, s2 must eventually return to the client. Since s2 has not been able to communicate with s1 at all, the value of its register is still the initial value of 0.

The client wrote 9 to the register but then read back 0. This execution is not linearizable, and thus the register is not linearizable. This violates our assumption of consistency and proves the theorem!

Implications of the CAP Theorem

The CAP theorem proves that there is a fundamental trade-off between strong consistency and availability. A distributed system must choose between being strongly consistent and being highly available.

Shortly after Eric Brewer proposed the CAP theorem in his 2000 PODC keynote, there was a Cambrian explosion of systems that chose high availability over strong consistency: Amazon's Dynamo, Google's BigTable, and Facebook's Cassandra to name a few. More recently, there has a resurgence of systems that choose strong consistency over availability: Google's Spanner, Stanford's RAMCloud, and Cornell's Hyperdex for example.

Furthermore, corollaries of the CAP theorem show that there are also fundamental trade-offs between consistency and latency. Even when the network is not partitioned, strongly consistent storage systems run with more latency that their weakly consistent counterparts.

The trade-offs detailed by the CAP theorem and its corollaries are the reason that there are so many varying consistency models (including the six we saw last lecture). Each consistency model represents one point in a complex spectrum trading off consistency, availability, latency, complexity, and so on. With this smorgasbord of consistency models, systems are free to choose the one that suits their needs best.