The CAP theorem dictates that in the face of network partitions, replicated data stores must choose between high availability and strong consistency. In this 12-year retrospective, Eric Brewer looks back at the CAP theorem and provides some insights.
The CAP theorem is misleading for three reasons.
After a node experiences a delay when communicating with another node, it has to make a choice between (a) aborting the operation and sacrificing availability or (b) continuing with the operation anyway and sacrificing consistency. Essentially, a partition is a time bound on communication. Viewing partitions like this leads to three insights:
Systems should take three steps to handle partitions.
The operations that a node permits during a partition depend on the invariants it is willing to sacrifice. For example, nodes may temporarily violate unique id constraints during a partition, since such violations are easy to detect and resolve. Other invariants are too important to violate, so operations that could potentially violate them are stalled.
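As a rough illustration of this policy, the sketch below shows a node that, once it considers itself partitioned, provisionally accepts only operations whose invariants can be repaired later and stalls everything else. The `Node` class, the operation names, and the `SAFE_DURING_PARTITION` set are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical policy: operations whose invariants may be violated temporarily
# because the violation is easy to detect and repair after the partition.
SAFE_DURING_PARTITION = {"create_user"}  # may produce duplicate ids


@dataclass
class Node:
    partitioned: bool = False
    # Operations accepted during the partition, kept for later recovery.
    log: list = field(default_factory=list)

    def submit(self, op, payload):
        """Return how the node handled the operation."""
        if not self.partitioned:
            return "applied"
        if op not in SAFE_DURING_PARTITION:
            # The invariant is too important to risk, so the operation waits.
            return "stalled"
        self.log.append((op, payload))
        return "applied-provisionally"
```

Recording the provisionally applied operations in `log` is a design choice made here so that a later recovery step has something to inspect. A real system would also need a way to decide when `partitioned` becomes true, for example after a communication timeout.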
Once a system recovers from a partition, it has to make its state consistent again and compensate for any mistakes made during the partition.
Sometimes a system is unable to automatically make the state consistent and depends on manual intervention. Sometimes, the system can automatically restore the state because it carefully rejected some operations during the partition. Other systems can automatically restore consistency because they use clever data structures like CRDTs.
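To make the CRDT idea concrete, here is a minimal sketch of a grow-only counter (a G-Counter), one of the simplest CRDTs. The class and method names are illustrative assumptions. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas that diverged during a partition converge to the same value regardless of merge order.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # replica id -> count contributed by that replica

    def increment(self, n=1):
        # A replica only ever modifies its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise maximum is commutative, associative, and idempotent,
        # which is what makes automatic reconciliation safe.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept increments independently during a partition...
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)

# ...and exchange state once communication is restored.
a.merge(b)
b.merge(a)
```

After the two merges, both replicas report the same total, and merging the same state again changes nothing.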
Some systems, especially those which externalize actions (e.g. ATMs), must sometimes issue compensations (e.g. emailing users).
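A minimal sketch of compensation, assuming a hypothetical ATM that accepted withdrawals while partitioned from the bank: after the partition, the withdrawals are applied to the authoritative balance, and any that overdraw the account produce a compensating action instead of being undone, because the cash has already left the machine. The function name and message format are illustrative only.

```python
def reconcile(balance, offline_withdrawals):
    """Apply withdrawals made during a partition and collect compensations.

    `offline_withdrawals` is a list of (account_holder, amount) pairs.
    """
    compensations = []
    for account_holder, amount in offline_withdrawals:
        balance -= amount
        if balance < 0:
            # The externalized action cannot be rolled back, so compensate.
            compensations.append(
                f"notify {account_holder}: account overdrawn by {-balance}"
            )
    return balance, compensations


final_balance, actions = reconcile(100, [("alice", 60), ("alice", 70)])
```

Here the first withdrawal leaves a balance of 40 and the second overdraws the account by 30, so `actions` holds a single notification.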