Transaction Management in the R* Distributed Database Management System (1986)

This paper describes the vanilla two-phase commit protocol, extends it to a hierarchical two-phase commit protocol, and then introduces three protocol optimizations to take advantage of presumed aborts, partially/fully read-only transactions, and presumed commits. It also discusses how R* detects deadlocks and chooses a victim to kill. Two-phase commit is one of those things where the more you understand it, the more you understand that you don't understand it. Good luck!

Two-Phase Commit

Two-phase commit is an atomic commitment protocol. The goal of the protocol is to have a set of nodes either all commit or all abort. One node is designated as the coordinator and all other nodes are designated as the subordinates. The protocol proceeds in two rounds:

  1. In the first round, the coordinator sends prepare messages to all subordinates. Every subordinate returns either a vote yes to commit or a vote no to abort.
  2. In the second round, the coordinator sends a commit message to all subordinates if they all voted yes. Otherwise, if any subordinate voted no, the coordinator sends an abort message to all the subordinates that voted yes.

In Figure 1, Figure 2, and Figure 3, we illustrate a sequence of messages sent between a coordinator (left) and subordinate (right) assuming no failures. In Figure 1, all subordinates vote yes. In Figure 2, the subordinate votes yes but another subordinate votes no. In Figure 3, the subordinate votes no.

Now, let's discuss what the coordinator does when it crashes and restarts.

And the subordinate:

Figure 1. Subordinate votes yes; transaction commits.
Figure 2. Subordinate votes yes; transaction aborts.
Figure 3. Subordinate votes no; transaction aborts.

Hierarchical 2PC

In hierarchical two-phase commit, nodes are organized in a tree, and each node can only talk to its parent and its children. The root of the tree is the coordinator, the leaves of the tree are subordinates, and the internal nodes play the role of both. The root and the leaves behave as usual. The inner nodes forward prepare messages down the tree and aggregate votes up the tree. That is, if all of an inner node's children vote yes, the node forwards a yes upward. If any of the children vote no, the inner node forwards a no upward and also sends abort messages downward. When an inner node receives a commit or abort message, it force writes it, sends an acknowledgement upward, and forwards the abort or commit downward.

Presumed Abort

Presumed abort recognizes that if a coordinator doesn't know about a transaction, it presumes that it aborted. This allows us to avoid sending some messages and avoid force writing some log entries.

Coordinator action on restart:

Subordinate action on restart:

Figure 4. Subordinate votes yes; transaction commits.
Figure 5. Subordinate votes yes; transaction aborts.
Figure 6. Subordinate votes no; transaction aborts.

Read-Only Subordinates

If any subordinate only performed reads, then it responds to prepare messages with a read vote and then completely forgets about the transaction without logging anything. If a coordinator receives all read votes, it also forgets about the transaction and logs nothing.

Presumed Commit

Presumed abort acks commits but not aborts. Ideally, commits are more common than aborts, so we'd like to instead ack aborts but not commits. In order to do so, the coordinator will have to presume that a transaction is committed if it has forgotten about it.

Coordinator action on restart:

Subordinate action on restart:

Figure 7. Subordinate votes yes; transaction commits.
Figure 8. Subordinate votes yes; transaction aborts.
Figure 9. Subordinate votes no; transaction aborts.

TODO: deadlock detection section.