Imagine we have the following relation, divided into eight directories.
+----+
| D1 |
| D2 |
| D3 |
| D4 |
| D5 |
| D6 |
| D7 |
| D8 |
+----+
We bundle the directories into tablets:
T1: D1, D6
T2: D3, D5
T3: D2, D8
T4: D4, D7
Each tablet is then replicated, say three ways.
R1: [D1, D6] R4: [D3, D5] R7: [D2, D8] R10: [D4, D7]
R2: [D1, D6] R5: [D3, D5] R8: [D2, D8] R11: [D4, D7]
R3: [D1, D6] R6: [D3, D5] R9: [D2, D8] R12: [D4, D7]
Imagine we have five zones, with two servers per zone. We can split up the tablet replicas like this (stars signify the master):
Zone 1: M1: [R1], M2: [R10*]
Zone 2: M3: [R7], M4: [R4*]
Zone 3: M5: [R8], M6: [R5, R11]
Zone 4: M7: [R9*, R12], M8: [R2*]
Zone 5: M9: [R6], M10: [R3]
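To make the layout above easier to check by hand, here is a minimal sketch that encodes the same tablets, replicas, and placement as plain Python dictionaries. The names `summarize` and `check` are hypothetical helpers, and the "no two replicas of a tablet share a zone" test is a property this particular example happens to satisfy, not a rule stated in these notes.

```python
# Three replicas per tablet, numbered as in the example.
REPLICAS = {
    "T1": ["R1", "R2", "R3"],     # [D1, D6]
    "T2": ["R4", "R5", "R6"],     # [D3, D5]
    "T3": ["R7", "R8", "R9"],     # [D2, D8]
    "T4": ["R10", "R11", "R12"],  # [D4, D7]
}

# zone -> server -> replicas hosted there; a trailing "*" marks the master.
PLACEMENT = {
    "Zone 1": {"M1": ["R1"], "M2": ["R10*"]},
    "Zone 2": {"M3": ["R7"], "M4": ["R4*"]},
    "Zone 3": {"M5": ["R8"], "M6": ["R5", "R11"]},
    "Zone 4": {"M7": ["R9*", "R12"], "M8": ["R2*"]},
    "Zone 5": {"M9": ["R6"], "M10": ["R3"]},
}


def summarize(placement, replicas):
    """Return {tablet: {"zones": [...], "masters": [...]}} for a placement."""
    replica_to_tablet = {r: t for t, rs in replicas.items() for r in rs}
    summary = {t: {"zones": [], "masters": []} for t in replicas}
    for zone, servers in placement.items():
        for hosted in servers.values():
            for name in hosted:
                replica = name.rstrip("*")
                tablet = replica_to_tablet[replica]
                summary[tablet]["zones"].append(zone)
                if name.endswith("*"):
                    summary[tablet]["masters"].append(replica)
    return summary


def check(summary):
    """True if every tablet has one master and no two replicas share a zone."""
    return all(
        len(info["masters"]) == 1
        and len(set(info["zones"])) == len(info["zones"])
        for info in summary.values()
    )
```

Calling `check(summarize(PLACEMENT, REPLICAS))` confirms that the placement above gives each tablet exactly one starred replica and spreads its three replicas over three different zones.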
(key: string, timestamp: int64) -> string mapping stored as roughly an LSM-tree on Colossus. Note that a Spanner tablet is similar to, but not quite the same as, a Bigtable tablet.
The leader of a Paxos group also has a transaction manager for two-phase commit. The leader is called a participant leader; the other members are participant slaves. One Paxos group is designated as the coordinator. The leader of this group is called the coordinator leader; the rest are coordinator slaves.
Paxos Group 1 (participant)
+------------------------------------------------------------------+
| [participant leader]<->[participant slave]<->[participant slave] |
+------------------------------------------------------------------+
^
|
v
Paxos Group 2 (participant)
+------------------------------------------------------------------+
| [participant leader]<->[participant slave]<->[participant slave] |
+------------------------------------------------------------------+
^
|
v
Paxos Group 3 (coordinator)
+------------------------------------------------------------------+
| [coordinator leader]<->[coordinator slave]<->[coordinator slave] |
+------------------------------------------------------------------+
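The role naming in the diagram can be sketched as a small data model. This is only an illustration of the terminology, not of the two-phase commit protocol itself; `PaxosGroup`, `designate_coordinator`, and the convention that the first member is the leader are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PaxosGroup:
    name: str
    members: list                 # assumption: members[0] is the group's leader
    is_coordinator: bool = False

    def roles(self):
        """Map each member to its two-phase-commit role name."""
        kind = "coordinator" if self.is_coordinator else "participant"
        return {
            member: f"{kind} leader" if i == 0 else f"{kind} slave"
            for i, member in enumerate(self.members)
        }


def designate_coordinator(groups, coordinator_name):
    """Mark exactly one group as the coordinator; the rest stay participants."""
    if coordinator_name not in {g.name for g in groups}:
        raise ValueError(f"unknown group: {coordinator_name}")
    for g in groups:
        g.is_coordinator = (g.name == coordinator_name)
    return groups
```

For instance, with three groups `G1`, `G2`, and `G3`, calling `designate_coordinator(groups, "G3")` leaves the first two groups with a participant leader and participant slaves, and gives `G3` a coordinator leader and coordinator slaves, matching the diagram.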
TT.now(), which returns an interval of time that is guaranteed to contain the true time.
latest time produced by a TT.now() call made when the prepare request was received.
latest from TT.now() and read at that timestamp.
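To make the interval returned by TT.now() and the idea of reading at a timestamp concrete, here is a toy sketch. It is not Spanner's implementation: the `TrueTime` class simply pads a local clock by an assumed error bound `epsilon` (so the interval contains the true time only if the local clock really is within `epsilon` of it), and `VersionedStore` is a hypothetical in-memory stand-in for a (key, timestamp) -> string mapping.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy TrueTime: a local clock padded by an assumed error bound epsilon."""

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon    # assumed bound on clock error, in seconds
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)


class VersionedStore:
    """In-memory (key, timestamp) -> value mapping with reads at a timestamp."""

    def __init__(self):
        self._versions = {}       # key -> list of (timestamp, value), sorted

    def write(self, key, timestamp, value):
        versions = self._versions.setdefault(key, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0])

    def read_at(self, key, timestamp):
        """Return the newest value written at or before timestamp, else None."""
        result = None
        for ts, value in self._versions.get(key, []):
            if ts > timestamp:
                break
            result = value
        return result
```

A read that takes latest from TT.now() and reads at that timestamp would then look like `store.read_at(key, tt.now().latest)`: it sees every version written at or before that timestamp and nothing newer.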