Chubby is Google's distributed lock service, implemented using consensus, that allows clients to synchronize and agree on values. Chubby is like a file system in which clients read and write entire files, but it also supports advisory locks, event notification, and more. Chubby's main goals are reliability, availability, and easy-to-understand semantics. Its secondary goals are throughput and storage capacity.
Rationale. Chubby is a centralized lock service that internally uses Paxos. There are a number of advantages to implementing a lock service rather than providing a Paxos library that clients can use directly: locks are a familiar abstraction to developers, and programs that did not plan for consensus from the start can adopt a lock service late in development without being restructured around a consensus protocol.
Moreover, there are a number of other features that are useful to have in Chubby, including event notification, consistent file caching, and security.
Also note that Chubby is designed for coarse-grained, as opposed to fine-grained, locking. That is to say, clients hold locks for minutes to hours instead of seconds or less. This reduces the load on Chubby and makes clients less susceptible to Chubby crashes. Finer-grained locking can also be implemented on top of Chubby.
System Structure. Chubby consists of a client library linked into applications and a small cluster (typically 5 servers) called replicas. The replicas use consensus to elect a master for the duration of a master lease, which the master can renew. The master handles all reads and writes; writes are propagated to the replicas using consensus, while reads are served by the master alone. Clients find replicas via DNS, and non-master replicas forward clients to the current master. If a replica is down for too long (e.g., multiple hours), it is replaced: the DNS entries are updated to point at the new server, the new server receives data from backups and other replicas, and finally the master introduces it into the cluster.
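The client-side half of this arrangement can be sketched as follows. This is a minimal illustration, not Chubby's actual protocol: the replica table, field names, and locate_master helper are all hypothetical, standing in for DNS lookup plus the "replicas forward clients to the master" behavior described above.

```python
# Hypothetical replica table: in real Chubby the replica addresses come
# from DNS, and each replica knows who the current master is.
REPLICAS = {
    "replica-1": {"is_master": False, "master_hint": "replica-3"},
    "replica-2": {"is_master": False, "master_hint": "replica-3"},
    "replica-3": {"is_master": True,  "master_hint": "replica-3"},
}

def locate_master(replica_names):
    """Ask replicas one by one until a live one names the master."""
    for name in replica_names:
        replica = REPLICAS.get(name)
        if replica is None:
            continue  # this replica is down; try the next one
        return replica["master_hint"]  # non-masters forward us to the master
    raise RuntimeError("no replica reachable")

master = locate_master(["replica-1", "replica-2", "replica-3"])
```

The important property is that a client can contact any replica it finds and still end up talking to the single master that serves reads and writes.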
Files, directories, and handles. Files in Chubby are identified by paths of the form ls/cell/dir1/dir2/.../dirn/filename, where ls (lock service) identifies a Chubby file, cell is the name of a cell, and the rest of the path is a usual path. This naming structure integrates well into Google's existing file libraries and tools. Chubby makes a couple of simplifying assumptions about files.
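A small sketch of how a client library might pick apart such a path. The helper name and return shape are assumptions for illustration; only the path structure (ls, then cell, then an ordinary path) comes from the text above.

```python
def parse_chubby_path(path):
    """Split a path of the form /ls/cell/dir1/.../filename (hypothetical helper)."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "ls":
        raise ValueError("not a Chubby path: " + path)
    # parts[1] names the cell; everything after it is a usual file path.
    return {"cell": parts[1], "node": "/" + "/".join(parts[2:])}

result = parse_chubby_path("/ls/foo/wombat/pouch")
```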
Files can act as advisory locks, and file ACLs are themselves stored as files. Moreover, files can be permanent or ephemeral. An ephemeral file is deleted when no clients have it open; an ephemeral directory is deleted when it is empty.
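The ephemeral-file rule above (deleted when no client has it open) can be illustrated with a toy handle counter. The Node class and handle functions are hypothetical, simplified stand-ins; real Chubby tracks handles per session on the master.

```python
class Node:
    """A toy Chubby node: permanent or ephemeral, with an open-handle count."""
    def __init__(self, ephemeral):
        self.ephemeral = ephemeral
        self.open_handles = 0

def open_handle(nodes, name):
    nodes[name].open_handles += 1

def close_handle(nodes, name):
    node = nodes[name]
    node.open_handles -= 1
    # An ephemeral file is deleted when its last open handle goes away.
    if node.ephemeral and node.open_handles == 0:
        del nodes[name]

nodes = {"/ls/cell/perm": Node(ephemeral=False),
         "/ls/cell/temp": Node(ephemeral=True)}
open_handle(nodes, "/ls/cell/temp")
close_handle(nodes, "/ls/cell/temp")   # last handle closed: the node disappears
```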
Each node maintains four monotonically increasing numbers: an instance number, a content generation number, a lock generation number, and an ACL generation number. Files also include 64-bit checksums.
Locks and sequencers. Using distributed locks can be tricky, especially when messages are delayed and locks are rapidly released and acquired. Each file in Chubby can act as an advisory reader-writer lock. Lock owners may request a sequencer: an opaque byte string containing the lock's name, the mode in which it was acquired, the lock generation number, and so on. A client passes the sequencer to a server (a file server, say), which can check the validity of the sequencer with Chubby. There is also a hackier mechanism, lock-delay, to prevent anomalies: after a lock is released, it cannot be re-acquired until a lock-delay period has passed.
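The sequencer check can be sketched concretely. Real sequencers are opaque byte strings; the structured Sequencer class, the generation table, and sequencer_valid are assumptions made to show the idea, which is that a sequencer issued under an old lock generation is rejected once the lock has changed hands.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    mode: str        # e.g. "exclusive" or "shared"
    generation: int  # lock generation number at acquisition time

# The lock service's current view: lock name -> current lock generation.
current_generation = {"/ls/cell/leader-lock": 7}

def sequencer_valid(seq):
    """A server receiving a request checks the sequencer with Chubby:
    it is valid only while the lock is still in the same generation."""
    return current_generation.get(seq.lock_name) == seq.generation

stale = Sequencer("/ls/cell/leader-lock", "exclusive", 6)  # lock changed hands since
fresh = Sequencer("/ls/cell/leader-lock", "exclusive", 7)
```

A delayed request carrying a stale sequencer is thus rejected even though its sender once legitimately held the lock, which is exactly the anomaly the mechanism guards against.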
Events. Chubby clients can subscribe to events including: file contents being modified; a child node being added, removed, or modified; the Chubby master failing over; a handle (and its lock) becoming invalid; a lock being acquired; and a conflicting lock request arriving from another client.
API. The Chubby API includes Open, Close, Poison, GetContentsAndStat, GetStat, ReadDir, SetContents (with compare-and-swap semantics if need be), Delete, Acquire, TryAcquire, Release, GetSequencer, SetSequencer, and CheckSequencer calls.
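The compare-and-swap flavor of SetContents can be sketched against the content generation number mentioned earlier. The FakeChubbyNode class and method signatures below are hypothetical, in-memory stand-ins for the real RPCs, meant only to show how a generation-conditioned write behaves.

```python
class FakeChubbyNode:
    """Toy in-memory node supporting GetContentsAndStat / SetContents."""
    def __init__(self):
        self.contents = b""
        self.generation = 0  # content generation number, bumped on each write

    def get_contents_and_stat(self):
        return self.contents, self.generation

    def set_contents(self, data, expected_generation=None):
        """Write data; if expected_generation is given, fail unless it still
        matches the node's current generation (compare-and-swap)."""
        if expected_generation is not None and expected_generation != self.generation:
            return False
        self.contents = data
        self.generation += 1
        return True

node = FakeChubbyNode()
node.set_contents(b"v1")
_, gen = node.get_contents_and_stat()
ok = node.set_contents(b"v2", expected_generation=gen)     # succeeds
lost = node.set_contents(b"v3", expected_generation=gen)   # fails: generation moved on
```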
Caching. To reduce read load on Chubby, clients maintain a consistent write-through cache. The master tracks what each client has cached and issues leases and invalidations. Chubby uses a simple algorithm to keep caches consistent: when a client issues a write, the master sends invalidations to all clients that have the file cached. After all client caches have been invalidated, either by an acknowledgement or by a lease expiration, the write is applied.
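The invalidate-before-write rule can be sketched as follows. This is a deliberately simplified model (no leases, synchronous acknowledgements assumed); the CachingMaster class and its methods are hypothetical names, not Chubby's API.

```python
class CachingMaster:
    """Toy model of the master's cache-consistency bookkeeping."""
    def __init__(self):
        self.value = None
        self.cachers = set()  # clients that may hold this node in cache

    def read(self, client):
        self.cachers.add(client)  # master remembers who has the value cached
        return self.value

    def write(self, value):
        # Invalidate every cached copy; only after all copies are invalid
        # (acknowledged, or lease expired) is the write applied.
        invalidated = set(self.cachers)
        self.cachers.clear()
        self.value = value
        return invalidated

m = CachingMaster()
m.read("client-a")
m.read("client-b")
told = m.write("new-value")  # both cachers must be invalidated first
```

Because the write is held back until no stale copy can be read, clients never observe a cached value that contradicts the master.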
Sessions and keep-alives. Chubby sessions between clients and the master are maintained by periodic KeepAlive RPCs. Each session has a lease timeout before which the session is guaranteed not to expire, and the master advances the lease timeout in three situations: when the session is created, when a master fail-over occurs, and when it responds to a KeepAlive RPC.
Masters do not respond to KeepAlive RPCs immediately. Instead, they wait until just before the lease expires before responding, extending the lease in the reply. They also piggyback cache invalidations and event notifications on these RPCs. Clients, for their part, issue a new KeepAlive immediately after each response. Clients also conservatively estimate their local lease timeouts, making assumptions about clock skew and packet delivery times. When a client's local lease expires, it drops its cache and enters a 45 second grace period. If it reconnects to a master during the grace period, things continue as usual; otherwise, it delivers an error to the application.
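The client's conservative lease estimate and grace period can be sketched numerically. The function names and the skew model are assumptions for illustration; the 45 second grace period is from the text above.

```python
GRACE_PERIOD = 45.0  # seconds, per the description above

def local_lease_deadline(send_time, server_lease_seconds, max_clock_skew=0.0):
    """Estimate the local lease deadline conservatively: measure from when the
    KeepAlive was *sent* (so network delay only shortens the estimate), and
    shrink the interval by an assumed clock-skew fraction."""
    return send_time + server_lease_seconds * (1.0 - max_clock_skew)

def session_state(now, deadline):
    if now <= deadline:
        return "valid"
    if now <= deadline + GRACE_PERIOD:
        return "grace"    # cache dropped; the client hopes to reach a master
    return "expired"      # grace period over: report an error to the app

deadline = local_lease_deadline(send_time=0.0, server_lease_seconds=12.0)
```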
Fail-over. If the master fails, a client may reconnect to the new master during its grace period. The new master follows a recovery procedure: it picks a new epoch number, rebuilds its in-memory session and lock state from the database, initially allows only KeepAlives from clients, sends each session a fail-over event so clients can flush their caches, and only then resumes normal operation.
DB. Originally, Chubby used BerkeleyDB (BDB), but BDB's replication support was young, and Chubby didn't need all of BDB's functionality. Ultimately, Chubby replaced BDB with its own simple persistence layer based on write-ahead logging and snapshots.
Backup. Periodically, Chubby masters back up their state in another cell in GFS.
Mirroring. Chubby's event notifications make it easy to mirror directories.