Bigtable: A Distributed Storage System for Structured Data (2008)

Bigtable is a non-relational database developed at Google that is distributed across 1000s of machines storing petabytes of data. Bigtable is built using a lot of other Google technology (e.g. GFS, Chubby, Borg, SSTable) and can achieve both high throughput and low latency.

Data Model and API

In short, Bigtable is a "sparse, distributed, persistent multi-dimensional sorted map" of type: (key: string * column: string * timestamp: int64) -> value: string. For example, a web Bigtable could be keyed by URL with columns like language or content. Each entry in this Bigtable could have multiple timestamped versions.

Operations on a single Bigtable row are transactional, though Bigtable does not support multi-row transactions. The keys are also sorted lexicographically and a range of keys, known as a tablet, is the unit of replication and locality. This lets users colocate two keys by making the keys lexicographically close to one another.

Columns in Bigtable are grouped into (a modest number of static) column families where each column family is a unit of access control. Bigtables can be scanned by row and by column.

Timestamps are 64 bit integers and can either be assigned by Bigtable in an increasing way or assigned by the user program. Values can be garbage collected by keeping at most n objects at a time or by keeping all objects that are n seconds old.

Implementation

BigTable consists of a single master, a bunch of tablet servers, and a bunch of clients. The master is responsible for system administration. It assigns tablets to tablet servers, oversees the addition and removal of tablet servers, performs garbage collection, orchestrates load balancing, and manages schema changes. Tablet servers host 10-1000 tables and are responsible for servicing reads and writes. They also split tablets that have grown too large.

Refinements

Lessons