A History and Evaluation of System R
- tl;dr
- System R is the granddaddy of all relational databases, introducing wildly influential ideas such as query optimization, compiled queries, views, degrees of isolation, multi granularity locking, etc.
- Overview
- The development of System R was divided into three phases: Phase 0 (an experimental period where the SQL language was honed), Phase 1 (where System R was actually built) and Phase 2 (where System R was evaluated).
- Phase Zero: An Initial Prototype
- Query Language: SQL with subqueries but no joins. No language embedding.
- Query Optimization: The query optimizer tried to minimize the number of tuples read, but should have minimized the number of I/Os and the CPU cost.
- Access Methods and Physical Storage: Tuples stored pointers into columnar domains. Inversions were like indexes from these domains to tuple ids. This meant reading a tuple took a couple of IOs.
- Concurrency Control: No concurrency.
- Recovery: No recovery.
- Misc: The catalog was stored as relations.
- Lessons learned: Joins are important, and the query optimizer should optimize for the common case.
- Phase One: Construction of a Multiuser Prototype
- Query Language: SQL with joins.
- Query Optimization:
- The Selinger query optimizer minimized a weighted combination of IOs and RSS calls (as a proxy for CPU cost).
- Queries were compiled into pre-written machine code fragments. A preprocessor would compile queries written in source code. If a query plan became invalidated (e.g. an index it depended on was dropped, the query was recompiled).
- Access Methods and Physical Storage:
- Tuples were stored row-by-row with B-tree indexes.
- Query plans had nested loop joins, index joins, and sort merge joins.
- Concurrency Control:
- Originally, they tried predicate locks, but it was a bit hard to pull off.
- Ended with multigranularity locks with intention locks.
- Offered RU, RC, RR (maybe serializable) and isolation.
- Recovery:
- Used shadow paging and a bit of logging for recovery.
- Misc:
- Views were used for authorization. Either a centralized DB admin could delegate rights, or the right delegation was completely decentralized.
- Phase Two: Evaluation
- Added SQL construct like EXISTS, LIKE, prepared statements, and outer joins.
- Users wanted to add groups to view authorization.
- Shadow paging really hurts data locality.
- Convoys happen in which a thread with a lock is blocked for a while by the OS.