Analysis and Evolution of Journaling File Systems (2005)

Summary. The authors develop and apply two file system analysis techniques dubbed Semantic Block-Level Analysis (SBA) and Semantic Trace Playback (STP) to four journaled file systems: ext3, ReiserFS, JFS, and NTFS.

Users install an SBA driver into the OS and mount the file system of interest onto the SBA driver. The interposed driver intercepts and logs all block-level requests and responses to and from the disk. Moreover, the SBA driver is specialized to each file system under consideration so that it can interpret each block operation, categorizing it as a read/write to a journal block or regular data block. Implementing an SBA driver is easy to do, guarantees that no operation goes unlogged, and has low overhead.

Deciding the effectiveness of new file system policies is onerous. For example, to evaluate a new journaling scheme, you would traditionally have to implement the new scheme and evaluate it on a set of benchmarks. If it performs well, you keep the changes; otherwise, you throw them away. STP uses block traces to perform a light-weight simulation to analyze new file system policies without implementation overhead.
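The classification step that SBA performs on each intercepted block request can be sketched roughly as follows. This is a minimal illustration, not the authors' driver: it assumes the journal occupies one contiguous block range, and the names BlockRequest, Layout, and summarize are hypothetical. A real SBA driver runs inside the OS and uses file-system-specific knowledge to interpret blocks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRequest:
    op: str      # "read" or "write"
    block: int   # block number on disk


@dataclass(frozen=True)
class Layout:
    # Assumption: the journal is a single contiguous range of blocks.
    journal_start: int
    journal_len: int

    def classify(self, block: int) -> str:
        """Label a block as belonging to the journal or to regular data."""
        end = self.journal_start + self.journal_len
        return "journal" if self.journal_start <= block < end else "data"


def summarize(trace, layout):
    """Count requests by (operation, block category)."""
    counts = Counter()
    for req in trace:
        counts[(req.op, layout.classify(req.block))] += 1
    return counts


layout = Layout(journal_start=100, journal_len=50)
trace = [
    BlockRequest("write", 100),
    BlockRequest("write", 149),
    BlockRequest("write", 150),
    BlockRequest("read", 5),
]
counts = summarize(trace, layout)
```

Here the two writes to blocks 100 and 149 count as journal writes, while block 150 (just past the assumed journal range) and block 5 count as regular data traffic.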

STP is a user-level process that reads in the block traces produced by SBA, together with file system operation logs, and issues direct I/O requests to the disk. It can then be used to evaluate small, simple modifications to existing file systems. For example, it can be used to evaluate the effects of moving the journal from the beginning of the file system to the middle of the file system.
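The journal-relocation example can be sketched as a rewrite of a recorded trace. This is only an illustration under stated assumptions: the trace is a list of (op, block) pairs, the journal originally occupies the first journal_len blocks, and the data blocks that sat between the old and new journal positions shift down to fill the gap. The function names are hypothetical, and total_seek_distance is a crude stand-in for what STP actually does, which is to replay the requests against a real disk.

```python
def move_journal_from_start(trace, journal_len, new_start):
    """Remap block numbers as if the journal moved from block 0 to new_start.

    Assumed layout change:
      journal  [0, journal_len)                   -> [new_start, new_start + journal_len)
      data     [journal_len, new_start + journal_len) -> shifted down by journal_len
      the rest                                    -> unchanged
    """
    remapped = []
    for op, block in trace:
        if block < journal_len:
            block = new_start + block
        elif block < new_start + journal_len:
            block = block - journal_len
        remapped.append((op, block))
    return remapped


def total_seek_distance(trace):
    """Sum of block-number jumps between consecutive requests (rough proxy only)."""
    return sum(abs(b - a) for (_, a), (_, b) in zip(trace, trace[1:]))


# Alternating journal writes (blocks 0, 1) and data writes far from the journal.
original = [("write", 0), ("write", 500), ("write", 1), ("write", 510)]
moved = move_journal_from_start(original, journal_len=10, new_start=495)

before = total_seek_distance(original)
after = total_seek_distance(moved)
```

In this toy trace the remapped requests land close together, so the proxy distance drops; whether a real workload benefits is exactly the kind of question a replay against the disk is meant to answer.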

The authors spend the majority of the paper examining ext3: the third extended file system. ext3 adds journaling to ext2, which resembles the Unix FFS: the partition is divided into groups, each of which contains bitmaps, inodes, and regular data. ext3 comes with three journaling modes:

  1. Using writeback journaling, metadata is journaled and data is asynchronously written to disk. This has the weakest consistency guarantees.
  2. Using ordered journaling, data is written to disk before its associated metadata is journaled.
  3. Using data journaling, both data and metadata are journaled before being checkpointed: copied from the journal to the disk.
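The ordering guarantees of the three modes can be written down as precedence rules and checked against a proposed sequence of steps. This is a simplified sketch: the step labels and function names are hypothetical, commit records are omitted, and the rule that journaled blocks are checkpointed only after being journaled is applied to all three modes.

```python
from enum import Enum


class Mode(Enum):
    WRITEBACK = "writeback"
    ORDERED = "ordered"
    DATA = "data"


# Hypothetical step labels for one update touching data and metadata.
WRITE_DATA = "write data in place"
JOURNAL_META = "journal metadata"
JOURNAL_DATA = "journal data"
CHECKPOINT = "checkpoint journaled blocks"


def steps(mode):
    """Steps performed for one update in the given mode."""
    if mode is Mode.DATA:
        return {JOURNAL_DATA, JOURNAL_META, CHECKPOINT}
    return {WRITE_DATA, JOURNAL_META, CHECKPOINT}


def must_precede(mode):
    """Pairs (a, b) meaning step a has to happen before step b."""
    rules = {(JOURNAL_META, CHECKPOINT)}
    if mode is Mode.ORDERED:
        # Ordered mode: data reaches disk before its metadata is journaled.
        rules.add((WRITE_DATA, JOURNAL_META))
    if mode is Mode.DATA:
        # Data mode: data is journaled too, then checkpointed.
        rules.add((JOURNAL_DATA, CHECKPOINT))
    return rules


def respects(mode, order):
    """True if `order` contains exactly the mode's steps and obeys its rules."""
    if len(order) != len(set(order)) or set(order) != steps(mode):
        return False
    pos = {step: i for i, step in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in must_precede(mode))


metadata_first = [JOURNAL_META, WRITE_DATA, CHECKPOINT]
ok_in_writeback = respects(Mode.WRITEBACK, metadata_first)
ok_in_ordered = respects(Mode.ORDERED, metadata_first)
```

Journaling the metadata before the data is written is acceptable under writeback mode but violates ordered mode, which is why writeback offers the weakest consistency guarantees of the three.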

Moreover, operations are grouped into compound transactions and issued in batch. ext3 SBA analysis led to the following conclusions:

STP was also used to analyze the effects of

SBA and STP were also applied to ReiserFS, JFS, and NTFS.