Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

Like MapReduce and Dryad, but intermediate results can be cached in memory and reused across jobs
Like Pregel (iterative graph computation with in-memory state), but more general purpose
Coarse-grained transformations over bulk data, unlike the fine-grained reads/writes of distributed shared memory
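Lineage for fast recovery: each RDD logs the transformations that built it, so lost partitions are recomputed rather than replicated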
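A toy sketch of lineage-based recovery, assuming a single parent per RDD and only a `map` transformation: each RDD stores a function that rebuilds one partition from its parent, so losing a cached partition means recomputing just that partition. The names (`RDD`, `parallelize`, `persist`, `iterator`) echo Spark's but are hypothetical here; this is not Spark's implementation.

```python
class RDD:
    """Toy RDD: a per-partition compute function plus a pointer to its parent (its lineage)."""

    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self.compute = compute      # (split, parent) -> list of records
        self.parent = parent        # lineage: the RDD this one was derived from
        self.persisted = False
        self.cache = {}             # split -> records kept "in memory"
        self.computed = []          # log of splits this RDD actually computed

    def map(self, f):
        # Record the transformation without running it: the new RDD only knows
        # how to rebuild a partition from the same partition of its parent.
        return RDD(self.num_partitions,
                   lambda split, parent: [f(x) for x in parent.iterator(split)],
                   parent=self)

    def persist(self):
        self.persisted = True
        return self

    def iterator(self, split):
        if split in self.cache:
            return self.cache[split]
        self.computed.append(split)
        records = self.compute(split, self.parent)  # (re)build from lineage
        if self.persisted:
            self.cache[split] = records
        return records

    def collect(self):
        return [x for split in range(self.num_partitions) for x in self.iterator(split)]


def parallelize(items, num_partitions):
    # Hypothetical source RDD: partition i holds every num_partitions-th item starting at i.
    return RDD(num_partitions, lambda split, _parent: items[split::num_partitions])


squares = parallelize(list(range(6)), 3).map(lambda x: x * x).persist()
squares.collect()            # computes and caches splits 0, 1, 2
del squares.cache[1]         # simulate losing the node holding split 1
print(squares.collect())     # [0, 9, 1, 16, 4, 25]
print(squares.computed)      # [0, 1, 2, 1]: only split 1 was rebuilt
```

Nothing is replicated: the only "backup" for split 1 is the recorded `map` function and the parent it reads from.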
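Narrow deps: each parent partition is used by at most one child partition (map, filter); wide deps: several child partitions depend on one parent partition (groupByKey, join on inputs that are not co-partitioned)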
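A minimal sketch of the narrow/wide test, assuming one dependency is described as a hypothetical mapping from each child partition to the set of parent partitions it reads. Narrow means no parent partition is read by more than one child partition.

```python
from collections import Counter


def classify_dependency(child_to_parents):
    """Classify one child/parent dependency as "narrow" or "wide".

    child_to_parents maps each child partition index to the set of parent
    partition indices it reads (a hypothetical representation).
    """
    readers = Counter()
    for parent_splits in child_to_parents.values():
        readers.update(set(parent_splits))  # each child counted once per parent split
    return "narrow" if all(n <= 1 for n in readers.values()) else "wide"


# map / filter: child i reads only parent i
print(classify_dependency({0: {0}, 1: {1}, 2: {2}}))  # narrow
# coalesce-style 4 -> 2: each child reads two parents, but each parent feeds one child
print(classify_dependency({0: {0, 1}, 1: {2, 3}}))    # narrow
# groupByKey without co-partitioning: every child reads every parent
print(classify_dependency({0: {0, 1}, 1: {0, 1}}))    # wide
```

The many-to-one case shows why the rule is phrased from the parent's side: a child reading several parents is still narrow as long as no parent partition is shared.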
Scheduler builds a DAG of stages, pipelining as many narrow deps as possible into each stage; stage boundaries fall at shuffles (wide deps)
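A sketch of splitting a lineage graph into stages, assuming every dependency is already labelled narrow or wide (the `Node` type and `build_stages` are hypothetical). Narrow deps are pipelined into the current stage; each wide dep starts a separate parent stage, which is emitted first. The real scheduler also stops at partitions that are already cached in memory; this sketch ignores caching.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    name: str
    # Each dependency is a (parent Node, "narrow" | "wide") pair.
    deps: list = field(default_factory=list)


def build_stages(final):
    """Split the lineage graph ending at `final` into stages, parent stages first."""
    stages = []
    stage_of = {}  # RDD that ends a stage -> index into `stages`

    def stage_for(rdd):
        if rdd in stage_of:
            return stage_of[rdd]
        members, seen, stack = [], set(), [rdd]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.append(node.name)
            for parent, kind in node.deps:
                if kind == "narrow":
                    stack.append(parent)  # pipeline into the current stage
                else:
                    stage_for(parent)     # shuffle boundary: parent gets its own stage
        stages.append(sorted(members))
        stage_of[rdd] = len(stages) - 1
        return stage_of[rdd]

    stage_for(final)
    return stages


lines = Node("lines")
pairs = Node("pairs", [(lines, "narrow")])      # e.g. map
grouped = Node("grouped", [(pairs, "wide")])    # e.g. groupByKey
counts = Node("counts", [(grouped, "narrow")])  # e.g. mapValues
print(build_stages(counts))  # [['lines', 'pairs'], ['counts', 'grouped']]
```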
For wide deps, materialize intermediate records on the nodes holding the parent partitions (like MR map outputs)
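On failure, re-run the task on another node; if parent outputs (e.g. map-side shuffle output) were lost, resubmit tasks to recompute the missing partitions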
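A sketch of how materialized shuffle output limits the work redone after a failure, assuming hash partitioning by key and a plain dict (`stored`) standing in for map outputs kept on each mapper's node. All function names are hypothetical. When one map output is lost, only that map task is rerun before the reducers fetch again.

```python
from collections import defaultdict


def map_task(records, num_reducers):
    """Bucket one parent partition's (key, value) records by reducer.
    The returned buckets stand in for output kept on the mapper's node."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in records:
        buckets[hash(key) % num_reducers][key].append(value)  # assumed hash partitioner
    return buckets


def reduce_task(r, stored):
    """Fetch bucket r from every stored map output and merge values by key."""
    merged = defaultdict(list)
    for buckets in stored.values():
        for key, values in buckets[r].items():
            merged[key].extend(values)
    return dict(merged)


def run_shuffle(partitions, num_reducers, stored):
    """Group by key. `stored` maps map-task id -> materialized buckets;
    only map tasks whose output is missing get (re)run."""
    rerun = [m for m in range(len(partitions)) if m not in stored]
    for m in rerun:
        stored[m] = map_task(partitions[m], num_reducers)
    grouped = {}
    for r in range(num_reducers):
        grouped.update(reduce_task(r, stored))  # each key lives in exactly one bucket
    return grouped, rerun


partitions = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e")]]
stored = {}
print(run_shuffle(partitions, 2, stored)[1])  # [0, 1, 2]: first run executes every map task
del stored[1]                                 # node holding map output 1 fails
grouped, rerun = run_shuffle(partitions, 2, stored)
print(rerun)                                  # [1]: only the lost map task is rerun
```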
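Interpreter magic: modified Scala interpreter serves the classes it compiles for each typed line to workers over HTTP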
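Evict partitions by RDD-level LRU: drop a partition from the least recently accessed RDD, unless that is the RDD the new partition belongs to (then keep the old partitions, to avoid cycling one RDD's partitions in and out)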
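A sketch of the RDD-level LRU rule, assuming capacity is counted in partitions rather than bytes, access recency is tracked with a logical clock, and the victim RDD gives up its oldest cached partition (the notes don't say which one). `RDDLevelLRUCache` is a hypothetical name.

```python
from itertools import count


class RDDLevelLRUCache:
    """Partition cache whose eviction picks the least recently accessed *RDD*."""

    def __init__(self, capacity):
        assert capacity > 0
        self.capacity = capacity        # assumption: measured in partitions, not bytes
        self.partitions = {}            # (rdd_id, split) -> data, in insertion order
        self.last_access = {}           # rdd_id -> logical time of last access
        self._clock = count()

    def _touch(self, rdd_id):
        self.last_access[rdd_id] = next(self._clock)

    def get(self, rdd_id, split):
        key = (rdd_id, split)
        if key in self.partitions:
            self._touch(rdd_id)
            return self.partitions[key]
        return None

    def put(self, rdd_id, split, data):
        """Cache a newly computed partition; return False if it was not cached."""
        key = (rdd_id, split)
        if key not in self.partitions and len(self.partitions) >= self.capacity:
            cached_rdds = {r for r, _ in self.partitions}
            victim = min(cached_rdds, key=lambda r: self.last_access[r])
            if victim == rdd_id:
                return False  # keep the old partitions of this RDD; skip the new one
            # Assumption: evict the victim RDD's oldest cached partition.
            victim_key = next(k for k in self.partitions if k[0] == victim)
            del self.partitions[victim_key]
        self.partitions[key] = data
        self._touch(rdd_id)
        return True


cache = RDDLevelLRUCache(capacity=3)
cache.put("A", 0, "a0")
cache.put("A", 1, "a1")
cache.put("B", 0, "b0")
cache.put("B", 1, "b1")           # full: A is least recently accessed, so A loses a partition
cache.put("B", 2, "b2")           # A loses its last partition
print(cache.put("B", 3, "b3"))    # False: B itself is now the LRU RDD
print(sorted(cache.partitions))   # [('B', 0), ('B', 1), ('B', 2)]
```

Tracking recency per RDD rather than per partition matters for scans: a job that touches every partition of one large RDD would otherwise keep evicting that RDD's own earlier partitions.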