Reading List
Distributed Systems Foundation
- In Search of an Understandable Consensus Algorithm [USENIX ATC 2014]
- The Chubby Lock Service for Loosely-Coupled Distributed Systems [USENIX OSDI 2006]
- ZooKeeper: Wait-free coordination for Internet-scale systems [USENIX ATC 2010]
Cloud Storage
- SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services [ACM SOSP 2013]
- Performance Isolation and Fairness for Multi-Tenant Cloud Storage [USENIX OSDI 2012]
- Consistency-Based Service Level Agreements for Cloud Storage [ACM SOSP 2013]
Container-based Virtualization
- My VM is Lighter (and Safer) than your Container [ACM SOSP 2017]
- SCONE: Secure Linux Containers with Intel SGX [USENIX OSDI 2016]
- Slacker: Fast Distribution with Lazy Docker Containers [USENIX FAST 2016]
Serverless Computing
- SAND: Towards High-Performance Serverless Computing [USENIX ATC 2018]
- SOCK: Rapid Task Provisioning with Serverless-Optimized Containers [USENIX ATC 2018]
- Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads [USENIX NSDI 2017]
Distributed Machine/Deep Learning
- Ray: A Distributed Framework for Emerging AI Applications [USENIX OSDI 2018]
- TensorFlow: A System for Large-Scale Machine Learning [USENIX OSDI 2016]
- Scaling Distributed Machine Learning with the Parameter Server [USENIX OSDI 2014]
- Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds [USENIX NSDI 2017]
Big Data Systems
- Bigtable: A Distributed Storage System for Structured Data [USENIX OSDI 2006]
- Dynamo: Amazon’s Highly Available Key-value Store [ACM SOSP 2007]
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [USENIX NSDI 2012]
- Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling [ACM EuroSys 2010]
Memory-driven Computing
- Fast Crash Recovery in RAMCloud [ACM SOSP 2011]
- Scaling Memcache at Facebook [USENIX NSDI 2013]
- Latency-Tolerant Software Distributed Shared Memory [USENIX ATC 2015]
- NetCache: Balancing Key-Value Stores with Fast In-Network Caching [ACM SOSP 2017]
- EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding [USENIX OSDI 2016]
Cluster Management Systems
- Large-scale cluster management at Google with Borg [ACM EuroSys 2015]
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center [USENIX NSDI 2011]
- Omega: flexible, scalable schedulers for large compute clusters [ACM EuroSys 2013]
- Sparrow: Distributed, Low Latency Scheduling [ACM SOSP 2013]
Resource Disaggregation
- Network Requirements for Resource Disaggregation [USENIX OSDI 2016]
- Efficient Memory Disaggregation with Infiniswap [USENIX NSDI 2017]