Course Schedule

The course schedule is tentative and subject to change.

Week Topic Wednesday Friday
Week 1
Intro & Logistics
Aug 29
Lec 0: Intro: Distributed systems & cloud computing research overview
Readings: How (and How Not) to Write a Good Systems Paper, How to Read an Engineering Research Paper
Assignment 0: student info & signing up for presentations
Aug 31
Week 2
Cloud computing & data consistency
Sep 5
Lec 1: Cloud computing and data consistency
Readings: Above the clouds, Eventually consistent (link to Werner Vogels' original weblog post)
Assignment 1: Build a consistent cloud object store on top of weakly consistent AWS S3
Sep 7
Week 3
Distributed storage implementation & consensus algorithms
Sep 12
Lec 2: Distributed storage implementation & distributed consensus algorithms
Readings: BespoKV [IEEE SC'18], Consensus Protocols: 2PC (The paper trail), Consensus Protocols: 3PC (The paper trail), Paxos made simple
Optional readings: Paxos made live, Simple explanations of Paxos (Quora post), Chain replication [USENIX OSDI'04], CRAQ [USENIX ATC'09]
Sep 14
Assignment 1 due (deadline extended to 11:59pm Sep 16)
Week 4
All things distributed
Sep 19
Lec 3: Basic Paxos
Paper 1: In Search of an Understandable Consensus Algorithm [USENIX ATC 2014] Slides
Paper 2: ZooKeeper: Wait-free coordination for Internet-scale systems [USENIX ATC 2010] Slides
Optional readings: Chubby, PaxosStore, NOPaxos, TAPIR
Assignment 2: Optimizing consistent S3KV with hashing, caching, and GC
Assignment 3: Pick your project
Sep 21
Week 5
Not virtualize, containerize [I]
Sep 26
Lec 4: Containers and serverless computing
Paper 3: Slacker: Fast Distribution with Lazy Docker Containers [USENIX FAST 2016] Slides
Paper 4: SAND: Towards High-Performance Serverless Computing [USENIX ATC 2018] Slides
Paper 5: My VM is lighter (safer) than your container [ACM SOSP 2017] Slides
Optional readings: Cntr, Serverless platform analysis, OpenLambda, Docker Registry workload analysis
Sep 28
Assignment 3 (Pick your project) due
Week 6
Big big data
Oct 3
Lec 5: GFS and MapReduce primer; project plan discussion
Paper 6: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [USENIX NSDI 2012]
Paper 7: Bigtable: A Distributed Storage System for Structured Data [USENIX OSDI 2006] Slides
Optional readings: MapReduce; Jeff Dean's talk on Google's MapReduce, Bigtable, Spanner, and other systems; Hadoop YARN (Hadoop v2); HDFS
Oct 5
Assignment 2 due
Week 7
Accelerating learning [I]
Oct 10
Lec 6: Applied machine learning at Facebook
Paper 8: TensorFlow: A System for Large-Scale Machine Learning [USENIX OSDI 2016] Slides
Paper 9: Scaling Distributed Machine Learning with the Parameter Server [USENIX OSDI 2014] Slides
Optional readings: Petuum (PMLS), Applied machine learning at Facebook: A datacenter infrastructure perspective
Oct 12
Week 8
Project milestone I
Oct 17
Project proposal presentation
Oct 19
Week 9
Hack day
Oct 24
No class (tentative)
Oct 26
Project proposal due
Week 10
Not virtualize, containerize [II]
Oct 31
Lec 7: Container registry and IBM registry workload analysis
Paper 10: SCONE: Secure Linux Containers with Intel SGX [USENIX OSDI 2016] Slides
Paper 11: SOCK: Rapid Task Provisioning with Serverless-Optimized Containers [USENIX ATC 2018] Slides
Optional readings: Docker Registry workload analysis
Nov 2
Week 11
Accelerating learning [II]
Nov 7
Lec 8: Distributed machine learning with Dask
Paper 12: Ray: A Distributed Framework for Emerging AI Applications [USENIX OSDI 2018]
Paper 13: Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds [USENIX NSDI 2017]
Optional readings: MXNet, Clipper
Nov 9
Week 12
Resource disaggregation & rack-scale computing
Nov 14
No class (SC'18)
Reading: Network Requirements for Resource Disaggregation [USENIX OSDI 2016]
Reading: Efficient Memory Disaggregation with Infiniswap [USENIX NSDI 2017]
Optional readings: RackOut, ReFlex, Pelican, Flash storage disaggregation
Nov 16
Week 13
Thanksgiving Week
No class
Nov 21
Project checkpoint report due at 00:00 (midnight)
Nov 23
Happy Thanksgiving!
Week 14
Managing each bit of datacenters
Nov 28
Lec 9: Managing Distributed In-memory Caching Clusters in Datacenters
Paper 14: Large-scale Cluster Management at Google with Borg [ACM EuroSys 2015]
Paper 15: Performance Isolation and Fairness for Multi-Tenant Cloud Storage [USENIX OSDI 2012]
Optional readings: SwitchKV, Spore
Nov 30
Week 15
Miscellaneous
Dec 5
Lec 10: Analyzing Alibaba's Datacenter Workloads
Paper reading: Analyzing Alibaba's Co-located Datacenter Workloads [IEEE BigData 2018]
Paper 16: Fast Crash Recovery in RAMCloud [ACM SOSP 2011]
Optional readings: Omega, Sparrow, Quasar, Medea, Google datacenter workload analysis
Dec 7
Week 16
Project milestone II
Dec 12
No class
Dec 14
Final project presentation: 4:30-7:30pm, Rm 4201 Eng Building
Week 17
Project milestone III
Dec 19
Final project report & source code due
Dec 21