The course schedule is tentative and subject to change.
Week |
Topic |
Wednesday |
Friday |
Week 1
| Intro & Logistics
| Aug 29
Lec 0: Intro: Distributed systems & cloud computing research overview
Readings:
How (and How Not) to Write a Good Systems Paper,
How to Read an Engineering Research Paper
Assignment 0: student info & signing up for presentations
|
Aug 31
|
Week 2
| Cloud computing & data consistency
| Sep 5
Lec 1: Cloud computing and data consistency
Readings: Above the clouds,
Eventually consistent (link to Werner Vogels' original weblog post)
Assignment 1: Build a consistent cloud object store on top of weakly consistent AWS S3
|
Sep 7
|
Week 3
| Distributed storage implementation & consensus algorithms
| Sep 12
Lec 2: Distributed storage implementation & distributed consensus algorithms
Readings: BespoKV [IEEE SC'18],
Consensus Protocols: 2PC (The paper trail),
Consensus Protocols: 3PC (The paper trail),
Paxos made simple
Optional readings: Paxos made live,
Simple explanations of Paxos (Quora post),
Chain replication [USENIX OSDI'04],
CRAQ [USENIX ATC'09]
|
Sep 14
Assignment 1 due (deadline extended to 11:59pm Sep 16)
|
Week 4
| All things distributed
| Sep 19
Lec 3: Basic Paxos
Paper 1: In Search of an Understandable Consensus Algorithm
[USENIX ATC 2014] Slides
Paper 2: ZooKeeper: Wait-free coordination for Internet-scale systems
[USENIX ATC 2010] Slides
Optional readings: Chubby,
PaxosStore,
NOPaxos,
TAPIR
Assignment 2: Optimizing consistent S3KV with hashing, caching, and GC
Assignment 3: Pick your project
|
Sep 21
|
Week 5
| Not virtualize, containerize [I]
| Sep 26
Lec 4: Containers and serverless computing
Paper 3: Slacker: Fast Distribution with Lazy Docker Containers
[USENIX FAST 2016] Slides
Paper 4: SAND: Towards High-Performance Serverless Computing
[USENIX ATC 2018] Slides
Paper 5: My VM is Lighter (and Safer) than your Container
[ACM SOSP 2017] Slides
Optional readings: Cntr,
Serverless platform analysis,
OpenLambda,
Docker Registry workload analysis
|
Sep 28
Assignment 3 (Pick your project) due
|
Week 6
| Big big data
| Oct 3
Lec 5: GFS+MapReduce primer + project plan discussion
Paper 6: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
[USENIX NSDI 2012]
Paper 7: Bigtable: A Distributed Storage System for Structured Data
[USENIX OSDI 2006] Slides
Optional readings: MapReduce,
Jeff Dean's talk about Google's MapReduce, BigTable, Spanner, and so on,
Hadoop YARN (Hadoop v2),
HDFS
|
Oct 5
Assignment 2 due
|
Week 7
| Accelerating learning [I]
| Oct 10
Lec 6: Applied machine learning at Facebook
Paper 8: TensorFlow: A System for Large-Scale Machine Learning
[USENIX OSDI 2016] Slides
Paper 9: Scaling Distributed Machine Learning with the Parameter Server
[USENIX OSDI 2014] Slides
Optional readings: Petuum (PMLS),
Applied machine learning at Facebook: A datacenter infrastructure perspective
|
Oct 12
|
Week 8
| Project milestone I
| Oct 17
Project proposal presentation
|
Oct 19
|
Week 9
| Hack day
| Oct 24
No class (tentative)
|
Oct 26
Project proposal due
|
Week 10
| Not virtualize, containerize [II]
| Oct 31
Lec 7: Container registry and IBM registry workload analysis
Paper 10: SCONE: Secure Linux Containers with Intel SGX
[USENIX OSDI 2016] Slides
Paper 11: SOCK: Rapid Task Provisioning with Serverless-Optimized Containers
[USENIX ATC 2018] Slides
Optional readings:
Docker Registry workload analysis
|
Nov 2
|
Week 11
| Accelerating learning [II]
| Nov 7
Lec 8: Distributed machine learning w/ Dask
Paper 12: Ray: A Distributed Framework for Emerging AI Applications
[USENIX OSDI 2018]
Paper 13: Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds
[USENIX NSDI 2017]
Optional readings:
MXNet,
Clipper
|
Nov 9
|
Week 12
| Resource disaggregation & rack-scale computing
| Nov 14
No class (SC'18)
Reading: Network Requirements for Resource Disaggregation
[USENIX OSDI 2016]
Reading: Efficient Memory Disaggregation with Infiniswap
[USENIX NSDI 2017]
Optional readings: RackOut, ReFlex, Pelican, Flash storage disaggregation
|
Nov 16
|
Week 13
| Thanksgiving week (no class)
|
Nov 21
Project checkpoint report due at 12:00am
|
Nov 23 Happy Thanksgiving! |
Week 14
| Managing each bit of datacenters
| Nov 28
Lec 9: Managing Distributed In-memory Caching Cluster in Datacenters
Paper 14: Large-scale Cluster Management at Google with Borg
[ACM EuroSys 2015]
Paper 15: Performance Isolation and Fairness for Multi-Tenant Cloud Storage
[USENIX OSDI 2012]
Optional readings: SwitchKV,
Spore
|
Nov 30
|
Week 15
| Miscellaneous
| Dec 5
Lec 10: Analyzing Alibaba's Datacenter Workloads
Paper reading: Analyzing Alibaba's Co-located Datacenter Workloads
[IEEE BigData 2018]
Paper 16: Fast Crash Recovery in RAMCloud
[ACM SOSP 2011]
Optional readings: Omega,
Sparrow,
Quasar,
Medea,
Google datacenter workload analysis
|
Dec 7
|
Week 16
| Project milestone II
| Dec 12
No class
|
Dec 14
Final project presentation:
4:30-7:30pm, Rm 4201 Eng Building
|
Week 17
| Project milestone III
| Dec 19
Final project report & src due
|
Dec 21
|