Introduction: Scalable Services and Their State

We start by discussing the motivation for incrementally scalable
systems, the structure of cloud services and Internet-scale
services, and some of the challenges that they face. We introduce
some basic topics in managing data for these services: recovery and
consistency.

State Partitioning and Coordination

This unit addresses two closely related needs of scalable services:
distributing state and function across the participating servers, and
coordinating control and ownership of data in a way that is resilient
to failures. The focus is on basic techniques that appear in many
different systems: leased locks, hashed key/value stores, and
consistent hashing. We also discuss how these techniques relate to data consistency and recovery.
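To make consistent hashing concrete, here is a minimal sketch (our own illustrative code, not any particular system's implementation; the `HashRing` name and vnode count are arbitrary choices): each server is hashed to many points on a ring, and a key is owned by the first server point clockwise from the key's hash.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hash ring with virtual nodes (many points per server)."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def lookup(self, key):
        """Owner of `key`: first vnode clockwise from h(key), wrapping."""
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]
```

The payoff is incremental scalability: adding or removing a server remaps only the keys adjacent to its points on the ring, instead of rehashing everything.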

Name Services and DNS

This unit is a first look at large-scale multi-domain services, using
the Domain Name System (DNS) as an example. New issues arise at this
scale: data integrity and the security of ownership, location, and
control. We discuss the current DNS implementation, its security
extensions (DNSSEC), and an alternative implementation based on a
distributed key-value store (a DHT). Model-driven caching (with Zipf
distributions) is also useful for Web caching and other areas, but we
won't be too concerned with the models themselves.
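One intuition worth having, though: under a Zipf popularity distribution, a small cache of the most popular names absorbs a large fraction of requests. A toy calculation (illustrative only; the function names are ours):

```python
def zipf_probs(n, s=1.0):
    """Request probability for items ranked 1..n under Zipf with exponent s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def hit_rate(n, cache_size, s=1.0):
    """Fraction of requests served by caching the top `cache_size` items."""
    return sum(zipf_probs(n, s)[:cache_size])
```

With exponent 1, caching just the top 1% of 10,000 items serves roughly half of all requests, which is why DNS and Web caches are so effective.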

A Wide-area File Store on a DHT

Pond is a prototype of OceanStore, an ambitious architecture for a
wide-area file store with a configurable consistency model.
Like CoDoNS, it is based on a distributed hash table (a DHT system
called Tapestry) with replication, and it uses hashing and signing
with public-key cryptography.
Pond also gives a first look at Byzantine fault tolerance (BFT), primary-copy
replication, erasure codes for redundancy, version trees and
snapshot history ("time travel"), and Merkle trees.
The important points to focus on are how OceanStore maps a file
abstraction with configurable consistency and snapshots onto the
underlying key-value store, and how it keeps that abstraction robust
and secure across a wide range of failures of its components.
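Of these building blocks, the Merkle tree is easy to sketch (a generic illustration, not Pond's actual code): each block is hashed, and adjacent hashes are hashed pairwise up to a single root, so the root securely names the entire object and any change to any block changes the root.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Root of a hash tree over a non-empty list of data blocks.
    Odd-sized levels duplicate their last node, a common convention."""
    level = [H(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [H(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

This is also what makes version trees and "time travel" cheap: each snapshot is just a root hash, and unchanged subtrees are shared between versions.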

Scalable File Systems

These systems illustrate a continuum of redundancy schemes and
varying degrees of metadata decentralization. Like Pond (and
Centrifuge and CoDoNS), they use primary write protocols (e.g.,
primary/backup replication).
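The core of a primary write protocol fits in a few lines (a deliberately simplified in-memory model; real systems must also handle failover, view changes, and lost acknowledgments):

```python
class Backup:
    """Replica that applies updates in the order the primary sends them."""
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

class Primary:
    """All writes funnel through the primary, which orders them and
    propagates each update to every backup before acknowledging."""
    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        for b in self.backups:
            b.apply(key, value)   # a real system waits for (quorum) acks
        return "ack"

    def read(self, key):
        return self.store.get(key)
```

Because the primary imposes a single order on all writes, the backups never diverge, which is the property these systems are buying with the extra hop.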

Consensus

We discuss the Paxos algorithm in depth, touch on Byzantine
consensus and Byzantine fault tolerance (BFT), and relate the
Fox/Brewer "CAP Theorem" to the various systems we have studied.
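As a preview, single-decree Paxos can be simulated in-process (a simplified sketch under strong assumptions: no message loss, synchronous calls, and class and function names of our own choosing; real Paxos runs these phases as messages over an unreliable network):

```python
class Acceptor:
    """One acceptor's durable state for a single Paxos instance."""
    def __init__(self):
        self.promised = -1     # highest ballot promised so far
        self.accepted = None   # (ballot, value) of last accepted proposal

    def prepare(self, ballot):
        # Phase 1b: promise to ignore lower ballots; report accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted
        return "nack"

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "ok"
        return "nack"

def propose(acceptors, ballot, value):
    """One proposer attempt; returns the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: solicit promises.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [r for r in replies if r != "nack"]
    if len(promises) < majority:
        return None
    # Adopt the value of the highest-ballot proposal already accepted, if any.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask acceptors to accept the (possibly adopted) value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value) == "ok")
    return value if acks >= majority else None
```

The safety property to notice: once a majority has accepted a value, any later proposer that completes phase 1 will learn that value from some acceptor in its majority and re-propose it, so the decision never changes.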