Maximum Availability Architecture

CAP: Consistency and Availability except when Partitioned - Part 2

The previous post presented the CAP theorem as C and A except when P. For this formulation to be useful, partitions (or failures) must be uncommon and/or fast to recover from, i.e., the system must have liveness. Informally, liveness is a system's ability to eventually make progress, or be up most of the time ("something good eventually happens"). A good/useful system also has safety: a system's guarantee of correct behavior ("nothing bad happens"), e.g., to return correct results, maintain consistency, etc. This post explores the tradeoffs available, under the CAP theorem, to systems that meet both liveness and safety requirements; doing so economically is the main technical challenge of infrastructure-grade systems, including databases.

Returning to the CAP theorem: rather than choose either consistency or availability, a good system can strive to maintain both to a degree, and/or suspend one or the other (perhaps on a per-operation basis) during some types of partitions or failures. A system may restrict which operations it allows during a partition, both to stay available and to keep the ability to restore consistency once the partition or failure ends.

For example, a collaboration platform may restrict some operations when a user is updating a shared document locally, while disconnected from the document server (in effect, the system is partitioned with respect to that user's data). Restricting operations during a partition makes it easier to reconcile updates once the partition ends. In this instance, weakening availability (some operations are unavailable) enables restoring consistency (reconciling concurrent updates from disconnected users) as part of recovering from a failure/partition. As another example, where strong consistency is required, as in an Oracle database, a partition may indeed result in an operational mode where only read operations are allowed. Google's Spanner is a good case study of CAP tradeoffs in a mission-critical globally distributed system.
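
The collaboration example above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real product: DocumentReplica, OperationUnavailable, and the rule "appends are allowed offline, deletes are not" are all hypothetical. Appends are kept available because they are easy to merge later; deletes are suspended while disconnected so that reconciliation stays simple.

```python
from dataclasses import dataclass, field
from typing import List


class OperationUnavailable(Exception):
    """Raised when an operation is suspended during a partition."""


@dataclass
class DocumentReplica:
    # Hypothetical local copy of a shared document; all names are illustrative.
    paragraphs: List[str]
    offline: bool = False  # True while partitioned from the document server
    pending: List[str] = field(default_factory=list)  # appends made while offline

    def append(self, text: str) -> None:
        # Appends remain available offline: they can be merged on reconnect.
        self.paragraphs.append(text)
        if self.offline:
            self.pending.append(text)

    def delete(self, index: int) -> None:
        # Deletes are suspended offline: they could conflict with edits
        # made concurrently by other users.
        if self.offline:
            raise OperationUnavailable("delete is suspended while offline")
        del self.paragraphs[index]

    def reconnect(self, server_copy: List[str]) -> None:
        # Reconcile: push offline appends to the server copy, then refresh
        # the local copy from it.
        server_copy.extend(self.pending)
        self.paragraphs = list(server_copy)
        self.pending.clear()
        self.offline = False


server = ["intro"]
alice = DocumentReplica(list(server), offline=True)
bob = DocumentReplica(list(server), offline=True)
alice.append("note from Alice")
bob.append("note from Bob")
alice.reconnect(server)
bob.reconnect(server)
```

In this sketch, availability is weakened (delete raises OperationUnavailable while offline) so that consistency can be restored by a simple merge when the partition ends. A system that suspended all writes while offline would correspond to the read-only mode mentioned above.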

In a subsequent post we will examine consistency and availability tradeoffs in the Oracle database ecosystem.