Near instantaneous high availability for databases?

In the study of the availability of computer systems in general, three distributed concepts apply.

a) Replication makes multiple (two or more) copies of an application and its data available. If any one copy becomes unavailable due to a failure, the remaining copies remain available. A transaction will typically fail, and upon retrying it will succeed on a surviving copy. This is the most elegant way of doing it and provides uptime as close to 100% as you’ll ever get.

b) Federation presents multiple (usually two) copies of the same application and its data as a single one. A failure in one of the copies does not affect the application itself. This approach works well for legacy systems or those which have limited error handling. Storage mirroring with EMC vPlex or IBM SVC or system replication with SAP HANA works this way. It is really elegant. Uptime is very high.

c) Relocation is the act of referring to a second, inactive copy of an application and its data in the event of a failure in the first. In this scenario there is always only a single active copy. Similar to replication, a transaction will typically fail, and upon retrying it will succeed on a newly activated copy. This is the least of the three. Database clustering, replication and log shipping all work this way and there’s a reason for that which I’ll get to in a moment. Uptime suffers due do service restarts, uncommitted transactions rolling back, committed transactions rolling forward and more.

It is clear that the three concepts above describe a certain dimension of a distributed system with multiple nodes, where processing and system state is replicated, federated or relocated across the nodes.

Let us now classify servers into the following categories:
a) Hypervisors do the obvious thing and abstract one or more physical servers into a platform that hosts one ore more virtual servers. VMWare ESX, Microsoft Hyper-V, KVM, the list is quite long nowadays.
b) Stateful servers capture state, that is, they record (or persist), into a database or filesystem, the current state of the system under consideration. Database servers and file servers are typical examples.
c) Stateless servers do not contain any state that needs to persist beyond the current transaction. Web and application servers are typical stateless servers. Another good example is Microsoft’s Biztalk – the Biztalk servers each have a local database, but the state recorded in this database is only important for the duration of the transaction. Thus, for the purpose of this classification, such a server is considered stateless.

Consider now a stateful (database) server. Data must be consistent, that is, if data is written, a subsequent read should render the same data.

The CAP theorem, also known as Brewers theorem, states that is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• consistency (all nodes see the same data at the same time),
• availability (a request to a business service is guaranteed to receive a response), and
• partition tolerance (the system as a whole continues to operate despite a failure of a part of it).

Brewer conjectured that a distributed system can provide at most two of the three guarantees at any one point in time. We will therefor always have one of the following situations:
• Data is consistent and always available but we cannot tolerate local failures. This is effectively the same as having a single copy only.
• Data is consistent and in the event of a local failure it will not be available. In this case we have a distributed system which becomes unavailable in the event of a network failure.
• Data is always available, even in the event of local failures. However, it may not necessarily be consistent.

To achieve near instantaneous high availability for the database server component by replication, we require availability and partition tolerance. It is evident that consistency will then be sacrificed.

Our only options then are federation – which I don’t see working really well for databases; or relocation – exactly what we’ve been doing for long with what we usually call database replication.