Problem: C as in ACID is no more

A MySQL replication cluster is eventually consistent. All writes must be sent to the master. A write request is considered successful once the master has performed it.

[Diagram: MySQL replication cluster. Client 1 sends set(id = 1) to the master; the two slaves still hold old values (id = NULL, id = 2).]

Then, the master sends the update to the slaves. Slave updates are asynchronous. For a short period, until all slaves have replicated the update, there is read inconsistency. The inconsistency window exists even with semisynchronous replication: semisynchronous replication guarantees that the update information is stored on at least one slave, but there is no promise that it is already available at the SQL layer.

[Diagram: MySQL replication cluster. Client 2 sends get(id) to a slave and may read a stale value (id = NULL or id = 2) although the master already has id = 1.]

Regarding the C in ACID… Transactions still exist. But, as you can see, the C is gone once you step back and look at the cluster as a whole. Thus, application developers must plan for inconsistency.

Analysis: consistency levels

Any of the three consistency levels listed below can be achieved with MySQL replication. Although, I must confess, strong consistency means using the master for all requests. That’s not what elastic-fantastic is about…

Eventual consistency
A node may not have the data yet, may serve a stale version, or may serve the current version. If no new updates happen, then after the inconsistency window has passed, all nodes will eventually return the same current data. This is the default with MySQL replication and PECL/mysqlnd_ms 1.1.

Session consistency
Read-your-writes: a client is guaranteed to see its own updates for the duration of a session. PECL/mysqlnd_ms 1.1 has master_on_write, which is a tiny step in this direction.

Strong consistency
All clients get a consistent view after a successful update.
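To make the three levels concrete, here is a minimal sketch in Python (an illustration, not mysqlnd_ms code) of how a load balancer could route a read once the caller has named the level it needs:

```python
# Sketch only: route a read according to the requested consistency level.
EVENTUAL, SESSION, STRONG = "eventual", "session", "strong"

def pick_node(level, master, slaves, wrote_in_session=False):
    """Return a node the read may be sent to under the given level."""
    if level == STRONG:
        return master                       # only the master is never stale
    if level == SESSION and wrote_in_session:
        return master                       # read-your-writes after a write
    return slaves[0] if slaves else master  # eventual: any slave will do

assert pick_node(STRONG, "master", ["slave1", "slave2"]) == "master"
assert pick_node(SESSION, "master", ["slave1"], wrote_in_session=True) == "master"
assert pick_node(EVENTUAL, "master", ["slave1"]) == "slave1"
```

Note how strong consistency degenerates into "always use the master" – exactly the confession made above.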

Mistake: this is left to the developer

PECL/mysqlnd_ms 1.1 leaves it to the user to implement a choice of service level. Developers have to use the SQL hints MYSQLND_MS_SLAVE_SWITCH, MYSQLND_MS_LAST_USED_SWITCH, MYSQLND_MS_MASTER_SWITCH …

Wrong, stop it – that is thinking from the past and for 1.very_low releases! As a cloud user, I do not want to have to bother about details of the MySQL cluster! As a cloud user, I want a ready-to-use service. As a cloud user, I want to define the service level I need and see the database cluster deliver – within its capabilities. "Cloud" is used as a synonym for our times here.

Proposing service levels

As a cloud user, I want to simply set the service level. I don’t care much how it is implemented as long as I don’t have to fine-tune the setup.
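Purely as an illustration of such a declarative interface – the name set_service_level and its options are inventions of mine for this sketch, not PECL/mysqlnd_ms 1.1 API – in Python:

```python
# Hypothetical API sketch: the application declares what it needs,
# the driver is then free to route requests however it sees fit.
class Connection:
    def __init__(self):
        self.level, self.options = "eventual", {}

    def set_service_level(self, level, **options):
        """Remember the requested consistency level and its options."""
        self.level, self.options = level, options

conn = Connection()
conn.set_service_level("eventual", max_age=2)          # stale reads up to 2s are fine
conn.set_service_level("session", session_id="u4711")  # read-your-writes for this user
conn.set_service_level("strong")                       # always current data
```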

If you compare this list with that of other players in the market, you’ll find amazing similarities…

Implementation considerations

The introduction of service levels into PECL/mysqlnd_ms – or at any other place on the client – will decrease performance. A client cannot ask a MySQL replication cluster, in a single request, for a list of slaves that lag no more than max_age seconds or that have already replicated a given session id (or global transaction id). There is no central instance that can answer replication cluster node status requests.

Instead, a client has to contact the cluster nodes and check their status itself. Imagine 100 concurrent PHP clients, each of which contacts all replication nodes to check their status and forgets that status at the end of the web request (remember – PHP default lifetime = web request). Hmm, that is 100 multiplied by … too much. Too much additional traffic, too many messages, too much latency.
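The back-of-envelope arithmetic, assuming a small three-node cluster (the node count and refresh rate are assumptions for the example):

```python
# Per-request polling versus one status cache shared by the whole farm.
clients, nodes = 100, 3              # concurrent PHP requests, cluster nodes

polling_messages = clients * nodes   # every request polls every node itself
print(polling_messages)              # 300 status messages per wave of requests

cache_refreshes_per_wave = 1         # one shared refresh instead
cached_messages = nodes * cache_refreshes_per_wave
print(cached_messages)               # 3 messages, independent of client count
```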

Something in between a central server-side instance and every client checking for itself is needed. Towards the central-instance side, dedicated daemon processes, proxies, web server plugins or job queues come to mind. All of those could poll node status information either periodically or on demand. Status information could be shared between many web requests (= PHP and PECL/mysqlnd_ms runs). But who of us wants an extra piece in the stack? Sure, if your PHP MySQL server farm is huge, you already have a monitoring daemon running, or, if you are after SaaS/PaaS, it may seem an acceptable option.

Or, data collection is not triggered externally and nodes report their status. This could be done with a MySQL server plugin. But then, where to send the information?

Joe-Dolittle could be happier if PECL/mysqlnd_ms copied the PHP session module's approach to garbage collection. Garbage collection can be probability based: it is performed – on average – every n web requests. The query cache plugin has copied this strategy.

PECL/mysqlnd_ms could check node status every n web requests and cache the information somewhere. Somewhere could be process memory, shared memory, memcache – whatever. This could and should help to reduce the number of node status requests from clients.
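A sketch of that idea in Python, with a plain dict standing in for shared memory or memcache and a dummy poll in place of real node requests:

```python
import random, time

# Refresh the cached node status on average every n-th request, like the
# probability-based garbage collection of PHP's session module.
CACHE = {"lag": {}, "stamp": 0.0}   # stand-in for shm/memcache

def poll_nodes():
    # dummy stand-in for really asking each node: slave name -> lag in seconds
    return {"slave1": 0, "slave2": 3}

def get_lag_list(n=500):
    # with probability 1/n this request pays the polling cost for everyone
    if not CACHE["lag"] or random.random() < 1.0 / n:
        CACHE["lag"], CACHE["stamp"] = poll_nodes(), time.time()
    return CACHE["lag"]

print(get_lag_list())  # first call always polls: {'slave1': 0, 'slave2': 3}
```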

… just try it?!

I think we should add the idea of a service level to PECL/mysqlnd_ms regardless of potential performance issues. In the first step, we could always fall back to choosing the master. The master can fulfill all service levels. If nothing else, an API to set a service level makes our API better and cleaner. It is a reasonable step towards hiding cluster details from the user – just as it should be in modern times…

As we go, we can try to reduce the cases in which the master is queried, if need be.

Eventual consistency combined with a maximum age is a relatively soft service level. If, for example, max_age=2 and the system handles 1000 requests per second, updating the cached slave lag list every 500 requests means a refresh roughly every 0.5 seconds – comfortably within the two-second tolerance.

Choosing slaves for the most basic variant of session consistency (a session bound to one web request, no user-supplied session id) should be doable by checking global transaction ids. When it comes to user-supplied session ids, things are likely to get nasty and slow.
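Assuming global transaction ids can be compared like monotonically increasing integers, the basic slave check could look like this sketch:

```python
# Sketch: session consistency via global transaction ids, modelled here as
# plain integers that grow with every committed transaction.
def usable_slaves(last_write_gtid, slave_gtids):
    """Slaves that have already replicated the session's last write."""
    return [s for s, gtid in slave_gtids.items() if gtid >= last_write_gtid]

# the session's last write was transaction 42; only slave1 has caught up
print(usable_slaves(42, {"slave1": 45, "slave2": 40}))  # ['slave1']
```

If the list comes back empty, the client falls back to the master – which, as argued above, can always fulfill the service level.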