Link

MySQL 5.7 clustering: The developer perspective

(Compiled from revised slides of previous presentations - skip if you know the old presentations)
…

(Compiled from revised slides of previous presentations - skip if you know the old presentations)

A summary on clustering MySQL 5.7 with focus on the PHP clients view and the PHP driver. Which kinds on MySQL clusters are there, what are their goal, how does wich one scale, what extra work does which clustering technique put at the client and finally, how the PHP driver (PECL/mysqlnd_ms) helps you.

Transcript

2.
The speaker says...
MySQL aims to be the web database. Web databases must
scale easily. Using a distributed database and clustering is a
must.
In this talk, we'll explore the clients view on the task at the
example of PHP:
• Which cluster options exist?
• What tasks are there from an developer
perspective?
• Which tier should handle the additional tasks
• How the PHP driver helps you

5.
The speaker says...
A distributed database cluster strives for maximum
availability and scalability while maintaining distribution
transparency. MySQL Cluster has a shared-nothing design
good enough for 99,999% (five minutes downtime per year).
It scales from Rasperry Pi run in a briefcase to 1.2 billion
write transactions per second on a 30 data nodes cluster (if
using possibly unsupported low level APIs.) It offers full
distribution transparency with the exception of partition
relocation to be triggered manually but performed
transparently by the cluster. True?
On the other end of the distribution transparency scale is
MySQL Replication. Why two products? What kind of
clusters exist?

7.
The speaker says...
A wide range of clusters can be categorized by asking
where transactions are run and when replicas
synchronize their data. Any eager solution ensures that all
replicas are synchronized at any time: it offers strong
consistency. A transaction cannot commit before
synchronization is done. Please note, what it means to
transaction rates:
• Single computer tx rate ~ disk/fsync rate
• Lazy cluster tx rate ~ disk/fsync rate
• Eager cluster tx rate ~ network latency (RTT)
Test: Would you deploy a synchronous (eager) cluster on a
WAN, or prefer using an asynchronous (lazy) solution?

9.
The speaker says...
MySQL Replication falls into the category of lazy Primary
Copy clusters. It is a rather unflexible solution as all
updates must be sent to the primary. However, this
simplifies concurrency control of conflicting, concurrent
update transactions. Concurrency control is no different from
a single database. Lazy replication can be fast.
Transactions don't have to wait for synchronization of
replicas. The price of the fast execution is the risk of stale
reads and eventual consistency. Transactions can be
lost when the primary crashes after commit and before any
copy has been updated. (Workaround: MySQL semi-sync
replication, which delays the commit until delivery to copy.
Alternatively, use shared disk and standby.)

11.
The speaker says...
Primary copy is best suited for read dominated
workloads, as commonly found in web applications. The
more secondaries there are, the more reads can be served
concurrently. Adding secondaries to the setup does not add
(significant) load to the primary. Write scale out is not
possible. By definition there is only one server that accepts
update transactions. Eager Update Anywhere may offer
some, limited write scale capability.
MySQL Replication can also be used for:
• Backup – perfom blocking backup on slave
• High Availablility – for example, remote slaves
• Warehousing – OLTP reporting on the slaves

13.
The speaker says...
An eager update anywhere cluster improves
distribution transparency and removes the risk of
reading stale data. Transparency and flexibility is improved
because any transaction can be directed to any
replica. Synchronization happens as part of the commit,
thus strong consistency is achieved. Remember:
transaction rate ~ network RTT. Failure tolerance is
better than with Primary Copy. There is no single point of
failure – the primary - that can cause a total outage of the
cluster. Nodes may fail without bringing the cluster down
immediately. Concurrency control (synchronization)
is complex as concurrent transactions from different
replicas may conflict.

15.
The speaker says...
Read transactions can execute locally in an eager update
anywhere cluster. Adding replicas to the cluster, increases
the clusters read throughput.
Writes scale to some degree only. Update
transactions can execute in parallel on multiple replicas and
profit from the hardware capacity every additional replica
adds to the cluster a a whole. But update transactions must
be certified (test for conflicts) before commit and serialized
(1SR) at latest at commit on all replicas. Every additional
replica, every additional write adds load to all of the current
replicas. (Using a quorum instead of ROWA only shifts the
issue.)

17.
The speaker says...
Eager (synchronous) systems are subject to a hard limit:
transactions are synchronized over network cables. Network
cables are longer than HDD/Flash/RAM cables. Given the
limited speed of light, ultimately, messaging on the network
will always be slower than messaging on a single machine.
However, to scale writes there is only one solution: divide-
and-conquery, partial replication. The number of writes
processed by a replica must be reduced and replicas must
not wait for (many) other replicas before commit. Horizontal
partitioning (sharding) can be used. The downside is extra
network trafic for reads that involve multiple replicas.

19.
The speaker says...
MySQL has two products that use the idea of partial
replication to scale writes. The products have different
designs for different use cases. MySQL Cluster is optimized
for fast, reliable network connections, short transactions and
real time responses. It prefers main memory over disk
storage but also supports on-disk storage.
From a SQL clients point of view, MySQL Cluster is nothing
but a different storage engine for SQL tables created in a
MySQL database. However, internally, it is an auto-sharding
system. Guess what uneducated users did and got
disappointed about in the past: JOINs … (MySQL Cluster
7.3 has much improved JOIN performance.)

21.
The speaker says...
MySQL Fabric is daemon for managing farms of
MySQL servers. Farms consist of „groups“. A group either
consists of any number of individual MySQL servers, or
holds a MySQL Replication cluster. A group describing a
replication cluster consists of a master and any number of
slaves, as ever since. MySQL Fabric can setup,
administrate and monitor groups. Once a MySQL
Server has been installed, Fabric can take care of the
replication setup details. Failure detectors (built-in: basic
heartbeating) can be used for automatic failover of
primaries. Failover is based on MySQL 5.6+ GTID logic, use
MySQL 5.6.10 or newer. More
http://www.slideshare.net/nixnutz/mysql-57-fabric-high-

23.
The speaker says...
MySQL Fabric also uses partial replication/partitioning to
scale both read and writes. It is build on top of MySQL
Replication (lazy primary copy) which tolerates slow and
unreliable network connections.
Fabric supports range, hash or list based partitioning of
tables using one column as a shard key. Each partition is
assigned to a logical shard group, short: shard.
Sharding is not transparent to SQL users. A sharded table
exists on multiple replicas. Each replica holds a subset of
the data. Clients must query all replicas they need data from.

25.
The speaker says...
For reasons of brevity, the theoretical discussion of MySQL
clustering options stops there. A rough catalog has been
proposed under „Goals of distributed systems“ to further
investigate the various options. Of course, there are
interesting 3rd party options to discover – particularily for the
DBA.
However, to get a developer started with MySQL clustering
this should have been more than enough background
information.

28.
The speaker says...
All PHP applications talking to a cluster of MySQL
database servers are facing some common tasks.
Replacing a single database with a database cluster means
changing the 1:1 relation between the application and the
database into a 1:n relation. 1:n puts an additional task on
the application: find the right n, find the right server
to talk to and so forth.

30.
The speaker says...
Further tasks may exist depending on the kind of cluster
choosen. Generally speaking an eager update anywhere
system is the easiest to use. However, it may not always fit
the bill.
Then lazy replication has to be used or even partial
replication, which complicates matters and adds extra tasks
to application developer.

32.
The speaker says...
All of the following is a slightly revised and updated version
of http://www.slideshare.net/nixnutz/load-mysq-clusterin-
balancing-peclmysqlndms-14
Skip, if you know that presentation.

34.
The speaker says...
In a synchronous cluster, for example, if using MySQL
Cluster, all nodes have all the data. Thus, every application
server could be assigned to one DBMS node. Easy, fast
and good for MySQL Cluster but with limits. No
good for asynchronous MySQL Replication.
Limit: DBMS node failure includes application node
failure. Clients should have additional failover logic.
Limit: Over- and undercapacity of a DBMS node
cannot be compensated. Clients cannot switch to more
powerful nodes. Overcapacity of a MySQL node cannot be
used by other clients. Different hardware size is ignored.

36.
The speaker says...
A transparent MySQL Proxy based solution
requires no application changes. Clients connect to
the Proxy using MySQL Client Server Protocol, as if the
MySQL Proxy was a MySQL Server.
Proxy can compensate for DBMS failures. It can
react to temporary and permant outages.
Proxy can adapt load balancing dynamically. Taking
into account node hardware size, current node load, latency,
location, … fantasy sets the limits here for self-written or 3rd
party Proxy scripts.

38.
The speaker says...
MySQL Proxy is a great performer!
But, … MySQL Proxy adds complexity to the stack.
MySQL Proxy needs to be managed. MySQL Proxy is build
around C and Lua. C and PHP would be a better match for
PHP users. Wrongly used, MySQL Proxy becomes a single
point of failure. It is single threaded (about to change). This
bares the risk of tasks (Proxy scripts) blocking each other.
But, … MySQL Proxy adds latency. Although, it can
be minimized significantly running MySQL Proxy on the App
Server to avoid use of TCP/IP between PHP and Proxy.

40.
The speaker says...
A client-side approach to the problem is promising,
if client applications can be changed. It has most
Pro's of the previous approaches.
The load balancer is part of the client application.
It scales by client and it fails by client. Scalability is
given and there is no single point of failure. No additional
latency occurs.
Load balancing can be adaptive for good resource
usage. DBMS node failures do not block clients,
fail over is possible.

42.
The speaker says...
The major downside of a client-side application
based solution is the need to change the
application. Application changes must be done even for
basic cluster use cases. Modifications to source code may
not be possible. They may complicate upgrades of standard
software. They may cause lots of work thus become
expensive.
Load balancing is part of PHP thus stateless. This is both a
Pro and a Con. It is difficult to hint other clients about node
failures. On the other hand, shared nothing is not a bad idea
either.

44.
The speaker says...
A driver based client-side solution has all Pro's!
Considering load balancing aspect only, no application
changes are required. If using MySQL Cluster, a
synchronous cluster, no application changes are
required. If using MySQL Replication, an
asynchronous cluster that needs read-write
splitting, the need for application changes is
significantly reduced! Please, welcome the free and
open source PECL/mysqlnd_ms driver plugin.
The PECL mysqlnd_ms installation can be part of the next
PHP deployment, done anyway for security considerations.
No new procedure. If any issues, simply disable extension.

47.
The speaker says...
In late 2006, MySQL decided to develop a client server
library written in C, tailored to the needs of PHP and
offered to be integrated into PHP under the terms
of the PHP license. The MySQL native driver for PHP
was born. Mysqlnd is an alternative to using the MySQL
Client Library with PHP. Both libraries are compiled into
PHP. They work beneath the user APIs and are written in C
like PHP. Very early statistics for monitoring and a
plugin API have been added. The C-level plugin API
allows plugins hook each and every C call inside the library.
This is a comfortable way to add a MySQL
proxy/interceptor to PHP without impacting PHP
applications. All inside the PHP interpiler.

49.
The speaker says...
PECL/mysqlnd_ms is a plugin for the MySQL
native driver for PHP (mysqlnd) library. The mysqlnd
library is part of PHP as of version 5.3. As of 5.4 mysqlnd is
a default.
All three PHP MySQL APIs (mysql, mysqli, PDO_MySQL)
can be compiled to use the mysqlnd library, thus all
existing PHP MySQL application are supported by
the plugin.
From an applications point of view a mysqlnd
plugin can act as a transparent proxy. Depending on

51.
The speaker says...
No worry, it is not complicated. All the magic is beneath the
hood, in the C code of PECL/mysqlnd_ms. The user
interface is short and comprehensive.
This is all what PECL/mysqlnd_ms is about: make using
any kind of MySQL cluster easy! Let the load balancer
do the work. If anything is missing for your favourite cluster,
let us know. The biggest gap currently is Fabric, which is
pre-production.
The free and open source PECL/mysqlnd_ms is not the only
way to get the job done. If you are happy with your current
solution, stick with it. Compare features, copy ideas...

55.
The speaker says...
Some cluster require read/write splitting. If so, clients must
direct all writes to the primary (master). The plugin can,
optionally, do automatic read/write splitting. Any automatic
decision of the plugin can be overruled. In this case using a
hint or a callback.
Detection of reads is driver and API dependent and works
differently among different MySQL drivers. The PHP driver
contains no full-fledged parser for SQL statement analysis.
The existing APIs do neither force nor allow developers to
set a connection read-only. If the application does not hint a
read only transaction, the plugin checks if a statement
begins with SELECT or a certain SQL hint. If not, its
considered a write – a safe default choice.

57.
The speaker says...
Statement redirection for load balancing bares a
pitfall: connection state. Session variables and
transactions are part of the state. A load balancer must not
switch servers in the middle of a transaction. When using
primary copy, all transactions shall go to the primary
(master) because it is unknown at the beginning of a
transaction whether it includes updates.
The MySQL Client Server protocol does not yet hint whether
a transaction is active. Load balancers must either parse
SQL to catch BEGIN, monitor API calls or be hinted by the
application. PECL/mysqlnd_ms monitors the API and
understands hints.

59.
The speaker says...
Use the API calls to control transactions. Give the
database driver a chance to help you. Virtually any state
change that you perform through API calls is
handled by PECL/mysqlnd_ms. For example, if you
change the charset with mysqli_set_charset(), the plugin
makes sure that all opened connections use the same
charset. We hide complexity, make cluster use easier.
Consider using master_on_write=1. This will keep all
reads followed by the first write on the master. Handy, if you
use asynchronous cluster and eventual consistency is not
enough. Wait for quality-of-service, its getting

61.
The speaker says...
The best and default load balancing stategy is the
one with the lowest impact on the connection
state: random once. Minimize switches for the life-span
of your PHP script. Pick a random slave during startup and
use it for all reads. The life-span of PHP is the of a web
request. It is short. Two web requests will end up on two
random slaves.
Nodes with a weight of two will get twice as many
requests as nodes with a weight of one. This is not
only great if your nodes have different computing
power but also to prefer local nodes over remote
ones by setting a higher weight. Fabric or similar
could drive this further to truly adaptive load balancing

63.
The speaker says...
Connection state changes are what speaks against
automatic, silent fail over once a connection has
been established and fails. If your application is written
in a way that automatic reconnect upon failure is allowed,
you can enable it. Otherwise, handle the error, do not
close the connection but run the next query to
trigger a new load balancing attempt! Failed nodes
can be remembered for the duration of a request.
Automatic failover can either try to connect to the
master or try to use another slave before failing
over to the master. If using MySQL Cluster, the search
logic still applies with the only difference that only masters
are to be configured.

65.
The speaker says...
By default, to reduce the connection pool size of
every PHP web request, connections are not
opened before executing a statement. Imagine you
have 50 concurrent web requests, you configured four
slaves and, you are using random once. If not using lazy
connections, the plugin opens 50 x 4 = 200 slave
connections but uses at max 50 of them (random once).
Every connection occupies resources – ever seen a PHP
application opening connections before using? There are
many badly written ones...
Many connection state changing API calls are buffered and
replayed when opening lazy to prevent pitfalls. Server
support is missing.

66.
Do not bother about node selection – we do
• mysqlnd_ms_set_qos()
• Eventual, session and strong consistency
• Allow only certain slave lag, enable client-side caching
Quality of service
Database cluster as a service
Client
Node NodeNode Node
PECL/mysqlnd_ms

67.
The speaker says...
The basics (load balancing, failover) are solved. Abstraction
can start. We can start to see the cluster as a service. The
load balancer knows how to use the cluster to achieve a
certain quality of service. The application formulates
its service quality requirements, the load balancer
takes appropriate action. For example, state the
consistency level needed. Is reading stale data from a
slave allowed (eventual consistency) and if so, is
there any limit on freshness?
Having API calls to know the client demands is a
great help for load balancer and cluster vendors!

69.
The speaker says...
An eventual consistent node may or may not serve
the latest copy. In fact, there is no promise that a
particular copy is available from every node. Many
systems that default to eventual consistency reach strong
consistency over time. Eventually – after some time - all
nodes get synchronized. This model is similar to that of a
cache. Eventual consistency is good enough for browsing
product catalogs or other infrequently changing contents in
areas where stale data is acceptable. It is the default
consistency level with MySQL Replication.
Is eventual consistency evil? Depends on requirements.
And, BTW, how do ATMs work – always connected to host?

73.
The speaker says...
A MySQL Replication slave is eventual consistent – it may or
may not have the latest updates.
Slaves can be filtered by replication lag:
mysqlnd_ms_set_qos($conn,
MYSQLND_MS_QOS_CONSISTENCY_EVENTUAL,
MYSQLND_MS_QOS_OPTION_AGE, 5) filters out all
slaves that report an estimated lag of more than five
seconds.
Note that a slave is a cache...

75.
The speaker says...
If the load balancer knows that eventual
consistency is good and a certain degree of stale
data (age, slave lag) is allowed, a slave access
can be transparently replaced with a cache
access. This lowers the overall slave load in the cluster
and reduces latency.
Replacing a slave access with a cache access is
transparent from an application point of view.
PECL/mysqlnd_ms is using PECL/mysqlnd_qc (qc = query
cache) for the caching. PECL/mysqlnd_qc can also be used
as a standalone cache to store query results in Memcache,

77.
The speaker says...
Complicated logic behind but easy to use – PHP
should look out for new cluster aware APIs.
Let's look beyond eventual consistency – even if using a lazy
primary copy system such as MySQL Replication.

79.
The speaker says...
A global transaction identifier is a cluster-wide
unique transaction identifier. MySQL 5.6 can generate
it automatically. MySQL Proxy and PECL/mysqlnd_ms 1.2
feature client-side emulations for use with any MySQL
version. SQL can be used to access the GTIDs.
GTIDs have been been created to make MySQL Replication
failover easier. However, they are useful for load balancing
as well in a primary copy system.

81.
The speaker says...
Global transaction identifier help to find „up-to-
date“ slaves that have already replicated the latest
updates of a client. Thus, session consistency can
now be achieved by reading from the master and
selected „up-to-date“ slaves. This works with the GTID
emulation of PECL/mysqlnd_ms 1.2+ and any MySQL
version. And, it works with MySQL 5.6 that has built-in
GTIDs. PECL/mysqlnd_ms 1.4 can either wait for a
slave to catch up or search all slaves before
considering the master. The wait effectively means
throttling slaves to reduce slave lag!

83.
The speaker says...
Configuring PECL/mysqlnd_ms for use with a MySQL
Replication cluster and calling mysqlnd_ms_set_qos($conn,
MYSQLND_MS_QOS_CONSISTENCY_STRONG)
instructs PECL/mysqlnd_ms to use only the MySQL
Replication master server for requests.
In a lazy primary copy system there is only one node that is
guaranteed to have all comitted updates: the primary.
Please note, that its possible to achieve higher consistency
levels than eventual consistency in an lazy primary copy
system by appropriately choosing nodes.