Archive for May 27, 2014

MySQL Fabric is a new framework that automates High Availability (HA) and/or sharding (scaling-out) for MySQL and it has just been declared Generally Available.

This post focuses on MySQL Fabric as a whole – both High Availability and scaling out (sharding). It starts with introductions to HA and scaling out (by partitioning/sharding data) and how MySQL Fabric achieves them, before working through a full example of deploying HA with MySQL Fabric and then adding sharding on top.

What MySQL Fabric Provides

MySQL Fabric is built around an extensible framework for managing farms of MySQL Servers. Currently two features have been implemented – High Availability and scaling out using data sharding. Either of these features can be used in isolation or in combination.

Both features are implemented in two layers:

The mysqlfabric process, which handles all management requests – whether received through the mysqlfabric command-line interface (the manual for which can be found at http://dev.mysql.com/doc/mysql-utilities/1.4/en/fabric.html) or from another process via the supplied XML-RPC interface. When using the HA feature, this process can also be made responsible for monitoring the master server and, should the master fail, initiating failover to promote a slave to be the new master. The state of the server farm is held in the state store (a MySQL database) and the mysqlfabric process is responsible for providing the state and routing information to the connectors.

MySQL Connectors are used by the application code to access the database(s), converting instructions from a specific programming language to the MySQL wire protocol, which is used to communicate with the MySQL Server processes. A ‘Fabric-aware’ connector stores a cache of the routing information that it has received from the mysqlfabric process and then uses that information to send transactions or queries to the correct MySQL Server. Currently the three supported Fabric-aware MySQL connectors are for PHP, Python and Java (and in turn the Doctrine and Hibernate Object-Relational Mapping frameworks). This approach means that the latency and potential bottleneck of sending all requests via a proxy can be avoided.

High Availability

High Availability (HA) refers to the ability of a system to provide continuous service – a system is available while that service can be utilized. The level of availability is often expressed in terms of the “number of nines” – for example, a HA level of 99.999% means that the service can be used for 99.999% of the time; in other words, on average, the service is unavailable for only about 5.26 minutes per year (and that includes all scheduled as well as unscheduled down-time).
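The arithmetic behind the “number of nines” is easy to check. The short sketch below is only an illustration; the function name is made up for this post and a year is assumed to be 365.25 days long.

```python
# Downtime allowed per year for a given availability level.
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Return the minutes per year that a service may be unavailable."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for level in (99.0, 99.9, 99.99, 99.999):
    print(f"{level}% -> {downtime_minutes_per_year(level):.2f} minutes/year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why the step from 99.99% to 99.999% is so demanding operationally.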

Different Points of High Availability

The figure shows the different layers in the system that need to be available for service to be provided.

At the bottom is the data that the service relies on. Obviously, if that data is lost then the service cannot function correctly and so it’s important to make sure that there is at least one extra copy of that data. This data can be duplicated at the storage layer itself but with MySQL, it’s most commonly replicated by the layer above – the MySQL Server using MySQL Replication. The MySQL Server provides access to the data – there is no point in the data being there if you can’t get at it! It’s a common misconception that having redundancy at these two levels is enough to have a HA system but you also need to look at the system from the top-down.

To have a HA service, there needs to be redundancy at the application layer; in itself this is very straightforward: just load balance all of the service requests over a pool of application servers which are all running the same application logic. If the service were something as simple as a random number generator then this would be fine, but most useful applications need to access data and as soon as you move beyond a single database server (for example because it needs to be HA) then a way is needed to connect the application server to the correct data source. In a HA system, the routing isn’t a static function; if one database server should fail (or be taken down for maintenance) the application should be directed instead to an alternate database. Some HA systems implement this routing function by introducing a proxy process between the application and the database servers; others use a virtual IP address which can be migrated to the correct server. When using MySQL Fabric, this routing function is implemented within the Fabric-aware MySQL connector library that’s used by the application server processes.

What MySQL Fabric Adds in Terms of High Availability

MySQL Fabric has the concept of a HA group which is a pool of two or more MySQL Servers; at any point in time, one of those servers is the Primary (MySQL Replication master) and the others are Secondaries (MySQL Replication slaves). The role of a HA group is to ensure that access to the data held within that group is always available.

While MySQL Replication allows the data to be made safe by duplicating it, for a HA solution two extra components are needed and MySQL Fabric provides these:

Failure detection and promotion – the MySQL Fabric process monitors the Primary within the HA group and should that server fail then it selects one of the Secondaries and promotes it to be the Primary (with all of the other slaves in the HA group then receiving updates from the new master). Note that the connectors can inform MySQL Fabric when they observe a problem with the Primary and the MySQL Fabric process uses that information as part of its decision making process surrounding the state of the servers in the farm.

Routing of database requests – When MySQL Fabric promotes the new Primary, it updates the state store and notifies the connectors so that they can refresh their caches with the updated routing information. In this way, the application does not need to be aware that the topology has changed and that writes need to be sent to a different destination.
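The interplay between promotion and the connectors' cached routing data can be pictured with a toy model. This is not MySQL Fabric code: the class names, the server names and the “promote the first Secondary” rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HAGroup:
    """Toy stand-in for what the state store records about one HA group."""
    name: str
    primary: str
    secondaries: List[str] = field(default_factory=list)

    def fail_over(self) -> str:
        """Drop the failed Primary and promote a Secondary in its place."""
        if not self.secondaries:
            raise RuntimeError("no Secondary available to promote")
        # Assumption: pick the first Secondary; the real choice may differ.
        self.primary = self.secondaries.pop(0)
        return self.primary


class ConnectorCache:
    """Toy stand-in for the routing cache held by a Fabric-aware connector."""

    def __init__(self, group: HAGroup):
        self.refresh(group)

    def refresh(self, group: HAGroup) -> None:
        # Copy the routing data so that lookups need no round trip to Fabric.
        self.primary = group.primary
        self.secondaries = list(group.secondaries)

    def write_target(self) -> str:
        return self.primary


group = HAGroup("group_id-1", "server-a", ["server-b", "server-c"])
cache = ConnectorCache(group)
group.fail_over()      # the Primary has failed, so a Secondary is promoted
cache.refresh(group)   # the connector picks up the new routing information
print(cache.write_target())
```

Until the cache is refreshed it still points at the old Primary, which is why the notification step matters as much as the promotion itself.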

Scaling Out – Sharding

When nearing the capacity or write performance limit of a single MySQL Server (or HA group), MySQL Fabric can be used to scale-out the database servers by partitioning the data across multiple MySQL Server “groups”. Note that a group could contain a single MySQL Server or it could be a HA group.

The administrator defines how data should be partitioned/sharded between these servers; this is done by creating shard mappings. A shard mapping applies to a set of tables and for each table the administrator specifies which column from those tables should be used as a shard key (the shard key will subsequently be used by MySQL Fabric to calculate which shard a specific row from one of those tables should be part of). Because all of these tables use the same shard key and mapping, the use of the same column value in those tables will result in those rows being in the same shard – allowing a single transaction to access all of them. For example, if using the subscriber-id column from multiple tables then all of the data for a specific subscriber will be in the same shard. The administrator then defines how that shard key should be used to calculate the shard number:

HASH: A hash function is run on the shard key to generate the shard number. Provided that the column used as the sharding key doesn’t contain too many repeated values, this should result in an even partitioning of rows across the shards.

RANGE: The administrator defines an explicit mapping between ranges of values for the sharding key and shards. This gives maximum control to the user of how data is partitioned and which rows should be co-located.

When the application needs to access the sharded database, it sets a property for the connection that specifies the sharding key – the Fabric-aware connector will then apply the correct range or hash mapping and route the transaction to the correct shard.
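The two mapping styles can be sketched in a few lines. This is a simplified illustration, not MySQL Fabric's actual algorithm: the hash-modulo scheme, the function names and the example boundaries are all assumptions.

```python
import bisect
import hashlib


def hash_shard(key, shard_count):
    """HASH style: derive a shard number from a hash of the key.

    Assumption: MD5 modulo the shard count, chosen only because it is
    simple and deterministic.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def range_shard(key, lower_bounds, groups):
    """RANGE style: find the group whose range contains the key.

    lower_bounds must be sorted; groups[i] holds keys from lower_bounds[i]
    up to (but not including) lower_bounds[i + 1].
    """
    index = bisect.bisect_right(lower_bounds, key) - 1
    if index < 0:
        raise ValueError(f"key {key!r} is below the lowest range")
    return groups[index]


# Hypothetical ranges: keys 1-9999 in one group, 10000 and above in another.
bounds = [1, 10000]
groups = ["group_id-1", "group_id-2"]
for sub_no in (72, 500, 10000, 15000):
    print(sub_no, range_shard(sub_no, bounds, groups), hash_shard(sub_no, 2))
```

RANGE keeps neighbouring keys together, which helps when related rows should be co-located; HASH spreads keys out without the administrator having to choose boundaries.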

If further shards/groups are needed then MySQL Fabric can split an existing shard into two and then update the state-store and the caches of routing data held by the connectors. Similarly, a shard can be moved from one HA group to another.
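For a RANGE mapping, splitting a shard amounts to adding a new lower bound and moving the rows at or above it to the new group. The sketch below models only that bookkeeping; the function names and the dictionary representation are assumptions, and the real operation also has to copy the data between servers.

```python
def split_range_shard(mapping, split_value, new_group):
    """Return a new mapping with an extra shard starting at split_value.

    mapping is a dict of {lower_bound: group_name}.
    """
    if split_value in mapping:
        raise ValueError("a shard already starts at this value")
    if split_value < min(mapping):
        raise ValueError("split value is below every existing shard")
    updated = dict(mapping)
    updated[split_value] = new_group
    return dict(sorted(updated.items()))


def keys_that_move(keys, mapping, split_value):
    """List the keys that would land in the new shard after the split."""
    higher_bounds = [bound for bound in mapping if bound > split_value]
    upper = min(higher_bounds) if higher_bounds else None
    return [k for k in keys
            if k >= split_value and (upper is None or k < upper)]


before = {1: "group_id-1"}
after = split_range_shard(before, 10000, "group_id-2")
print(after)
print(keys_that_move([72, 500, 10000, 15000], before, 10000))
```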

Note that a single transaction or query can only access a single shard and so it is important to select shard keys based on an understanding of the data and the application’s access patterns. It doesn’t always make sense to shard all tables as some may be relatively small and having their full contents available in each group can be beneficial given the rule about no cross-shard queries. These global tables are written to a ‘global group’ and any additions or changes to data in those tables are automatically replicated to all of the other groups. Schema changes are also made to the global group and replicated to all of the others to ensure consistency.

To get the best mapping, it may also be necessary to modify the schema if there isn’t already a ‘natural choice’ for the sharding keys.

Worked Example

The example that this post steps through starts by setting up MySQL Fabric itself and then uses it to manage HA using a group of MySQL Servers. An example application will store data in this new configuration. After that, the example will introduce shards to the server farm in order to scale out capacity and read/write performance.

The following sections set up the sharded MySQL configuration shown here before running some (Python) code against it – with queries and transactions routed to the correct MySQL Server.

Building the MySQL Fabric Framework

Note that this section is repeated from the earlier two posts and so can be skipped if you’ve already worked through one of them.

The machines being used already have MySQL 5.6 installed (though in a custom location) and so the only software pre-requisite is to install the MySQL connector for Python from the connector download page and MySQL Fabric (part of MySQL Utilities) from the MySQL Fabric download page:

MySQL Fabric needs access to a MySQL Database to store state and routing information for the farm of servers; if there isn’t already a running MySQL Server instance that can be used for this then it’s simple to set one up:

MySQL Fabric needs to be able to access this state store and so a dedicated user is created (note that the fabric database hasn’t yet been created – that will be done soon using the mysqlfabric command):

All of the management requests that we make for MySQL Fabric will be issued via the mysqlfabric command. This command is documented in the MySQL Fabric User Guide but the sub-commands can be viewed from the terminal using mysqlfabric help commands:

[mysql@fab1 ~]$ mysqlfabric help commands
group activate Activate failure detector for server(s) in a group.
group description Update group's description.
group deactivate Deactivate failure detector for server(s) in a group.
group create Create a group.
group remove Remove a server from a group.
group add Add a server into group.
group health Check if any server within a group has failed and report health information.
group lookup_servers Return information on existing server(s) in a group.
group destroy Remove a group.
group demote Demote the current master if there is one.
group promote Promote a server into master.
group lookup_groups Return information on existing group(s).
dump fabric_nodes Return a list of Fabric servers.
dump shard_index Return information about the index for all mappings matching any of the patterns provided.
dump sharding_information Return all the sharding information about the tables passed as patterns.
dump servers Return information about servers.
dump shard_tables Return information about all tables belonging to mappings matching any of the provided patterns.
dump shard_maps Return information about all shard mappings matching any of the provided patterns.
manage teardown Teardown Fabric Storage System.
manage stop Stop the Fabric server.
manage setup Setup Fabric Storage System.
manage ping Check whether Fabric server is running or not.
manage start Start the Fabric server.
manage logging_level Set logging level.
server set_weight Set a server's weight.
server lookup_uuid Return server's uuid.
server set_mode Set a server's mode.
server set_status Set a server's status.
role list List roles and associated permissions
user add Add a new Fabric user.
user password Change password of a Fabric user.
user list List users and their roles
user roles Change roles for a Fabric user * protocol: Protocol of the user (for example 'xmlrpc') * roles: Comma separated list of roles, IDs or names (see `role list`)
user delete Delete a Fabric user.
threat report_error Report a server error.
threat report_failure Report with certantity that a server has failed or is unreachable.
sharding list_definitions Lists all the shard mapping definitions.
sharding remove_definition Remove the shard mapping definition represented by the Shard Mapping ID.
sharding move_shard Move the shard represented by the shard_id to the destination group.
sharding disable_shard Disable a shard.
sharding remove_table Remove the shard mapping represented by the Shard Mapping object.
sharding split_shard Split the shard represented by the shard_id into the destination group.
sharding create_definition Define a shard mapping.
sharding add_shard Add a shard.
sharding add_table Add a table to a shard mapping.
sharding lookup_table Fetch the shard specification mapping for the given table
sharding enable_shard Enable a shard.
sharding remove_shard Remove a Shard.
sharding list_tables Returns all the shard mappings of a particular sharding_type.
sharding prune_shard Given the table name prune the tables according to the defined sharding specification for the table.
sharding lookup_servers Lookup a shard based on the give sharding key.
event trigger Trigger an event.
event wait_for_procedures Wait until procedures, which are identified through their uuid in a list and separated by comma, finish their execution.

MySQL Fabric has its own configuration file (note that its location can vary depending on your platform and how MySQL Utilities was installed). The contents of this configuration file should be reviewed before starting the MySQL Fabric process. In this case, the mysqldump_program and mysqlclient_program settings needed to be changed because MySQL was installed in a user’s directory, the MySQL Fabric management port was changed to 8080, and authentication for the management interface was disabled:

Adding MySQL Servers to Create a HA Farm

At this point, MySQL Fabric is up and running but it has no MySQL Servers to manage. This figure shows what the configuration will look like once MySQL Servers have been added to create a HA server farm.

Three MySQL Servers will make up the managed HA group – each running on a different machine – these are the configuration files for each (there’s no detailed commentary as this is standard MySQL stuff):

At this point, the MySQL Fabric process (and its associated state store) is up and running, as are the MySQL Servers that will become part of the HA group. MySQL Fabric is now able to access and manipulate those MySQL Servers and so they can now be added to a HA group called group_id-1.

Note that all of the MySQL Servers are reported as being Secondaries (in other words, none of them is acting as the MySQL Replication master). The next step is to promote one of the servers to be the Primary; in this case the uuid of the server we want to promote is provided but that isn’t required – in which case MySQL Fabric will select one.

At this stage, the MySQL replication relationship is configured and running but there isn’t yet High Availability as MySQL Fabric is not monitoring the state of the servers – the final configuration step fixes that:

Everything is now set up to detect if the Primary (master) should fail and in the event that it does, promote one of the Secondaries to be the new Primary. If using one of the MySQL Fabric-aware connectors (initially PHP, Python and Java) then that failover can be transparent to the application.

Run an Application Against the HA Farm

The code that follows shows how an application can access the new HA group – in this case, using the Python connector. First an application table is created:

The connector is provided with the address for the MySQL Fabric process (localhost:32274) rather than that of any of the MySQL Servers.

The mode property for the connection is set to fabric.MODE_READWRITE which the connector will interpret as meaning that the transaction should be sent to the Primary (as that’s where all writes must be executed so that they can be replicated to the Secondaries)

The group property is set to group_id-1 which is the name that was given to the single HA Group
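The three points above can be sketched as follows. This is a hedged reconstruction rather than the original listing: it assumes a Connector/Python release that still includes the mysql.connector.fabric module, a root account with an empty password, a test database and an illustrative column layout for the subscribers table. Because authentication for the management interface was disabled earlier, no Fabric credentials are passed.

```python
def fabric_connect_args(host="localhost", port=32274):
    """Keyword arguments for mysql.connector.connect() when routing via Fabric."""
    return {
        "fabric": {"host": host, "port": port},  # the Fabric process, not a MySQL Server
        "user": "root",        # assumed MySQL account
        "password": "",        # assumed (empty) password
        "database": "test",    # assumed schema name
        "autocommit": True,
    }


def create_subscribers_table(conn, read_write_mode, group="group_id-1"):
    """Create the application table on the Primary of the given HA group."""
    # A read/write mode tells the connector to route to the Primary.
    conn.set_property(group=group, mode=read_write_mode)
    cur = conn.cursor()
    # Column layout is an assumption made for this example.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS subscribers ("
        "sub_no INT, first_name CHAR(40), last_name CHAR(40))"
    )


def main():
    # Imported here so the helpers above can be read and tested without the connector.
    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(**fabric_connect_args())
    create_subscribers_table(conn, fabric.MODE_READWRITE)
```

Calling main() requires a running MySQL Fabric process and HA group; the helper functions themselves only need an object that offers set_property() and cursor().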

This code can now be executed and then a check made on one of the Secondaries that the table creation has indeed been replicated from the Primary.

And then the data can be retrieved (note that the mode parameter for the connection is set to fabric.MODE_READONLY and so the connector knows that it can load balance the requests across any MySQL Servers in the HA Group).

Note that if the Secondary servers don’t all have the same performance then you can skew the ratio for how many reads are sent to each one using the mysqlfabric server set_weight command – specifying a value between 0 and 1 (default is 1 for all servers). Additionally, the mysqlfabric server set_mode command can be used to specify if the Primary should receive some of the reads (READ_WRITE) or only writes (WRITE_ONLY).
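The effect of server weights on read traffic can be illustrated with a weighted random choice. How the connector really applies the weights is not shown in this post, so treat the function below (and its server names) as an assumed model rather than the connector's implementation.

```python
import random


def pick_read_server(weights, rng=random):
    """Choose a server for a read, in proportion to its weight.

    weights maps server name -> weight; servers with weight 0 are skipped.
    """
    candidates = [name for name, weight in weights.items() if weight > 0]
    if not candidates:
        raise ValueError("no server has a positive weight")
    return rng.choices(
        candidates, weights=[weights[name] for name in candidates], k=1
    )[0]


# Hypothetical farm where one Secondary is slower than the others.
weights = {"secondary-1": 1.0, "secondary-2": 1.0, "secondary-3": 0.25}
tally = {name: 0 for name in weights}
for _ in range(9000):
    tally[pick_read_server(weights)] += 1
print(tally)
```

With these weights the slower server should receive roughly one read for every four sent to each of the others.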

The next section describes how this configuration can be extended to add scalability by sharding the table data (and it can be skipped if that isn’t needed).

Adding Scale-Out with Sharding

The example in this section builds upon the previous one by adding more servers in order to scale out the capacity and read/write performance of the database. The first step is to create a new group (which is named global-group in this example) – the Global Group is a special HA group that performs two critical functions:

Any data schema changes are applied to the Global Group and from there they will be replicated to each of the other HA Groups

If there are tables that contain data that should be replicated to all HA groups (rather than sharded) then any inserts, updates or deletes will be made on the Global Group and then replicated to the others. Those tables are referred to as global tables.

The following figure illustrates what the configuration will look like once the Global Group has been created.

The global group will contain three MySQL Servers running on the same host (in a production environment they would be split over multiple machines) and so the first step is to configure, bootstrap and start those servers.

A shard mapping is an entity that is used to define how certain tables should be sharded between a set of HA groups. It is possible to have multiple shard mappings but in this example, only one will be used. When defining the shard mapping, there are two key parameters:

The type of mapping – can be either HASH or RANGE

The global group that will be used

The commands that follow define the mapping and identify the index number assigned to this mapping (in this example – 1) – in fact that same index is recovered in two different ways: using the mysqlfabric command and then reading the data directly from the state store:

The next step is to define what columns from which tables should be used as the sharding key (the value on which the HASH function is executed or is compared with the defined RANGEs). In this example, only one table is being sharded (the subscribers table with the sub_no column being used as the sharding key) but the command can simply be re-executed for further tables. Note that the identifier for the shard mapping (1) is passed on the command-line:

At this point, the database has technically been sharded but of course it offers no scalability as there is only a single shard. The steps that follow evolve that configuration into one containing two shards as shown in the following figure.

Another HA group (group_id-2) is created from three newly created MySQL Servers, then one of the servers is promoted to be the Primary:

At this point, the new HA group exists but is missing the application schema and data. Before allocating a shard to the group, a RESET MASTER needs to be executed on the Primary for the group (this is required because changes have already been made on that server – if nothing else, to grant permissions for one or more users to connect remotely). The mysqlfabric group lookup_servers command is used to first check which of the three servers is currently the Primary.

Before looking at the application code changes that are needed to cope with the sharded data, a simple test can be run to confirm that the table’s existing data has indeed been split between the two shards:

The next example Python code adds some new rows to the subscribers table. Note that the tables property for the connection is set to test.subscribers and the key to the value of the sub_no column for that table – this is enough information for the Fabric-aware connector to choose the correct shard/HA group and then the fact that the mode property is set to fabric.MODE_READWRITE further tells the connector that the transaction should be sent to the Primary within that HA group.
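A minimal sketch of such an insert is shown here. As before, it assumes a Connector/Python release that still ships mysql.connector.fabric; the account details, the three-column table layout and the sample subscribers are illustrative assumptions.

```python
def add_subscriber(conn, read_write_mode, sub_no, first_name, last_name):
    """Insert one row, letting the connector pick the shard from sub_no."""
    # tables + key identify the shard; the mode selects the Primary within it.
    conn.set_property(
        tables=["test.subscribers"], key=sub_no, mode=read_write_mode
    )
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO subscribers VALUES (%s, %s, %s)",
        (sub_no, first_name, last_name),
    )


def main():
    # Imported here so add_subscriber() can be exercised without the connector.
    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(
        fabric={"host": "localhost", "port": 32274},
        user="root", password="", database="test",  # assumed credentials
        autocommit=True,
    )
    # Hypothetical sample data.
    add_subscriber(conn, fabric.MODE_READWRITE, 72, "Billy", "Fish")
    add_subscriber(conn, fabric.MODE_READWRITE, 15050, "John", "Smith")
```

The application never names a server or a group here; changing which group owns a key only changes where the connector sends the statement.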

Example Application Code (Includes Sharding)

The final example application code reads the row for each of the records that have been added. The key thing to note here is that the mode property for the connection has been set to fabric.MODE_READONLY so that the Fabric-aware Python connector knows that it can load balance requests over the Secondaries within the HA groups rather than sending everything to the Primary.
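The read path looks almost identical to the write path; only the mode changes. The sketch below makes the same assumptions as before (a Connector/Python release with mysql.connector.fabric, assumed credentials, and illustrative column names and key values).

```python
def find_subscriber(conn, read_only_mode, sub_no):
    """Fetch the row(s) for one subscriber from the shard that owns sub_no."""
    # A read-only mode lets the connector use any server in the shard's HA group.
    conn.set_property(
        tables=["test.subscribers"], key=sub_no, mode=read_only_mode
    )
    cur = conn.cursor()
    cur.execute(
        "SELECT first_name, last_name FROM subscribers WHERE sub_no = %s",
        (sub_no,),
    )
    return cur.fetchall()


def main():
    # Imported here so find_subscriber() can be exercised without the connector.
    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(
        fabric={"host": "localhost", "port": 32274},
        user="root", password="", database="test",  # assumed credentials
        autocommit=True,
    )
    for sub_no in (72, 15050):  # hypothetical keys
        print(sub_no, find_subscriber(conn, fabric.MODE_READONLY, sub_no))
```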

Current Limitations

The initial version of MySQL Fabric is designed to be simple, robust and able to scale to thousands of MySQL Servers. This approach means that this version has a number of limitations, which are described here:

Sharding is not completely transparent to the application. While the application need not be aware of which server stores a set of rows and it doesn’t need to be concerned when that data is moved, it does need to provide the sharding key when accessing the database.

All transactions and queries need to be limited in scope to the rows held in a single shard, together with the global (non-sharded) tables. For example, joins involving multiple shards are not supported.

Because the connectors perform the routing function, the extra latency involved in proxy-based solutions is avoided but it does mean that Fabric-aware connectors are required – at the time of writing these exist for PHP, Python and Java.

The MySQL Fabric process itself is not fault-tolerant and must be restarted in the event of it failing. Note that this does not represent a single-point-of-failure for the server farm (HA and/or sharding) as the connectors are able to continue routing operations using their local caches while the MySQL Fabric process is unavailable.

MySQL Fabric Architecture & Extensibility

MySQL Fabric has been architected for extensibility at a number of levels. For example, in the first release the only option for implementing HA is based on MySQL Replication but in future releases we hope to add further options (for example, MySQL Cluster). We also hope to see completely new applications around the managing of farms of MySQL Servers – both from Oracle and the wider MySQL community.

The following diagram illustrates how new applications and protocols can be added using the pluggable framework.

Next Steps

We really hope that you’ll try out MySQL Fabric and let us know how you get on; one way is to comment on this post, another is to post to the MySQL Fabric forum or, if you think you’ve found a bug, to raise a bug report.

On Thursday 22nd May I’ll be hosting a webinar explaining how you can get the best from the NoSQL world while still getting all of the benefits of a proven RDBMS. As always the webinar is free but please register here.

There’s often a lot of excitement around NoSQL Data Stores with the promise of simple access patterns, flexible schemas, scalability and High Availability. The downside can come in the form of losing ACID transactions, consistency, flexible queries and data integrity checks. What if you could have the best of both worlds?