Wednesday, February 03, 2010

In the previous post I described the first steps of a Python library for
controlling the replication of large installations. The intention of
the library is to provide a uniform interface to such installations,
one that allows procedures for handling various situations to be
written in a uniform language.

For the library to be useful, it is necessary to support installations
that use different operating systems for the machines, as well as
different versions of the servers. Specifically, it is necessary to
allow some aspects of the system to vary.

Depending on the operating system, or even just how the server
is installed on the machine, the procedures for bringing the server
down and up will differ.

Configurations are managed in different ways depending on the
deployment, and there are various other tools for managing the
configuration of large systems.

As part of the management of the topology, it is necessary to
change the configuration files, but this should play well with
other tools.

In either case, any specific method for configuration handling
should neither be required nor enforced.

In the example in the previous
article, the technique for cloning a server was demonstrated. In
this case the naive method of copying the database files was
used. For the general case, however, some backup method
will be used, but it depends on the requirements of the
deployment. In other words, it is necessary to parameterize the
backup method as well.

Each server in the system has a specific role to
fulfill. Some servers are final slaves whose only purpose is to
answer queries, at least one server is a master, and some servers
are relay servers.

To allow the system to be parameterized on these aspects, a set of
abstract classes is introduced. In the figure you can see a UML
diagram describing the high-level architecture of the Replicant
library.

In the figure, there are four abstract classes:

Machine

The responsibility of this class is to handle all issues that
are specific to the remote operating system, for example, to fetch
files or issue commands to start and stop the server.

Config

The responsibility of this class is to maintain the
configuration of a server. To do this, it may need to parse
configuration files to be able to extract the specific section
containing the definition.

BackupMethod

The responsibility of this class is to provide the primitives to
create a backup and to restore a backup. The backup image may be
placed on a machine other than the one being backed up, and restored
from there.

Role

The responsibility of this class is to provide all the
information necessary to configure a server in a role. Since a
role entails more than pure configuration information, and can
also involve keeping certain tables and other database objects
available, it is modeled as a separate class.

The central Server class relies on a Machine
instance and a Config instance to implement the interface
to the machine and to the configuration, respectively.
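As a rough illustration of how these pieces fit together, the skeleton below sketches the four abstract classes and the central Server class. The method names on Machine (start_server, stop_server) and Role (imbue) and the Server constructor are assumptions made for this sketch, not necessarily what Replicant uses.

```python
from abc import ABC, abstractmethod


class Machine(ABC):
    """Operating-system specifics, such as starting and stopping a server."""

    @abstractmethod
    def start_server(self, server): ...

    @abstractmethod
    def stop_server(self, server): ...


class Config(ABC):
    """Maintains the configuration of one server."""

    @abstractmethod
    def get(self, option): ...

    @abstractmethod
    def set(self, option, value=None): ...

    @abstractmethod
    def remove(self, option): ...


class BackupMethod(ABC):
    """Primitives for creating and restoring a backup."""

    @abstractmethod
    def backup_to(self, server, url): ...

    @abstractmethod
    def restore_from(self, server, url): ...


class Role(ABC):
    """Everything needed to configure a server in a role."""

    @abstractmethod
    def imbue(self, server): ...


class Server:
    """Delegates OS-specific work to a Machine and settings to a Config."""

    def __init__(self, name, machine, config):
        self.name = name
        self.machine = machine
        self.config = config

    def start(self):
        return self.machine.start_server(self)

    def stop(self):
        return self.machine.stop_server(self)
```

Keeping Server concrete and pushing every varying aspect into a separate abstract class means a new operating system or backup tool only needs one new subclass.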

Configuration Management

The configuration of the server is made part of the Replicant library
since manipulating the server configuration is usually necessary when
changing roles of servers.

Depending on the deployment, some sites use configuration managers
such as cfengine or puppet to
administer the configuration of all servers, while others hand-edit
the configuration files (which is only feasible for small configurations,
since it would be a pain to administer larger deployments in this
way).

Long-term, there should be support for some safety measures when
working with server configurations, so implementing an interface for
handling server configurations in a safe, transaction-like
manner (or maybe this should be called an RCU-style
manner) seems like a good idea. To support that, the following
methods to fetch and replace configurations are introduced.

Server.fetch_config()

Returns a Config instance of the configuration for
the server.

Server.replace_config(config)

Replace the configuration of the server with the modified
configuration instance config.

This will allow an implementation to keep version numbers around to
avoid conflicts, but is not required by the interface.
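To make the version-number idea concrete, here is a minimal sketch of how an implementation might detect conflicting updates. The ConfigConflict exception, the version attribute, and the in-memory storage are all assumptions for illustration; the interface itself requires none of them.

```python
class ConfigConflict(Exception):
    """Raised when the configuration changed after it was fetched."""


class Config:
    def __init__(self, options, version):
        self.options = dict(options)  # private copy of the settings
        self.version = version        # version this copy was read at


class Server:
    def __init__(self, options=None):
        self._options = dict(options or {})
        self._version = 0

    def fetch_config(self):
        # Hand out a copy so that edits never touch the live settings.
        return Config(self._options, self._version)

    def replace_config(self, config):
        # Refuse the update if someone else replaced the config meanwhile.
        if config.version != self._version:
            raise ConfigConflict("configuration changed since it was fetched")
        self._options = dict(config.options)
        self._version += 1


server = Server({"port": "3306"})
first = server.fetch_config()
second = server.fetch_config()

first.options["port"] = "3307"
server.replace_config(first)       # succeeds, version becomes 1

try:
    server.replace_config(second)  # stale copy, still at version 0
except ConfigConflict:
    stale_rejected = True
```

Readers always work on their own copy and the writer swaps in a complete new configuration, which is what makes the RCU comparison apt.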

Each Config instance can then be manipulated by using
the following methods:

Config.get(option)

Get the value of option as a string.

Config.set(option[, value])

Set the value of option to value. If no
value is supplied, None is used, which
denotes that the option is set but not given a specific string
value.

Config.remove(option)

Remove the option from the configuration instance
entirely.

So, for example, the log-bin option can be set in the
following manner:
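The original listing is not reproduced here, so what follows is a sketch based only on the methods described above. The in-memory Config and Server classes are stand-ins so that the example runs on its own, and the value 'master-bin' is just an example.

```python
class Config:
    """In-memory stand-in for the configuration section of one server."""

    def __init__(self, options=None):
        self._options = dict(options or {})

    def get(self, option):
        return self._options[option]

    def set(self, option, value=None):
        # None means the option is set but has no specific string value.
        self._options[option] = value

    def remove(self, option):
        del self._options[option]

    def copy(self):
        return Config(self._options)


class Server:
    """Stand-in server that keeps its configuration in memory."""

    def __init__(self):
        self._config = Config()

    def fetch_config(self):
        return self._config.copy()

    def replace_config(self, config):
        self._config = config.copy()


server = Server()

# Fetch a copy, modify it, and install the modified configuration.
config = server.fetch_config()
config.set('log-bin', 'master-bin')
server.replace_config(config)
```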

Machines

A MySQL server can run on many different machines and in many
setups. A server can run on Linux, Solaris, or Windows, and even in
those cases, there can be multiple servers on a single machine.

For a Linux machine with a single server, one usually uses the
script /etc/init.d/mysql to start and stop the
server (at least on my Ubuntu machine), but if multiple servers are
used on a single machine, then mysqld_multi should be
used instead.

For Windows and Solaris, the procedures for starting and stopping
servers are entirely different. Windows starts and stops the servers
using net start MySQL and net stop MySQL,
while Solaris uses the svcadm(1M) command.

To parameterize the system over the various ways it can be
installed, the concept of a Machine is introduced (I actually
had problems figuring out a name for this, but this was suggested to
me and seems to be good enough).

The responsibility of the Machine class is to provide
an interface to access the installed server together with installation
information such as the location of configuration files.
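A sketch of what such Machine classes could look like for the installations mentioned above. The class names, the start_command/stop_command methods, the defaults_file argument, and the Solaris service name "mysql" are assumptions for illustration. The commands are only built as argument lists here, not executed.

```python
class Machine:
    """Knows how one kind of installation starts and stops its server."""

    def __init__(self, defaults_file):
        # Location of the configuration file, supplied by the caller.
        self.defaults_file = defaults_file

    def start_command(self):
        raise NotImplementedError

    def stop_command(self):
        raise NotImplementedError


class LinuxInitd(Machine):
    """Single server managed through the init script."""

    def start_command(self):
        return ["/etc/init.d/mysql", "start"]

    def stop_command(self):
        return ["/etc/init.d/mysql", "stop"]


class LinuxMulti(Machine):
    """One of several servers on a machine, managed by mysqld_multi."""

    def __init__(self, defaults_file, number):
        super().__init__(defaults_file)
        self.number = number  # group number of the server to control

    def start_command(self):
        return ["mysqld_multi", "start", str(self.number)]

    def stop_command(self):
        return ["mysqld_multi", "stop", str(self.number)]


class Windows(Machine):
    def start_command(self):
        return ["net", "start", "MySQL"]

    def stop_command(self):
        return ["net", "stop", "MySQL"]


class Solaris(Machine):
    def __init__(self, defaults_file, service="mysql"):
        super().__init__(defaults_file)
        self.service = service  # assumed service name; depends on the install

    def start_command(self):
        return ["svcadm", "enable", self.service]

    def stop_command(self):
        return ["svcadm", "disable", self.service]
```

A real implementation would run these commands on the remote host; returning argument lists keeps the sketch independent of how the remote execution is done.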

BackupMethod

One of the more important techniques when managing a set of servers is
the ability to clone a slave or a master to create new slaves. Cloning
involves taking a backup of a server and then restoring the backup
image on the new slave. Since the techniques for taking backups vary
a lot and different techniques will be used in different situations,
parameterizing over the various backup methods is sensible.

BackupMethod.backup_to(server, url)

This method will take a backup of server and store it
at the location indicated by url.

BackupMethod.restore_from(server, url)

This method will restore the backup image indicated by
url into server.
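As an illustration, the naive file-copy method from the previous article could be expressed as a BackupMethod roughly like this. The FileCopyBackup name, the datadir attribute on the server, and the restriction to file:// URLs are assumptions of this sketch, and it presumes the server is stopped while files are copied.

```python
import shutil
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class BackupMethod(ABC):
    @abstractmethod
    def backup_to(self, server, url): ...

    @abstractmethod
    def restore_from(self, server, url): ...


class FileCopyBackup(BackupMethod):
    """Naive method: copy the whole data directory of a stopped server."""

    def _local_path(self, url):
        parts = urlparse(url)
        if parts.scheme != "file":
            raise ValueError("this sketch only handles file:// URLs")
        return parts.path

    def backup_to(self, server, url):
        # The destination directory must not exist yet.
        shutil.copytree(server.datadir, self._local_path(url))

    def restore_from(self, server, url):
        # Replace whatever is in the data directory with the backup image.
        shutil.rmtree(server.datadir, ignore_errors=True)
        shutil.copytree(self._local_path(url), server.datadir)
```

Cloning then amounts to calling backup_to on the source server and restore_from on the new slave with the same URL; other subclasses could wrap whatever backup tool the deployment prefers.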

Role

In a deployment, each server is configured to play a specific
role. It can act as a master, a slave, or even a
relay. To represent a role, a separate Role class is
introduced. Once a role is created, a server can be imbued
with it.

Not every server has an assigned role.

Each server can have only a single role.

Each role can be assigned to multiple servers.

Since a role may encompass much more than just setting some
configuration parameters, this more flexible approach was chosen.
When imbuing a server with a role, a piece of Python code is executed
to configure the server correctly.
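A minimal sketch of what imbuing might look like, to show why a role is more than a set of options. The Server stand-in (with an options dictionary and an sql method that just records statements), the Master role, and the account name are hypothetical.

```python
class Server:
    """Stand-in server: holds options and records the SQL it was given."""

    def __init__(self, name):
        self.name = name
        self.options = {}
        self.executed = []

    def sql(self, statement):
        self.executed.append(statement)


class Role:
    def imbue(self, server):
        raise NotImplementedError


class Master(Role):
    """Configures a server to act as a master."""

    def __init__(self, repl_user):
        self.repl_user = repl_user

    def imbue(self, server):
        # Pure configuration: enable the binary log (set, no explicit value).
        server.options["log-bin"] = None
        # More than configuration: the replication account must exist too.
        server.sql(
            f"GRANT REPLICATION SLAVE ON *.* TO '{self.repl_user}'@'%'"
        )


# One role instance can be imbued into several servers.
role = Master("repl_user")
servers = [Server("a"), Server("b")]
for server in servers:
    role.imbue(server)
```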

The use of roles in this case is actually just one of many choices,
and when using this approach, there are actually two different ways
that roles can be used. I am slightly undecided between the two and would
like to hear comments on which one to use.

Roles are just applied to the initial deployment and do not
play any role after the system has been deployed. Roles are imbued
into a server initially, and then the configuration of the server
can be changed by procedures to manipulate the deployment.

Roles exist in the entire deployment and when a server changes
roles in the deployment, the Role instance will also change. Every
server is assigned a role in the system, which is represented using
a subclass of the Role class.

The first is by far the easiest to implement, which is why I chose
it for now. Since the roles are just containers for
configuration options and other items that need to be added, they are
easy to write. Since this is what is used in the library currently, it
is also what you see in the class design above.

The second approach seems better, but it has a number of
consequences:

Every server has to have a role class associated with it; even
the "initial" role is required.

If the role changes, another role class will be associated with
it. This forces the role class to not only be able to imbue a server
with a role, but to also unimbue the server from that
role.

It must not be possible to change the configuration of a server
directly; any change has to take the form of defining a role and then
changing the server to that role. Unimbuing the server from a role
becomes very hard if the configuration of the server is changed
outside the control of the role.
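To get a feeling for these consequences, here is a small sketch of the second approach. The Initial role, the unimbue method, and Server.change_role are hypothetical names; the library does not currently work this way.

```python
class Role:
    def imbue(self, server):
        pass

    def unimbue(self, server):
        pass


class Initial(Role):
    """The role every server starts in; it changes nothing."""


class Master(Role):
    def imbue(self, server):
        server._options["log-bin"] = None

    def unimbue(self, server):
        # Undo exactly what imbue did.
        server._options.pop("log-bin", None)


class Server:
    def __init__(self):
        self._options = {}     # only roles are supposed to touch this
        self.role = Initial()  # a server always has a role

    def change_role(self, new_role):
        # Leave the old role completely before entering the new one.
        self.role.unimbue(self)
        new_role.imbue(self)
        self.role = new_role


server = Server()
server.change_role(Master())
server.change_role(Initial())
```

The unimbue method can only be this simple as long as nothing else modifies the options; any change made behind the back of the role would have to be detected and reconciled.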