I didn't put those values 11, 6, and 20 in there. Where did they come from and what are they?

They're the columns' "Display Width"

Well, for an integer type (the value in parentheses called the display width of the field. This is different from (and somewhat less intuitive than) the parenthesised value in character fields—such as VARCHAR(10)—where it's the maximum number of characters you can store in the field, and for floating types where it describes the total number of digits you can store. The display width for integers... well it doesn't seem to do much really, on the surface.

The c1 and c3 columns use explicit display widths. The c2 column uses the default display width, which is just enough to contain the largest amount of characters in a BIGINT (including the sign). The largest negative BIGINT is -9223372036854775808, which if you count carefully you'll see is 20 characters. For similar reason, the default display with of an INT (with largest negative value -2147483648, go on, count 'em) is 11, and so on.

Note that ZEROFILL implicitly makes the column unsigned so it cannot store negative numbers. You couldn't, for example, see a number like -000123 in such a column.

The ZEROFILL option fills up the return value with zeros, as you might have guessed. It turns numbers like 123 into numbers like 000123, assuming your display width is 6. Let's see what it means in our table:

So it should be clear that the display width does not limit the amount of values you store in the column—you still get 123456 in a column with a display width of 5—but simply put, it affects how the values appear when they're padded.

What's the point if it doesn't work without ZEROFILL?

Ah, but it does. Well, it does if you want it to. The mysql command-line client doesn't use the display width unless the field is full of zeroes, but other client applications can (and do). The display width is available to applications through the API, so they can use it to pad (with spaces, dashes, or whatever you like) the values.

This code reads the display width of the c3 column (which we know is 30, from the code above) from the column's metadata into a variable $c3length, and uses that value to provided a width to the format specifier %d (for decimal integer), so you get the expression %${c3length}d, which evaluates as %30d. This prints the value of $row['c3'] as an integer in a field 30 characters wide, right justified and space-padded.

The same sort of code exists in other languages; in Java, the java.sql.ResultSetMetaData interface provides the getColumnDisplaySize method, and Java's String.format method works similarly to the printf code above.

In short...

You can ignore the display widths if you don't use them. The defaults are enough to display any value compatible with the type, and won't cause you trouble if your application uses the display width without your knowledge.

If you need to zero-pad your numbers, display width lets you say how much padding you need, but you'll need to make sure your application deals with number larger than that width either by ensuring it can handle them, or that your business logic prevents them.

If you want to display space-padded numbers in plaintext reports or other fixed-width output formats and you want to store the column's display width with the other column metadata at the database (and where else would you put it?), use the display width.

Friday Jan 17, 2014

Many installations of MySQL server come with a built-in database called test. It's initially empty, and you might wonder what it's for, or even if you can delete it without any problems.

What is it for?

The test database is installed by the MySQL Server RPM as part of the mysql_install_db process, and some other package managers run that script too. If you run that script as part of a manual install of MySQL, you'll get the same effect. It creates the database by creating an empty directory called "test" in the data directory, and creates wide-open access to the database test and any database with a name beginning with test_ by inserting a couple of rows into the mysql.db table that give everyone full access to create or use those databases.

The configuration is designed to make it easy for new users to create a playground or sandbox database to work with, one that doesn't require asking the DBA to open up permissions every time a user wants to create a database for testing purposes.

The test security settings

These are the lines the mysql_install_db script executes (slightly modified):

This is quite significant, because if you create a database called test_db or test_banana or test_anything_else_really, all those databases are wide open to any MySQL user on that server, even otherwise low-privilege users. This is particularly important if you run a MySQL server shared across many projects or customers.

Securing the Default "test" Database Configuration

It's actually a best-practice to remove the test database along with the rows inserted by the lines above—they make the test database (and others beginning with test_) completely accessible to anyone with a MySQL account. When it's created during installation (as it is with the Oracle-built RPM distributions), anyone has full access to it as a sort of sandbox environment, and could in theory use it as a launching point for an attack.

Run the mysql_secure_installation script to perform a number of security optimizations, including removing the lines added by the mysql_install_db process. Another way is simply to delete the two rows:

Thursday Oct 24, 2013

You can keep up with the latest developments in MySQL software in a number of ways, including various blogs and other channels. However, for the most correct (if somewhat dry and factual) information, you can go directly to the source.

Major Releases

For every major release, the MySQL docs team creates and maintains a "nutshell" page containing the significant changes in that release. For the current GA release (whatever that is) you'll find it at this location:

At the moment, this redirects to the summary notes for MySQL 5.6. The notes for MySQL 5.7 are also available at that website, at the URL http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html, and when eventually that version goes GA, it will become the currently linked notes from the URL shown above.

Incremental Releases

For more detail on each incremental release, you can have a look at the release notes for each revision. For MySQL 5.6, the release notes are stored at the following location:

At the time I write this, the topmost entry is a link for MySQL 5.6.15. Each linked page shows the changes in that particular version, so if you are currently running 5.6.11 and are interested in what bugs were fixed in versions since then, you can look at each subsequent release and see all changes in glorious detail.

One really clever thing you can do with that site is do an advanced Google search to find exactly when a feature was released, and find out its release notes. By using the preceding link in a "site:" directive in Google, you can search only within those pages for an entry. For example, the following Google search shows pages within the release notes that reference the --slow-start-timeout option:

Thursday Oct 17, 2013

The SQL language is a bit like a toolkit for data. It consists of lots of little fiddly bits of syntax that, taken together, allow you to build complex edifices and return powerful results. For the uninitiated, the many tools can be quite confusing, and it's sometimes difficult to decide how to go about the process of building non-trivial queries, that is, queries that are more than a simple SELECT a, b FROM c;

A System for Building Queries

When you're building queries, you could use a system like the following:

Decide which fields contain the values you want to use in our output, and how you wish to alias those fields

Values you want to see in your output

Values you want to use in calculations . For example, to calculate margin on a product, you could calculate price - cost and give it the alias margin.

Values you want to filter with. For example, you might only want to see products that weigh more than 2Kg or that are blue. The weight or colour columns could contain that information.

Values you want to order by. For example you might want the most expensive products first, and the least last. You could use the price column in descending order to achieve that.

Assuming the fields you've picked in point 1 are in multiple tables, find the connections between those tables

Look for relationships between tables and identify the columns that implement those relationships. For example, The Orders table could have a CustomerID field referencing the same column in the Customers table.

Sometimes the problem doesn't use relationships but rests on a different field; sometimes the query is looking for a coincidence of fact rather than a foreign key constraint. For example you might have sales representatives who live in the same state as a customer; this information is normally not used in relationships, but if your query is for organizing events where sales representatives meet customers, it's useful in that query. In such a case you would record the names of columns at either end of such a connection.

Sometimes relationships require a bridge, a junction table that wasn't identified in point 1 above but is needed to connect tables you need; these are used in "many-to-many relationships". In these cases you need to record the columns in each table that connect to similar columns in other tables.

Construct a join or series of joins using the fields and tables identified in point 2 above. This becomes your FROM clause.

Filter using some of the fields in point 1 above. This becomes your WHERE clause.

Construct an ORDER BY clause using values from point 1 above that are relevant to the desired order of the output rows.

Project the result using the remainder of the fields in point 1 above. This becomes your SELECT clause.

A Worked Example

Let's say you want to query the world database to find a list of countries (with their capitals) and the change in GNP, using the difference between the GNP and GNPOld columns, and that you only want to see results for countries with a population greater than 100,000,000.

Using the system described above, we could do the following:

The Country.Name and City.Name columns contain the name of the country and city respectively.

The change in GNP comes from the calculation GNP - GNPOld. Both those columns are in the Country table. This calculation is also used to order the output, in descending order

To see only countries with a population greater than 100,000,000, you need the Population field of the Country table. There is also a Population field in the City table, so you'll need to specify the table name to disambiguate. You can also represent a number like 100 million as 100e6 instead of 100000000 to make it easier to read.

Because the fields come from the Country and City tables, you'll need to join them. There are two relationships between these tables: Each city is hosted within a country, and the city's CountryCode column identifies that country. Also, each country has a capital city, whose ID is contained within the country's Capital column. This latter relationship is the one to use, so the relevant columns and the condition that uses them is represented by the following FROM clause:

FROM Country JOIN City ON Country.Capital = City.ID

The statement should only return countries with a population greater than 100,000,000. Country.Population is the relevant column, so the WHERE clause becomes:

WHERE Country.Population > 100e6

To sort the result set in reverse order of difference in GNP, you could use either the calculation, or the position in the output (it's the third column):

ORDER BY GNP - GNPOld or ORDER BY 3

Finally, project the columns you wish to see by constructing the SELECT clause:

Queries with Aggregates and GROUP BY

While this system might work well for many queries, it doesn't cater for situations where you have complex summaries and aggregation. For aggregation, you'd start with choosing which columns to view in the output, but this time you'd construct them as aggregate expressions.

For example, you could look at the average population, or the count of distinct regions.You could also perform more complex aggregations, such as the average of GNP per head of population calculated as AVG(GNP/Population).

Having chosen the values to appear in the output, you must choose how to aggregate those values. A useful way to think about this is that every aggregate query is of the form X, Y per Z. The SELECT clause contains the expressions for X and Y, as already described, and Z becomes your GROUP BY clause. Ordinarily you would also include Z in the query so you see how you are grouping, so the output becomes Z, X, Y per Z.

As an example, consider the following, which shows a count of countries and the average population per continent:

In this case, X is the number of countries, Y is the average population, and Z is the continent. Of course, you could have more fields in the SELECT clause, and more fields in the GROUP BY clause as you require. You would also normally alias columns to make the output more suited to your requirements.

More Complex Queries

Queries can get considerably more interesting than this. You could also add joins and other expressions to your aggregate query, as in the earlier part of this post. You could have more complex conditions in the WHERE clause. Similarly, you could use queries such as these in subqueries of yet more complex super-queries. Each technique becomes another tool in your toolbox, until before you know it you're writing queries across 15 tables that take two pages to write out. But that's for another day...

Each certification requires passing a single exam, and because they're new, they're currently going through a beta process and are much cheaper than usual, a snip at €39 ($50). Even so, you still get the full certification on passing the exam; there's no difference between a certification obtained during the beta process and afterward. Because it's in beta, the exams have more questions—this is to make sure that the beta process works and all questions get a good airing—and you'll need to wait a bit longer to get your results due to the internal analysis.

Thursday Jul 04, 2013

If you're setting up MySQL Cluster on Oracle Linux or another Linux such as Red Hat, CentOS or Fedora, you might have come across some problems getting the nodes to communicate. There are two ways you might bump into problems with network connectivity: The iptables firewall, and SELinux. These security mechanisms might prevent your various nodes—management, data, and API—from communicating with each other in various ways and with various symptoms.

Let's have a look at what you're likely to see.

Data nodes stuck at "starting"

The first thing you might notice is that your data nodes get stuck in the "starting" state. Running show in the management client gives something like this:

Just to make matters worse, the first data node might start, but the second gets stuck at this point:

[ndbd] INFO -- Start phase 0 completed

What might confuse you is that it's obvious that both data nodes have connected to the management node, but they're not talking to each other. The reason for this is grounded in how the various nodes communicate with data nodes.

Data Node Communication

Data nodes communicate with the management node on port 1186. This port is registered with IANA, and is often open on system that have MySQL installed. Once a data node launches and has been given a node ID, it gets a unique dynamically allocated port—an "ephemeral port"—on its host, and opens a socket so other hosts can communicate with it. This port is assigned by the operating system, and as a result cannot be predicted by the data node; the IANA suggest a port range of 49152–65535, but in Linux the range is from 32768 to 61000. You can get this range with the following command:

$ cat /proc/sys/net/ipv4/ip_local_port_range
3276861000

This is where the problem lies: Firewalls routinely block incoming traffic on those high ports, so while each data node can open a connection to the management node, they can't open connections to other data nodes if those nodes are listening on sockets that are blocked by the firewall. MySQL Cluster won't declare the cluster started until all data nodes have connected (unless you use --nowait-nodes, and in general you shouldn't), so they get stuck in "starting" until they can talk to other data nodes.

iptables and ServerPort

Many Linux servers use iptables as a firewall. To open the range of ephemeral ports in the Linux iptables firewall, use something like the following:

Make sure to use ports that are unique to that host, and are unlikely to be used by other services.

Once you've done that, open those ports in the data node hosts' firewalls. To match the example shown above, open port 50501 on datahost1 and port 50502 on datahost2. For example, the following command, when run as root on datahost1, opens its socket:

iptables -I INPUT -p tcp --dport 50501 -s 192.168.56.0/24 -j ACCEPT

Once you've verified that this works, save your firewall rules with the following command (run as root) on each data node:

service iptables save

Stuck Management Nodes

Of course, it's also possible that your management node can't listen on port 1186; this is particularly possible on systems that you've installed by copying the binary from compressed archive, tar.gz or similar. If so, you might need to open that port too by using a similar technique.

For example, if you've configured two management servers, and port 1186 is not open on one of them, you'll see something like this when running show in the management client that cannot see the other:

ERROR Message: The cluster configuration is not yet confirmed
by all defined management servers. This management server
is still waiting for node 2 to connect.
Could not get configuration
* 4012: Failed to get configuration
* The cluster configuration is not yet confirmed by all
defined management servers. This management server is
still waiting for node 2 to connect.

If one management node has port 1186 open and the other closed, the one that can communicate with the other displays this:

iptables and PortNumber

In this situation, the one that appears to be working is actually the one with the blocked port; it can see the other, because the other's port is available. Again, a quick blast of iptables will fix the problem:

iptables -I INPUT -p tcp --dport 1186 -s 192.168.56.0/24 -j ACCEPT

You can, of course, change the management node's port. The relevant option is PortNumber, although bear in mind that 1186 is a registered port, so changing it requires changing not only your iptables Tables configuration, but also SELinux and any application connect strings you use.

MySQL Server Node failing to connect

When you install MySQL from RPM—either MySQL Server or MySQL Cluster—the installer adds SELinux configuration to the machine. The default configuration allows MySQL's server process mysqld to talk to ports 3306 and 1186. These are the ports of the mysqld process and the MySQL Cluster management node respectively.

However, the default SELinux configuration doesn't know anything about MySQL acting as an API node in a MySQL Cluster, so doesn't configure any permissions that allow the mysqld process to access the data nodes' dynamic ports.

This typically manifests itself as the data node appearing to connect, but running show displays something like this:

SELinux and MySQL Cluster SQL Nodes

This all looks pretty disastrous, but is easily fixed: You can fix it by tweaking SELinux a bit. I've written about SELinux and MySQL in the past, and the same advice in that article applies here. You could, for example, disable SELinux entirely. But, being a more intelligent reader, you're more likely to want to know how to configure it.

Once you've set ServerPort as shown above, you can easily let SELinux in on that particular secret and permit the mysqld process to access the ports you've assigned by running the following command as root on the MySQL host:

semanage port -a -t mysqld_port_t -p tcp 50501-50524

This opens 24 ports starting with the two in the configuration example shown earlier. Having done this, your MySQL Server instance can talk to data nodes on their fixed ports.

Other Connections

A lot of what's said in this post also applies to other connections. If you've got a Java application that uses ClusterJPA to talk to your cluster, you'd better make sure it can see the management nodes and that SELinux isn't preventing it from accessing the data nodes.

Similarly, if you've changed your management node's port from the default, ensure you've configured the connection string correctly, with that port, in all of your applications' connect strings. This includes MySQL Server applications where you'll need to include the revised port number in the ndb-connectstring option.

You don't need to tell your applications what hosts or ports your data nodes are at. That's the joy of a centralized configuration, isn't it?

Thursday May 30, 2013

I got a question a while ago that I thought was quite simple, but turned into an interesting discussion: How much hard disk space does a database take up?

As it happens, there's a simple answer and a much, much more involved (yet ill-defined) answer, and which one you choose depends on what you think goes into a database and how very, very pedantic you are.

The Simple Answer

You can query the size of table and index data using Information Schema or SHOW TABLE STATUS. The columns DATA_LENGTH and INDEX_LENGTH contain the number of bytes for the table and index data respectively. You can construct simple queries based on that data, or do a quick search on the web for more interesting queries (For example, Peter Zaitsev of Percona wrote a post on Researching your MySQL Table Sizes). Adding up the totals for all tables in a database gives you a figure that shows how big the data rows and index contents for the database are.

This really is where you should stop.

Let's Get Slightly Pedantic: Taking the File System's Perspective

In MySQL, databases are little more than file folders, so they don't consume space themselves. The space is consumed by the data within, specifically tables and indexes: Views, stored routines, triggers are little more than command text and so take a trivial amount of space, so for now we can ignore them.

In MySQL 5.6, table and index data are stored in individual files in the appropriate directory. So, in my /var/lib/mysql/world_innodb folder, I've got City.ibd, Country.ibd, and CountryLanguage.ibd. To check the size of the database, I can simply read the size of that folder, which gives me the approximate size of the table and index data, along with any extra space allocated. The extra space comes from the fact that the file size is typically bigger than the amount of data stored within it, because InnoDB increases the file size by 64MB (8MB in 5.5) every time it fills up. You can change this value by setting the innodb_autoextend_increment option.

In MySQL 5.1 and 5.5, the default InnoDB configuration stores all InnoDB information in a single tablespace with a filename of (typically) /var/lib/mysql/ibdata1. By default, this file contains all table data and index information for all InnoDB tables in all databases, which means you cannot simply calculate the space taken on disk by the data of any single database. Although all InnoDB data defaults to a single tablespace file, you can change this by setting innodb_file_per_table, which is enabled by default in 5.6. By doing so, you store table and index data for each database in a file named for the table, in the appropriate database directory.

Let's Get Really Pedantic: Going Beyond the File System

The simple answer at the top really is quite simple: It only considers data, so if there is no data, there are no rows taking up space. This means that regardless of how many functions, procedures, triggers, views, tables, permissions, or anything else you've configured in your (otherwise empty) database, MySQL considers that it is empty, and therefore takes up 0 space. Of course, if you're quite pedantic about it, you'll know better.

What Else is in a Database?

MySQL does not allocate space per database, but per table and any indexes associated with that table. If every table in a database contain no data, then MySQL considers that the database contains no data. Queries on DATA_LENGTH and INDEX_LENGTH show information about data (as opposed to metadata), so for the purpose of that query, a database with no data takes up 0 space. This might be a quite simplistic way of looking at it, so let's consider the difference between this viewpoint—a database with no data takes up no space—and another—a database with objects, even empty ones, has to exist, and that must take up some space.

As well as table data and indexes, the other information MySQL maintains about a database is its metadata. Some of this metadata is contained within:

The file system: as a database directory as db.opt, .frm, and .trg files

The mysql database: stored routines, events, privileges, and other database metadata; these are stored in MyISAM tables, each of which contains information for all databases on the server

As the metadata is distributed in multiple locations, MySQL does not have a single location that you could point to and say "This is the database," so for the purposes of explanation, let's drill into my local copy of the sakila database.

Metadata on the File System

As an administrator, I can see that the structure of my sakila database takes up space in the form of the following file system artefacts:

The database directory inode takes up 4KB on my machine.

The db.opt file takes up 65 bytes (for my sakila database)

Triggers take just over 1KB each (TRG and TRN files)

Table metadata (.frm files) are about 8KB each

View metadata (.frm files) depend on the complexity of the view, but let's say 1KB each

On my system, these add up to 336KB for my slightly-extended copy of the sakila database.

Metadata in the mysql Database

There is also row data in mysql relevant to the sakila database:

The mysql.proc table contains 6 rows for sakila; with an average row length of 1116 bytes—from SHOW TABLE STATUS LIKE 'sakila';—that gives approximately 6KB.

The mysql.db table on my system has two rows on my system (I've set up a couple of users), at ~440 bytes each

The mysql.tables_priv table has one row at 851 bytes (unrepresentative, but included for detail)

I have no mysql.event, mysql.columns_priv, or mysql.procs_priv rows relevant to sakila at this time (again, unrepresentative, but included for detail).

Have We Considered Everything Yet?

This isn't even the whole story; I haven't included mysql.user records for users who only use this database, or log file entries relevant to the sakila database. You can take this as a sign that what constitutes a database is actually quite a fuzzy thing.

So, taking all of that together, you might consider that the metadata for my sakila database takes up ~344KB, but it should be apparent that it is quite a difficult task to define what is and isn't sakila metadata, and there is certainly no built-in way to arrive at this figure in a comprehensive and consistent way that would satisfy the most pedantic administrator.

Friday May 10, 2013

Replication is a hot topic in MySQL 5.6, and for good reason: There are many excellent features that make it a strong well-supported feature, from the new Global Transaction Identifiers (GTIDs), to simplified replication configuration and automated failover using MySQL Utilities (now available in alpha as a separate download).

Circular Replication

The simplest topology consists of a master server that accepts changes, and slaves that replicate those changes from the master. A common requirement is for a network to have multiple servers that accept changes and replicate to each other. This is possible by means of circular replication, where each master is also the slave of another master, in a circular fashion. However, this configuration is prone to certain problems.

Asynchronous Replication

Firstly, you have to know a bit about how replication works. MySQL replication is asynchronous, which means each server executes operations without waiting for another server to replicate them. It does this by logging every event and subsequently transmitting these events to connected slaves. This works perfectly—without conflicts—when you have a single master that accepts changes from client applications. That master can have any number of slaves that can execute queries for applications, but don't change the contents of the database.

Circular Replication

Circular replication enforces replication from only one other master, so MySQL avoids some sorts of conflicts such as time-order conflicts.

A time-order conflict occurs when masters can replicate directly from two or more sources, and conflicting event pairs replicate to masters in a different order. This problem cannot occur in MySQL replication.

Circular replication introduces the possibility that two or more servers can replicate concurrent changes to each other. It is therefore is subject to conflicts, because it is possible that two servers update the same row at the same (or nearly the same) time and that each subsequently replicates its changes to the other server. If the replicated changes apply to the same row, then that row ends up with a different value on each server.

MySQL does not perform conflict resolution in such situations, so you have to make sure your application caters for that possibility.

Conflicts

Conflicts in circular replication occur when your application allows updating the same data (rows and related rows) on different servers. If you do not take care to avoid conflicts, you risk a situation where two servers accept conflicting changes at nearly the same time, and replicate them to the other side.

For example, imagine a product that costs $520:

The "promotions" team updates it on the sales server by subtracting $50

At the same time, the "brands" team on the management server increases its price by 20%

Operation

Sales server

Management server

(0) Initial price

$520

$520

(1) Promotions update (subtract $50)

$470

(2) Brands update (increase by 20%)

$624

(3) Promotions update replicates

$574

(4) Brands update replicates

$564

(5) Final price

$564

$574

After each of the changes replicates to the other server—after being performed on its local server—the price on the sales server ends up being $564, and that on the management server (for the same product) is $574. There are no errors generated, because each server executes the statements in the order it receives them (either from a connected client application or a replicating master), which, due to the nature of asynchronous replication, is not always the same order on each server.

Further Examples

The preceding example describes a simple situation that could occur when you use statement-based replication. If you use row-based replication, you are not immune either, because there are several ways in which operations can conflict. Here are some further examples of conflicts that can occur whether you use statement- or row-based replication:

One server deletes several rows based on a WHERE clause that should include a row that has just changed on the other server - the UPDATE propagates in one direction and the DELETE in the other, resulting in a row that exists on one server but not the other.

One server updates a set of records at the same time as the other server updates an overlapping set

Two servers generate an identical report at exactly 18:00 each night, but one server executes an update statement immediately before the report, that replicates to the other server just after it generates its report.

In short, a conflict is likely to occur at some point if you allow changes to common data on two or more servers.

Avoiding Conflicts

The simplest way to avoid conflicts in a circular replication topology is to ensure that each master server can only update rows that no other master server changes. For example:

One master server updates rows within the sales table but not the products table, and the other updates products but not sales.

The stock table can be updated by one master during the day, when the warehouse is packing and dispatching; another updates overnight when deliveries arrive.

Masters can update each order based on the order's status. One master updates orders that have not yet shipped, another updates only shipped orders. Care must be taken that the order status changes in a controlled way, to avoid conflicts. For example, only the master with write access at that time (based on order status) can change its status; other masters use the order status to decide if they have write access at that time.

If you need to improve the performance of your replicated network, you can then scale out each master to multiple slaves to allow for greater bandwidth when running read-only queries. In practice, this means that you can load-balance queries across the slaves, and dedicate the master to handling writes for its tables.

Of course, to avoid conflicts all together, you should avoid circular replication and ensure that only one master accepts writes. But where would be the fun in that?

Thursday Apr 11, 2013

If you're considering using MySQL Cluster, you might be tempted to try it out on one of your existing databases. MySQL Cluster 7.3 (available as a development release at the time of this post) supports foreign keys, so it seems sensible to try migrating an InnoDB database with foreign keys to MySQL Cluster.

What won't work

For this example, I'm using the world_innodb.sql script available from the MySQL sample databases. It's a simple database with three tables: Country, City, and CountryLanguage. Both City and CountryLanguage have foreign keys referencing the Country table's Code field.

Sadly, this won't work; InnoDB won't let you convert a table from InnoDB to another engine if the table is at either end of a foreign key, to avoid dangling constraints. If you try to convert an InnoDB table to NDB using one of the above statements, this happens:

Now, the clever among you might be aware of the foreign_key_checks variable, which disables InnoDB's foreign key constraint checking when you turn it off. It's useful when you're importing data from a script and don't want to check constraints until the import has finished. However, you can't use it to switch off contraints when changing a table's storage engine; it's designed to be used temporarily, for data, so it won't allow dangling constraints. Similarly, the ndb_deferred_constraints variable can't be used for this purpose either, because it doesn't affect InnoDB's constraint protection.

So how do I do it?

There are two ways to migrate InnoDB tables with foreign keys to NDB.

Dump the database and edit the script so each table specifies ENGINE=NDB before re-importing the script to a new database

Drop the constraints, alter the tables to use the NDB engine, and recreate the constraints

Dumping the database and editing the script is a straightforward use of mysqldump and a text editor.

Dropping and recreating the constraints is more interesting, so I'll walk through that.

First, I use SHOW CREATE TABLE to see the constraints' definitions, and particularly their names:

Friday Apr 05, 2013

Problem: You've a large table (or two) in a database on a partition that's running out of space, and you want to see if you can move that table to another drive.

Solution: Well, several actually. No silver bullet, but several options, some with conditions and some that require preparation. Let's look at some background information first.

How MySQL Stores Data

OK, that's somewhat of an ambitious heading for an incidental paragraph or two, so to tone it back a bit, I'll summarise briefly.

The data directory is where MySQL stores databases, and it's set by the datadir server option. Each database is stored in a subdirectory of the data directory.

You can also save a considerable amount of space without moving data around, by using features of the various storage engines, for example if you enable compression on InnoDB tables. If you don't use InnoDB, you can consider using the ARCHIVE storage engine to compress data.

Storage engines are what MySQL uses to do the grunt work of storing and retrieving data. InnoDB is the default and is quite full-featured and robust, but some systems still use MyISAM, which is fast but not as well-featured or robust. Other storage engines are available.

MyISAM and InnoDB store data in the data directory.

MyISAM stores data in .MYD files and index information in .MYI files, both in the database subdirectories mentioned above.

InnoDB stores data and index information in .ibd files in the database subdirectories, when you have the innodb_file_per_table option enabled. This is the default in MySQL 5.6, but must be enabled in 5.5; when disabled, InnoDB stores all data and indexes in a shared tablespace in the root of the data directory.

Now you know where the data lives, it's time to look at how to move it somewhere else.

Moving the Whole Data Directory

As the datadir option controls where MySQL stores its data, one option we have is to move the contents of the current data directory to a new partition, and then change the datadir option to point to that location.

For example, on my machine, the datadir option points to the /var/lib/mysql directory. As expected, it contains a directory for each of my databases, and it also contains the InnoDB system tablespace files. To increase the space available, I could add a new drive and point the datadir there.

Consider the following scenario:

Add a new hard drive, partitioned using a suitable file system such as ext4. That file system is visible to my OS as /dev/sdf1

Beware of the various Mandatory Access Control subsystems that might affect you. If you've got AppArmor installed, you'll also need to change the profile for mysqld so it can access files in the new directory. If you've got SELinux installed, you'll need to add a context mapping for MySQL.

In fact, you don't even have to point MySQL to the new partition. If the mountain won't go to Mohammed... let's undo the last couple of steps:

Assume you haven't set the datadir option to point to /data, and that it still points to /var/lib/mysql. Also assume MySQL is still stopped. Now the data directory contents are at the new location, but on its next boot MySQL still looks in /var/lib/mysql.

Unmount the new partition /dev/sdf1 (or you can use its mountpoint, which is easier to type):

# umount /data

Mount /dev/sdf1 to /var/lib/mysql:

# mount /dev/sdf1 /var/lib/mysql

Restart MySQL

Assuming it all works, you should then modify the file /etc/fstab to automatically mount /dev/sdf1 to /var/lib/mysql on reboot, so persisting your new configuration

But I Only Want to Move One Database

Moving a whole database to another location is achieved by using symbolic links. Simple summary (read the link for more details):

Make the new partition as in the example above

Rather than move the whole data directory, this time just move a single database directory; this only works with innodb_file_per_table enabled, because otherwise the InnoDB data is in the shared tablespace and can't be moved

At this point you can either mount the new partition into the database directory, or use symlinks as described in the link above.

But I Only Want to Move One Table

Ah, now we're getting interesting. The technique for doing this depends on the version of MySQL you're running—I'll talk about 5.5 and 5.6—and what storage engine the table uses.

Firstly, in MySQL 5.5 you can use symbolic links only for tables that use the MyISAM storage engine, and only on certain operating systems such as Linux and Oracle Solaris. The technique is similar to that described above, although you can also use the DATA DIRECTORY and INDEX DIRECTORY options to CREATE TABLE when creating the tables if you want to set up the links from within MySQL rather than at the Linux command prompt.
You should not use symbolic links to InnoDB tables, because InnoDB stores some information about each table in the shared tablespace, even when you use the innodb_file_per_table option.

MySQL 5.6 also allows you to copy or move InnoDB tables to another server by performing a file copy, by using the transportable tablespaces feature.

In MySQL 5.6 you can also move InnoDB tables to a different location by using the DATA DIRECTORY option to CREATE TABLE. When you do so, the tablename.ibd file is moved to a subdirectory of that location named for the table's database, and a corresponding tablename.isl file created in the database's directory that acts as a link to the table's .ibd file. The .isl file is not an actual symbolic link, but is treated as such by MySQL.

Plenty of Space

All told, you have many options available to you regardless of which storage engine or version you use, although it won't surprise you to know that MySQL 5.6 gives more options than previous versions. Don't forget that modern UNIX-like operating systems often have mandatory access control systems such as AppArmor, SELinux, or Extended Policy, so be sure to do your homework before moving files around.

Extended Policy is a feature of Solaris that allows you to assign named privileges on resources—such as ports and files—to services. I'm not hugely familiar with Extended Policy (or Solaris for that matter), but according to Glenn it's similar to SELinux but somewhat better: He says "it doesn't need a knob to disable enforcement; nor does it require relabeling the filesystem to make the policy effective...we never need to inform the kernel that the policy is updated because the policy is maintained in each process credential, not in a system-wide kernel database."

I'll let him continue the explanation at his blog post (thanks Glenn!)

Friday Mar 22, 2013

I've previously written about AppArmor and MySQL, and how to change MySQL's default file locations on systems with AppArmor enabled. Ubuntu and SUSE ship with AppArmor enabled, but some other distributions such as Oracle Linux don't, along with related distrubutions such as Red Hat, CentOS and Fedora. Rather, these other distributions use another mandatory access control system called SELinux.

Here's some technical detail that might come in handy later.

SELinux uses concepts such as types and domains. Types belong to resources such as files and ports; these are the "objects" in SELinux. Domains contain the "subjects" (processes) and object types that are associated with each other in some way, for example because they are all related to MySQL.

Each executable for a service that SELinux understands, including MySQL, has its own type.
When the process runs, it is placed within a domain based on its type; members of that domain (normally processes) that wish to access objects (such as files and ports) must be authorised to do so by virtue of the object having a configured type that is approved for that domain.

This configuration is contained within policy files that usually ship with each distribution, but can be created using the appropriate tools.

SELinux (Security-Enhanced, if you're interested) "is a Linux feature that provides the mechanism for supporting access control security policies" according to Wikipedia. More simply, it stops things—like programs—from accessing things—like files and network ports—they shouldn't access. By "shouldn't access" I really mean "haven't been configured to access". For example, MySQL is allowed to write to its data directory in /var/lib/mysql, and read from /etc/my.cnf. It can open a socket on port 3306, but SELinux prevents it from writing to files in /home/jeremy or /sbin or anywhere else that isn't already configured as a MySQL location.

In short, if you try changing MySQL's port to a non-standard one, or try taking backups or configuring data files or log files to anywhere but the usual locations, you'll get some odd access-denied type errors in the MySQL error log. In addition, you'll get messages in /var/log/audit/audit.log (if auditd is running, otherwise /var/log/messages or /var/log/syslog, depending how your system is configured).

What Error do I get?

To set this demonstration up, I've installed MySQL 5.6 on an Oracle Linux 6.3 system, with SELinux enabled. When I change the datadir option to /datadir (which contains a copy of the MySQL data directory, and has the correct permissions) the service does not start. Let's look at the errors.

Clearly something going on here. The Access Vector Cache (as seen in the "avc: denied" message) is where SELinux caches permissions for the kernel, so it's definitely SELinux doing the denying.

Just Stop It!

I'm going to start with the hammer and work my way down to the scalpel.

Here's the hammer:

[root@boxy ~]# setenforce 0
[root@boxy ~]# getenforce
Permissive

The setenforce 0 command switches off SELinux enforcing until the next reboot, and getenforce shows you the current status. To stop SELinux from enforcing on any reboot, you'll need to change a configuration file:

[root@boxy ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

Change that enforcing to permissive (or disabled) and you're good to go. The difference:

enforcing blocks operations that SELinux does not allow

permissive does not block the operations, but logs them (to /var/log/audit/audit.log)

disabled switches off SELinux entirely, to the extent that you cannot use setenforce until you change it and reboot.

For example, on a machine with SELinux set to permissive, I can do the following:

If you're happy with that, you could then edit the configuration file to disable SELinux on next reboot, and thanks for reading. See you next time.

I'm intrigued. How do I configure it?

Obviously, there's a lot more to SELinux than disabling it, and a responsible admin (that's you, right?) wants to know how to use it rather than disable it. I'm not going to get into too much detail here.

We can, however, look at how you can assign SELinux types to objects such as ports and files, so that members of the mysqld_t domain (specifically the mysqld_safe process, launced when you run service mysql start) can access those objects.

So here's the scalpel. First, let's configure SELinux to enable MySQL's use of port 3307:

[root@boxy ~]# semanage port -a -t mysqld_port_t -p tcp 3307

You'll need to install the policycoreutils-python package to use the semanage utility.

The semanage utility changes various SELinux settings. In this case, it adds (-a) a type (-t mysqld_port_t) to the port mappings for port 3307 using TCP as its protocol (-p tcp). When MySQL (through the mysqld_safe process) tries to access that port, SELinux recognises that the port has a type that is approved for such access by the policy.

In this example, semanage is adding the type mysqld_db_t to the file context map (fcontext) for anything in the /datadir directory and subdirectories ("/datadir(/.*)?", a regular expression). File mappings such as this are contained in the file /etc/selinux/targeted/contexts/files/file_contexts.local; that file must subsequently be read in order to set the appropriate type on the file itself. That's done by the restorecon utility, and at system reboot. If you want to change a file context immediately, but don't need it to survive a reboot, there's a chcon utility that performs that task.

The same principles and statements apply if you wish to use other ports or directories. There are similar types that apply to different kinds of files; I used mysqld_db_t above for database directories, but the standard SELinux policy for MySQL also include:

mysqld_etc_t for configuration files such as /etc/my.cnf

mysqld_log_t for logfiles such as those named /var/log/mysql*

Types for the PID file, tmp files, service startup files in the /etc/init.d directory, and the various executables you're likely to use

As you can see, you can get quite fine-grained as you wield your configuration scalpel. Personally, I've had mixed results using things like mysqld_log_t for custom logfile locations, but I got around it initially by using mysqld_db_t (as for data files), and subsequently by using a custom-made policy file.

Conclusion

This post is already long enough, so I won't get into the deeper topics in SELinux, such as the ability to compile your own policy files and configure new policies for services that SELinux doesn't yet know about. At this stage, you know how to add an SELinux type to an object such as a port or a file so that MySQL can access that object, even if it's not used by default. You also know how to disable SELinux in a couple of ways, but you're not going to do that now, are you? You've got a perfectly good scalpel, after all. Why use a hammer?

Thursday Feb 07, 2013

One of our training courses has a section covering MySQL's Memcached API, and how it works. In the discussion, there's a line that goes like this:

"A key is similar to a primary key in a table, and a value is similar to a second column in the same table"

For someone well versed in database tables but not so much in key-value stores, that sentence might take a bit of grasping. So, let's break it down.

An Example Key/Value Store

Imagine the table kvstore has a column key and a column value. Also imagine that we've set up the Memcached plugin in MySQL and configured it to use that table and those columns as its store. I won't get into that bit for now, but trust me, it's not that hard.

Now imagine you want to be able to get at your values with a simple syntax like this:

set X Y ...get X

...or if you're using PHP (and have a MySQL memcached connection called $m), you might want something like this:

$Y = $m->get($X);$m->set($X, $Y, ...);

Note the simplicity of the later statements, and the absence of the words "kvstore", "key" or "value"; Once you've configured the Memcached plugin, you don't have to worry about those. Remember, it's a key/value store, so we just worry about the keys and the values and let the configuration take care of everything else.

The memcached API merely provides you with a different way of getting at your InnoDB data. Although I've greatly simplified the memcached protocol examples above, you can get the idea. Simply put, you can read and write values without having to construct entire SQL statements. Not only is this faster for you, the programmer, but it's also faster for MySQL.

Now, SQL is a much more expressive language, and lets you do things like aggregation and range-based processing, but at a cost: that the SQL parser, statement normalizer, and optimizer have to do their jobs regardless of whether it's a simple SELECT or a complex aggregated join with subqueries, whereas the memcached API just does one thing per statement, so you bypass all the extra CPU work that comes before the storage engine kicks in.

So, why bother?

An Example Use Case

MySQL's NoSQL feature is comparable with other key-value caches. These are often used on the web when you want to serve pieces of the page quickly, but the page as a whole differs between users.

Imagine you've got 100,000 users of a web page, and each one wants to see a different page when they're logged in. You want to see your login name and your current karma/kudos/rep at the top right, and I want to see my relevant information on mine. The navigation div at the top and on the common links on the left will be the same for both of us, but my list of "tags" (or subsectons or whatever other personal navigation shortcuts) will belong to me and yours to you. Then the body content is going to be the same (per article) but different (per page), so my front page will show articles 41, 42, 45, and yours will show articles 42, 43, 44, depending on our options. And the same goes for each of the other 100,000 users who visit the site and want their own personal content.

Now the page generation logic, rather than reading the entire navigation code and articles and navigation shortcuts and so on for each page, becomes a matter of asking the cache for the HTML values contained in the keys "user12345-header" and "common-nav" and "common-list" and "user12345-navshortcuts" and "article-41" and "article-42" and ... etc. So the page processing becomes a case of asking the cache for content rather than generating the content each time. This provides massive performance improvements over generating each page's content per page view, although you need to change your logic to catch cache-misses and generate the HTML code to put into the cache accordingly, and to handle session/cache timeouts.

MySQL's Advantage

In MySQL this is even better than many other key-value stores, because some cache-misses are automatically taken care of; with InnoDB as the backing store, you've two levels of "cache-hit", the first being when the page is already in the buffer pool, so you're getting the key's value from RAM (which is about as fast as existing key-value caches), but if the page isn't in the buffer pool, InnoDB automatically reads it from the table. This isn't anything new; it's how InnoDB already works with table data. With other key-value stores, if it's not already in RAM, you just get a cache-miss and your code has to generate the data. With InnoDB, it's got this effective second-level cache that stores stuff on disk when it's not in RAM.

Plus you get the benefit of other parts of the application being able to use the power of the SQL API (remember those SELECTs and subqueries and joins and aggregation and all the other fun stuff that key-value stores don't have?) to run reports on all your cache data. That's why they call it "Not only SQL". And you get this without having to do any ELT transfers or writing clever persistence logic for your cache. You get that bit for free.

Thursday Jan 31, 2013

MySQL accesses files in various places on the file system, and usually this isn't something to worry about. For example, in a standard MySQL 5.5 installation on Ubuntu, the data goes in /var/lib/mysql, and the socket is a file in /var/run/mysqld. It puts configuration files in /etc, logs and binaries in various locations, and it even needs to access some operating system files such as /etc/hosts.allow.

This is all very well until you start trying to be clever and get MySQL to access other parts of the file system. After all, you can configure the location of data, log files, socket, and so on, so why shouldn't you use those settings to optimize your system? Unfortunately, on many modern Linux distributions, it's not that always easy.

Take Ubuntu, for example. Ubuntu comes with something called AppArmor, a kernel-integrated application security system that controls how applications can access the file system. This goes above and beyond normal permissions, and as a result can sometimes be a bit confusing.

TL;DR

First, here's the quick version of this post: If you want to relocate the data directory in MySQL (in this example, to the /data/directory), and AppArmor is not letting you, add the following two lines to the bottom of /etc/apparmor.d/local/usr.sbin.mysqld:

/data/ r,
/data/** rwk,

...and then reload the AppArmor profiles:

# service apparmor reload

The Demonstration

To demonstrate this in a bit more detail, I've done the following:

Installed a stock MySQL 5.5 from the Ubuntu 12.04 repository

Created the /data directory, owned by the mysql user and group

Copied my data directory to /data with cp -a

Now, when I start MySQL with the new data directory, I get the following:

The final two lines say mysqld doesn't have access to the directory, even though I've changed the ownership (both user and group) to mysql. If you haven't come across AppArmor before, this is about where you start to get confused. However, that big "DENIED" is a bit of a giveaway, and it's associated with apparmor, so let's have a look at AppArmor's status:

There's a profile loaded for the mysqld process, which could be what's blocking it from accessing /data.

There are a couple of quick and dirty ways to get past this. You could, for example, disable AppArmor; it's a service, so you could uninstall it, or stop it with the special teardown command to unload all profiles. You could even delete the offending profile if you want rid of it. Another less extreme option is to use the apparmor-utils package, which contains the utilities aa-complain and aa-enforce that allow you to work with existing profiles without removing them or stopping AppArmor entirely:

Under the Covers: AppArmor's Policy Files

When you install MySQL on Ubuntu, it places an AppArmor policy file in /etc/apparmor.d/usr.sbin.mysqld. Another policy file gets placed in /etc/apparmor.d/local/usr.sbin.mysqld, which is initially empty (aside from comments) but exists to let you add non-standard policies such as those specific to this machine. In practice, you could add such policies to either file, but for now I'll put them in the local file. There's also a cached policy file, which is a binary compiled version of the policy. We can happily ignore that; it's automatically generated from the policy text files.

In the middle are the file system policies. Looking at the settings for the existing data directory /var/lib/mysql, you can see that the profile gives read (r) access to the directory itself, and read, write, and lock access (rwk) to its contents recursively (controlled by the **). Conveniently, it also #includes the contents of the local file.

Editing the Policy

To give MySQL the necessary access to the /data directory, I edit the included local file so it looks like the following:

As you can see I haven't been particularly creative; I've just copied the policy that applies to the standard data directory /var/lib/mysql, and copied it to this file, mapping the same settings to the new /data directory. Also, although I've put this in the local version of the policy file, I could just as easily have modified the main policy file.

Conclusion

It's worth pointing out that this technique applies if you want to change where MySQL puts anything on the file system. Although the use case described here is a common first reason to bump up against AppArmor's security policies, the data directory is not the only thing that you might want to move. Logs, the UNIX socket, and even configuration files are subject to the controls placed in the AppArmor policy. This also includes any files you access with anything that uses the FILE privilege, such as SELECT ... INTO OUTFILE, LOAD DATA INFILE, or the LOAD_FILE() function. While you can secure this sort of access with MySQL's secure_file_priv option, AppArmor provides a further layer of security that prevents even currently unknown exploits from accessing parts of the file system that they really, really shouldn't.

Friday Jan 25, 2013

The MySQL Installer is an easy way to install and maintain your MySQL installations on Microsoft Windows.

It comes not only with the database server and documentation, but also with the underrated MySQL Workbench, and various connectors and support utilities such as MySQL for Excel. Modifying the installed products or uninstalling them is also fairly straightforward:

Start -> All Programs -> MySQL -> MySQL Installer -> MySQL Installer

This launches the Maintenance window of the Installer GUI, and gives you the options:

Add/Modify Products and Features

Check for Updates

Remove MySQL Products.

I had a question from someone who wanted to automate a process in an environment where they tear down and reinstall MySQL to test deployments. The GUI-based way to change or remove MySQL is manual and requires human intervention, and they wanted to automate as much as possible, and thereby install and uninstall MySQL repeatedly and automatically, from a script run at the command prompt.

This is quite easy to script if you've got the MSI file you used to install MySQL, as documented here. I thought I'd go a bit further into it and explore the mechanism a little more.

Control Panel and the Registry

Before we get into the how of unattended uninstallations, the Windows Control Panel also lets you uninstall from the Programs and Features applet (which used to be called Add/Remove Programs). For MySQL, this option offers similar options to the Installer.

This mechanism (which isn't specific to MySQL) relies on a centralised repository of change/uninstall commands.

So, after a bit of investigation, I found the location in the registry:

In that location, the UninstallString key contains the following text:

MsiExec.exe /I{26FFE68D-7BD4-472A-9AB8-3517B6BF9E51}

You might recognise part of that statement from the article Automating MySQL Installation linked to above. Running that MsiExec.exe command launches the uninstall GUI, just as if you'd clicked it from Control Panel.

The GUID (the value between brackets) is correct on my machine, but differs from version to version. You can find the
correct value by searching HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\ in the Windows Registry for “MySQL Server” and finding the GUID that the Windows Installer uses for the program.

(Note: The Registry is a dangerous place to play around in, even for
experienced people, so you should be exceptionally careful in the registry and using MsiExec.)

Tweaking MsiExec.exe

Now, to automate it, there are options to MsiExec.exe that help:

/X bypasses the front screen and goes straight to the uninstall process

/quiet disables the GUI entirely and automates the uninstall (if possible, which it is in MySQL's case)

/norestart disables the post-uninstall reboot that is needed in some cases

/L enables logging, with various options (the following example uses /L*V to log in verbose fashion)

Performing an Unattended Uninstall

Bringing all of this together, you can do the following. In a command-line started with the “Run as Administrator” option, type the following (all on one line):