MySQL 4.x:

MySQL powers countless databases and data-driven Web sites. MySQL 4, the latest release of the Open Source database, includes features that put it on par with products from database stalwarts Oracle and Microsoft. Unbelievable? Believe it.

MySQL has a lot in common with the Macintosh: both products grew out of their creators’ early vision and passion to become the great products they are today; both have begun to fill very visible roles in the Open Source world; both have been popping up more and more in corporate settings; and both have managed to generate communities of vocal and loyal supporters — communities that continue to grow and thrive.

But the most important and striking similarity between MySQL and the Mac boils down to the emotional response each product evokes. Few people who have used a Macintosh come away unimpressed — they either love the Mac or hate it. People have a similar reaction to MySQL. And recently, there has been more to love and less to hate about both of them (and with the advent of the Unix-based Mac OS X you can run MySQL on a Mac for the first time).

We first touched on MySQL 4.0 in the September 2002 LAMP Post column (available online at http://www.linux-mag.com/2002-09/lamp_01.html). As this is the beginning of a new year, let’s have a look at some of the great new features coming in MySQL 4.x. By the time you read this, MySQL 4.0 should be stable (or nearly so), and version 4.1 will be available for testing.

You may notice that MySQL 4.x is now available in two flavors: Pro and Max (this distinction was present in the 3.23 series, but the versions were called “3.23″ and “3.23 Max”). The Pro version is built with all the standard components and options enabled. The Max version has virtually every option enabled.

If you’re not sure which flavor is right for you, start with the Pro version. It’s what 99% of MySQL users use. The Max releases are primarily for users with unusual needs or who want to try some of MySQL’s less common and/or experimental features.

Compatibility

Though it’s considered a major upgrade to move from MySQL 3.23 to 4.0 (otherwise 4.0 would be called 3.24), performing the upgrade is quite easy. MySQL 3.23 and MySQL 4.0 use the same on-disk format for all tables, so there’s no need for extended downtime or a massive export and re-import of your data. You can simply install a newer version MySQL in place of your current version.

Having said that, it’s always a good idea to backup your data before an upgrade. You could be the first to encounter a new bug that nobody else has seen. (But you’re probably already backing up your MySQL data regularly, right? Good.)

After upgrading to 4.0, be sure to run the mysql_fix_privilege_tables script from the MySQL scripts directory. mysql_fix_privilege_tables updates the grant tables to include the new privileges introduced in MySQL 4.0.

Moving to MySQL 4.1 will require you to either dump and re-import your data or run various ALTER commands to take advantage of some of its new features. However, if you don’t need those features, your existing tables will work fine. We’ll discuss all of the specifics momentarily.

Modular Architecture

Figure One: MySQL’s modular architecture

Before moving on to the new features of MySQL 4.x, let’s quickly review how MySQL is put together. As shown in Figure One, there are several layers to MySQL:

The top layer supplies basic services like query parsing, networking, authentication, and logging.

Once MySQL has received and parsed a query, the second layer is responsible for analyzing, optimizing, and ultimately executing the query. Queries usually involve fetching data from one or more tables, and applying various SQL functions (such as DATE_FORMAT()) to the result.

Like the Linux kernel’s virtual filesystem (VFS) layer, MySQL implements an abstraction layer between the query engine and the underlying data storage. The storage engine API (shown as the third layer) provides generic routines for accessing data regardless of the underlying table type (BDB, MyISAM, etc.).

Modularity allows each storage engine in MySQL to use its own strategies for storing, locking, and retrieving data. For example, BDB tables provide transactions and page-level locking, while MyISAM tables have no transactions, use table locks, provide full-text search capabilities, and so on. The interface is generic enough that the upper layers in MySQL don’t need to care where the tables exist on disk. In fact, HEAP tables don’t use disk storage at all — they’re stored completely in memory.

InnoDB: Transactions and Row-level Locking

Long time MySQL users may recognize many of the storage engines (formerly called “table handlers”) listed at the bottom of Figure One. However, one of them is a relative newcomer to MySQL. InnoDB (formerly Innobase) was developed and is maintained by Heikki Tuuri of Innobase Oy, the Finnish company that has partnered with MySQL AB to provide commercial support for InnoDB in MySQL.

InnoDB was first released as part of MySQL in mid-2001. For most of MySQL 3.23 development, InnoDB was considered experimental, and was only included in the MySQL-Max binaries. Beginning with version 4.0, InnoDB is a standard component of MySQL.

But just what is InnoDB? Without exaggeration, InnoDB is the single most revolutionary addition to MySQL ever. InnoDB is a storage engine that was modeled after Oracle, and it seeks to provide Oracle-like functionality on many levels. The InnoDB storage engine provides row-level locking, non-locking reads, multiple isolation levels, referential integrity, automatic recovery, and full ACID guarantees. And it does so with impressive performance and seamless integrated with MySQL. In fact, MySQL/InnoDB recently tied Oracle for first place in a widely publicized third party benchmark (for details, see http://www.mysql.com/eweek).

That’s quite an impressive list of features. If you’re quite familiar with database technology, you may be a bit suspicious. InnoDB sounds too good to be true. But InnoDB really does provide all that — and more. But if you’re a little rusty on database theory (and aren’t we all?) and that list didn’t mean a lot to you, here’s what those features actually do:

In a database that supports transactions, queries can be grouped into indivisible units of work which are either applied completely or not applied at all. Applied transactions are committed, and aborted transactions are rolled back. Like most other SQL databases, MySQL uses BEGIN, COMMIT, and ROLLBACK commands to begin, commit, and rollback transactions, respectively.

Isolation Levels. In the traditional view of transactions, all transactions should be isolated from each other. One transaction may not affect another running transaction. However, a number of edge cases and degrees of isolation have developed over the years. A full explanation is beyond the scope of this article, but InnoDB natively supports READ-COMMITTED, REPEATABLE-READ, and SERIALIZABLE.

By locking individual rows (and only when necessary), InnoDB tables provide a high degree of concurrency. That means you can have hundreds or even thousands of clients using a single table without locking each other out unless they’re all trying to change the same data.

In a database that supports referential integrity, you can define relationships between fields in related InnoDB tables to ensure that records cannot be removed from one table if they are still being referenced from another. InnoDB will enforce referential integrity. For example, you can’t accidentally delete a customer (say, in the customer table) who has pending orders (in the orders table).

If MySQL is shutdown abnormally, InnoDB will automatically rollback uncommitted transactions and ensure that committed transactions are applied. Unless you find a bug in InnoDB, a system crash will never corrupt your tables or cause you to lose any data.

Until recently, database snobs didn’t consider MySQL to be anything more than a toy because it didn’t have ACID guarantees. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability — the four critical characteristics of a robust database server. InnoDB covers all of them. Transactions are guaranteed to be atomic, indivisible units of work. InnoDB uses a multi-versioning concurrency control (MVCC) system, much like Oracle or PostgreSQL, that ensures that clients see consistent views of their data, and that running transactions are isolated from each other. InnoDB’s logging ensures that committed transactions can survive system failures.

That all sounds great, but how easy is it to use in practice? Simple. Really. Simple.

Using InnoDB

You can convert an existing table to InnoDB with a simple ALTER TABLE command.

ALTER TABLE mytable TYPE = InnoDB;

The conversion takes a little while MySQL copies the data into InnoDB’s tablespace and builds indices. When the conversion is done, you can use the InnoDB tables just like any other table. Like Oracle, InnoDB uses one or more large files on disk (tablespaces) to store all of its data and indices. MyISAM, on the other hand, uses individual files for each table, and groups databases into subdirectories. InnoDB doesn’t rely on the filesystem to help represent the database structure.

If you’d like to create new tables, simply add a Type = InnoDB to the end of your CREATE TABLE statements:

CREATE TABLE mytable ( … ) Type = InnoDB;

That’s all it takes. If you’re already familiar with Oracle or PostgreSQL, you’ll find that the transition to InnoDB tables is relatively straightforward.

We’ve swept a lot of details under the proverbial carpet. There’s a lot more to be said about InnoDB’s storage and logging mechanisms, memory usage, support for foreign keys, and unique features. Full documentation for InnoDB is available at http://www.innodb.com.

Full-Text Search

Text searches in MySQL have long been a hit or miss proposition. Prefix matches like “every name that starts with art” are easy. Given a query like:

SELECT * FROM mytable WHERE name LIKE “art%”

MySQL can perform the search very quickly if the name column is indexed. MySQL traverses the B-Tree index very quickly to locate all possible matches, finding “art gallery,” “art exhibit,” and so on. However, if the search is a bit less specific, things degrade. For example, try to ask for “every name that contains art.” You’d probably write:

SELECT * FROM mytable WHERE name LIKE “%art%”

In this case, MySQL has a lot more work to do. Because the text is no longer “anchored” to the beginning of the string, MySQL cannot simply walk an index to find all possible matches. Instead, MySQL performs an index scan. It reads every node in the index and performs a search on each one. Since the index is much smaller than the table, this is a lot faster than a full table scale, but it clearly doesn’t scale well.

On the upside, of course, you’d find matches like “regional art services” and “Bart Simpson.” Oops. Maybe you didn’t want to match “Bart.” You could use a complex regular expression and RLIKE instead of LIKE to specify exactly what you mean. But MySQL can’t really optimize that query either — for the same reasons.

Even if MySQL had a way of optimizing such queries, it would still leave room for improvement. Why? Because most of the time, when a query asks for all records where “foo” is in the name field, the query really doesn’t want all the records — it probably wants only the most relevant records. If “foo” occurs three times in one record, it’s probably more relevant than another where the pattern only occurs once at the very end of the string.

Very often, that kind of query is initiated by someone who doesn’t know anything about MySQL at all. The user just types into a Web form and expects it to be as smart as Google. The user may even type more than one word, complicating the task even more. Obviously, a match is more relevant if the words are found in close proximity to each other. But standard SQL has no way of expressing such a query.

Luckily, MySQL provides full-text indexing and full-text search. While not new in 4.0, MySQL’s full-text search capabilities have improved substantially in recent releases.

A full-text index can contain one or more “text” fields (CHAR, VARCHAR, TEXT, etc.) It looks like this:

Given this table definition, MySQL builds a full-text index that covers the headline and story columns. To find all stories that contain “database” in the headline, execute:

SELECT * FROM linux_news WHERE MATCH (headline) AGAINST (‘database’);

And to search the headline and the complete story, simply add the story column to the match list.

SELECT * FROM linux_news WHERE MATCH (headline, story) AGAINST (‘database’);

MySQL automatically returns records sorted by relevance. There’s no need to add an ORDER BY clause to the query. Taking things a step further, you can run full-text searches in boolean mode. This provides even greater control over the matching process. The syntax is quite similar to that used by popular web search engines. For example, this query:

finds all stories that contain the word “database” in their headline and/or body, but excludes those that mention PostgreSQL. Full-text search is case insensitive. See http://www.mysql.com/doc/en/Fulltext_Search.html for more details about full-text search.

Subqueries

One of the biggest hang-ups would-be MySQL users have had about migrating their code is the lack of subqueries. In fact, it has probably been the most frequently recurring topic on the mailing list for the last year or two. (Before that it was, “Does MySQL support transactions?”) Would-be users have been writing queries like the one shown in Listing One for so long that they can’t think of a way to do without them. As of MySQL 4.1, there won’t be any reason to — from that release on, MySQL will handle subqueries just fine.

Listing One: A SQL query using subqueries

SELECT * FROM used_cars
WHERE color = ‘red’ AND model IN
(SELECT DISTINCT(type) FROM crash_data WHERE death_rate < 10)

A subquery is a query within a query. The example shown in Listing One asks MySQL to fetch the list of distinct car types that ranked below 10 (whatever that means) based on crash data. Then, using that list, find all the red, used cars of the same model.

A query like that can be challenging to write without a subquery. With some creativity, you can construct a query that returns all the correct data, but the result set may contain a lot of duplicate information. Of course, you could just execute two discrete queries. First, one to fetch the list of models:

SELECT DISTINCT(model) FROM crash_data
WHERE death_rate < 10

Then, using the results of that query, you can string the models together (separated by commas) to produce the values for the IN clause in the second query:

SELECT * FROM used_cars WHERE color =
‘red’ AND model IN (…)

Letting MySQL do the work for you means that you write less code and fewer queries. You’ll also see results faster because MySQL only sends you the data you need to see.

Multi-Table and Ordered Deletes

MySQL has always had a reputation for being practical. When enough users need a new SQL extension, it generally gets implemented sooner or later. The LIMIT clause is a great example. When you need only the first 20 records from a query, simply add a LIMIT:

SELECT * FROM mytable LIMIT 20

And MySQL does what you’d expect. Want the next 10 records? No problem:

SELECT * FROM mytable LIMIT 19, 10

Yes, MySQL counts rows starting from 0 rather than 1. And the limit can be applied to DELETE and UPDATE queries as well as SELECT.

The latest MySQL enhancements also improve DELETE queries. As of MySQL 4.0, you can apply an ORDER BY as well as a LIMIT clause to a DELETE query. That means it’s easy to tell MySQL, “Delete the 500 oldest, inactive records from the archive table,” without getting the list of records and deleting them individually. Instead, you can simply write:

DELETE FROM archive WHERE status = ‘Inactive’ ORDER BY Time ASC LIMIT 500;

And, again, it does what you expect.

But it gets even better. As of 4.0, you can perform a relational or multi-table delete. It only makes sense that you’d want to use a join expression in the WHERE clause of a DELETE query. After all, in a relational database, it’s common for the information to be spread among several tables (assuming it was properly normalized). Taking things a step further, you can even delete records from multiple related tables in a single query.

For example, to remove all Britney Spears albums from your nicely organized and normalized CD/MP3 collection, you can ask MySQL to simultaneously delete all of the album and track data, using her artist information.

MySQL (itself) has been internationalized for quite some time. The messages produced by MySQL have been translated into over 20 languages. By default, MySQL also provides as many character sets to choose from when starting MySQL. By selecting the character set appropriate for your data, you ensure that MySQL sorts records appropriately in queries that use ORDER BY, and performs string comparisons as appropriate for the character set.

However, if you need to store data in multiple character sets, MySQL 3.23 doesn’t offer much help. You can change the server’s default character set using the –default-character-set option at start-up time, but that’s about it. Starting in version 4.1, you’ll be able to set the default character set on a per database, per table, and per column basis.

To set the default character set on the database “beer” to German, simple execute:

CREATE DATABASE beer DEFAULT CHARACTER SET latin1_de;

To create a table to store Unicode text encoded as UTF-8, specify the character set at the end of the CREATE TABLE statement:

CREATE TABLE unicode_stuff (
…
) CHARACTER SET utf8;

And to create a table to hold data in multiple character sets, specify each field’s character set explicitly. See Listing Two. Notice that each text field uses a different character set.

If you don’t specify a character set, all text columns automatically inherit the table’s character set. Tables inherit the database’s character set, and databases inherit the server’s character set.

After upgrading to MySQL 4.1, you may need to adjust the character sets of databases, tables, and columns. To do so, you can use the appropriate ALTER command. For example, to change a database’s default character set:

ALTER DATABASE beer DEFAULT CHARACTER SET latin1_de;

And to reset a database’s character set back to the server’s default:

ALTER DATABASE beer DEFAULT CHARACTER SET DEFAULT;

Similarly, you can use ALTER TABLE to set a table’s character set:

ALTER TABLE messy_stuff CHARACTER SET big5;

And individual columns can be modified as well. Simply provide the new column definition: