MySQL: Better Practices

George Reese has an article up on O'Reilly's OnLAMP titled "Ten MySQL Best Practices." I have a few problems with it, however. So I'll take time to detail them here in the hopes that others might think about these issues too.

First off, I'm amazed that anyone still uses the phrase "best practices" anymore. Think about that phrase. The word "best" implies that the author is smarter than everyone else on this particular topic. How arrogant.

Network Security

Anyway, on to the content. His #2 best practice (see how dumb it sounds? Why can't they be called recommendations? Or lessons? Or...) is "Hide MySQL from the Internet"

MySQL has a pretty solid track record for security of a network service. Nevertheless, there simply is no good reason to expose MySQL directly to the Internet-- so don't do it. When you hide MySQL behind a firewall and enable communication to the server for only hosts running application servers and Web servers, you constrain the path of attack a would-be hacker might take.

But he completely fails to mention that 95% of users could benefit from taking advantate of MySQL's skip-networking option. With it enabled, MySQL doesn't listen to any TCP ports AT ALL. It only listens for local unix socket connections. Given that most MySQL users are running PHP and Apache and MySQL on the same machine, there's little need to leave it open on the network at all.

I'm not saying that this is the solution to the problem in all cases, but he's clearly not aiming at the more sophisticated users anyway. Otherwise he wouldn't have had to tell them to use passwords in his #1 best practice.

Note, also, that he misused the term "hacker."

Binary Data

The #4 "best practice" he lists is "Don't store binary data in MySQL." Instead, he advocates using the filesystem to store binary data. I'm so sick of hearing that argument. A lot of people do store binary data in MySQL and it works just fine thank you very much. In fact, I've found a very useful technique when I need to store large amounts of text (non-binary data). I compress it using Perl's Compress::Zlib and store it in a MySQL blob field. Why? It saves space (and therefore disk seeks on queries). Having the data in MySQL means that I don't have to worry about it being replicated on all the servers it needs to be. I let MySQL do it's job and I get on with mine.

He's likely assuming that the application that needs the data will be running on the same server as MySQL. Either that, or he's saying that you should fragment your data store, putting some of it on one machine (in MySQL) and some on another (in the filesystem). I think that's a decision one should not make lightly.

ANSI SQL Only?

Ah, #5 is one that really gets me. It is called "Stick to ANSI SQL" and says this:

MySQL provides many convenient additions to the ANSI standard that are very tempting for programmers. These additions include timesaving tools like multitable deletes and multirow inserts. When you rely on these features in a MySQL application, you limit the ability to adapt the application to any other database engine. In fact, you may make it impossible to port the application to another database without a significant rewrite. For maximal portability, you should therefore stick to ANSI SQL for your applications.

Oh, please.

One of the big reasons to use MySQL is that it's fast. Damned fast. By taking advantage of it's features, you can save a heck of a lot of time in developing your application. The reality is that people don't often move products or projects from one database server to another. Why not? Because they're all different anyway. Yes, they all support ANSI SQL to some degree or another, but if you limit yourself to the least common denominator you're going to waste a lot of development time.

In fact, when Rasmus Lerdorf (the father of PHP) was at Yahoo! a few weeks ago to talk about PHP, he mentioned what a dumb idea most database abstraction libraries are. Why? Because the lowest common denominator across SQL databases is pretty damned low. And I tend to agree with him.

Not using any of MySQL's special features is like programming in Perl and not using Regular Expressions (just because every language does them differently). It doesn't make a lot of sense.

Sequence Numbers

It gets worse. In #6 "Create your own sequence generation scheme" he argues against using AUTO_INCREMENT fields for a few reasons. Let's examine his reasons.

He starts with "you can only have one AUTO_INCREMENT column per table." So what? That's a well documented limitation. It's not a reason to avoid them, it's just something to keep in mind when developing. That's like arguing against using VARCHAR fields because they can hold "only" 255 bytes of data. That's what they were designed to do.

Then he says:

You cannot have a unique sequence for multiple tables. For example, you cannot use AUTO_INCREMENT to guarantee uniqueness for columns in separate tables so that a unique value in one table does not appear in the other table as well.

But he fails to mention how rarely that is a problem. I'd like to see the database design that relies on having unique numbers across tables. If you do need that, then take advantage of some other method. But that's rather uncommon.

Finally, he says "You cannot easily determine from an application what values MySQL has automatically generated." Why would you need to do that? If you code the application properly, that's not going to be an issue. Remember, this is an AUTO_INCREMENT column. It's supposed to be generated AUTOmatically. If your code is generating the values, don't tell MySQL that it should be an AUTO_INCREMENT column. It's as simple as that.