Monday, 3 December 2007

mysql dumps are not text files

Today's pearl of wisdom:

MySQL dump files, as produced by mysqldump, are not text files.

A common mistake is to think that mysqldump files are text files. While they superficially look like text files, they in fact are not. This leads to confusion and problems if people try to edit them in a text editor application, which will in many cases cause corruption.

MySQL dump files contain the SQL commands required to recreate the contents of a database (or subset thereof). However, these are SQL commands in arbitrary (binary) encoding and can consist of a mixture of different encodings (e.g. if you have BLOBs). This mixture is not safe to edit with a text editor, which expects a text file to contain character data in exactly one encoding (and will typically guess, ask the user or just use its default to determine which).

Suppose you use utf8 data in your database - then your mysqldump file will probably contains all of its strings in utf8. This is fine, until you get to a BLOB. This is a binary lump of data and will be full of sequences of bytes which are NOT LEGAL in utf8, therefore any editor attempting to make sense of them in utf8 won't succeed. It will therefore be forced to change them into something else (i.e. different from the original BLOB data) in order to load them into its buffers.

Saving this modified (or even unmodified) file back out will result in loss of data.