Converting Rails Applications from MySQL to PostgreSQL

Update (2009/09/14): There is a much better article on doing this conversion here that is specific to migrating Typo from MySQL to PostgreSQL. I will admit that I have not tried using that article’s boolean conversion method.

I recently decided to give PostgreSQL a try after learning about some of the oddities of MySQL, plus it gave me an opportunity to see what was necessary to convert between two RDBMs. I installed PostgreSQL 8.3.7, which is the most recent version currently in the main Gentoo Portage tree, using the Gentoo Wiki guide. Rather than explain the details of setting up and using PostgreSQL, I’ll explain what I did to convert my Redmine and Typo instances. (The following assumes some familiarity with how to use Rails, MySQL, and PostgreSQL command line commands. See their man pages as well for other options that you may want/need to use)

Dumping the Database

At first I tried to use a rake taskto convert my redmine database, but it would fail because ActiveRecord can’t really treat join tables as ActiveRecord types. The PostgreSQL connector code wanted to generate a SQL statement that would return the value id, which ActiveRecord decided was the nonexistent primary key for the table. I was able to modify ActiveRecord to get around this, but the rake task was not loading the repository sub-types either.

Instead, I decided to try and convert it the “old fashioned” way: dumping sql code, editing it, and importing it. Note that I was using MySQL 5.0.70, and the options may not be available in older versions of MySQL. After shutting down all processes that might access the database, run

--no-create-info tells mysqldump to not print schema information. We will use Rails’ own migration functionality instead to recreate the schema in a manner that takes more advantage of PostgreSQL. --compatible=postgresql makes mysqldump produce SQL code with quotes that are more ANSI compliant, and it produces dates and times in a format that PostgreSQL seems to like. Finally, --complete-insert makes the INSERT statements in the dump include column names for the insertions.

Preparing the Dump for Import

If you tried to just import the dump into psql now, psql would report a multitude of errors. What follows are the ones I ran into, and how I got around them.

LOCK and UNLOCK Lines

First off, psql does not like the LOCK and UNLOCK statements in the dump. You can safely remove these lines. I zeroed out their lines in vim with

:%s/^\(UN\)\?LOCK.*$//g

boolean Field Types

The second type of error I received was about boolean fields:

ERROR: column "is_default" is of type boolean but expression is of type integer
LINE 1: ... "enumerations" ("id", "opt", "name", "position", "is_defaul...
^
HINT: You will need to rewrite or cast the expression.

The equivalent fields in MySQL appear to be stored as integer types (1 and 0), but PostgreSQL prefers TRUE and FALSE for its boolean values – although it accepts '1' and '0'. Run

pg_dump -U postgres -s redmine > redmine_schema.sql

and look for the name of all boolean-type fields that would need to be changed. Then, back in the dump, carefully go through and quoted the correct 1’s and 0’s for each inserted row.

Possible Pitfalls

I ran into a confusing snag after I edited and imported my dump. Some serialized fields could not be converted into arrays. After tracking down this problem for a while, I discovered that extra newlines were being introduced to some of the serialized objects’ text fields. It turns out, I accidentally let vim hard wrap lines in the middle of quoted strings in the dump file.

Importing the Dump

Once your rails application is set up to access a PostgreSQL database, set up the new database’s schema:

rake db:schema:load RAILS_ENV="production"

After the schema and migrations have completed (I recommend verifying that the all the tables’ definitions are in place using something like pgAdmin3), it is finally time to import the dump:

psql -U postgres redmine < redminedump.sql

If the changes I mentioned above were made, there will still be one ERROR amongst dozens of warnings:

The schema_migrations table stores the number of each migration run on the database, and this error can be ignored. If you want to avoid this error, you could delete the INSERT lines that cause it from the dump.

Fixing the id Sequences

We are almost, but not yet done. The primary key in most ActiveRecord–managed tables is the “id” column, and each row in a table is supposed to have a unique entry. PosgreSQL uses sequences to increment this value for each new table entry, and the sequences need to be set to at least the highest numbered value they are associated with (a table with no entries can have its sequence be set to the default of 1). I used pgAdmin3 to inspect the contents of each table, and to then set the corresponding sequence’s current value, but this could also be done with some clever SQL code.

Conclusion

There were dozens of WARNINGs about “nonstandard use of escape in a string literal”, but they did not affect psql’s ability to correctly fill in its own tables.

At this point, you should take one last look over the data with some queries or pgAdmin3, looking for anything that might of gone wrong. After that, launch your rails application, and verify that it can still run.