I learned this kind of migration is not really harmless the hard way: during a production deploy. Since the column was not being referred anywhere in the code, I assumed it was a migration that I could just run at will.

But as soon as rake db:migrate was done, errors started to pop up at the logs:

PGError: ERROR: column "notes" does not exist

Turns out that ActiveRecord caches table columns, and uses this cache to build INSERT statements. Even if the code is not touching that column, ActiveRecord will still attempt to set it to NULL when saving models.

I was just starting to realize how delicate database migrations are.

My first reaction was to call for a maintenance window whenever we had a migration to deploy. But that practice quickly became unfeasible as it blocked deploys, left users unhappy and just made it evident that we were doing things wrong.

It was time to understand the problem, and fix it for good.

Hot Compatibility

So here's the basic principle that allows you to avoid downtime: any migration being deployed should be compatible with the code that is already running.

In order to do so, you'll usually split your deploy process in two steps:

Make the code compatible with the migration you need to run

Run the migration, and remove any code written specifically for it

Going back to our example: if you want to remove a column, you'll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.

With that in mind, lets understand what are the different patches that will make your code ready for a migration.

Patterns

Keep in mind some of the patterns I'll introduce here are for Postgres only.

Read-only models

Most of the issues coming from migrations happen when you're writing to the database. If you don't need to save your model make it explicitly read-only:

NOT NULL constraint

Then update all existing records that have it set to null, and only then you're safe to add the constraint.

Creating indexes

Creating indexes on a live system is surprisingly unsafe: ActiveRecord doesn't create indexes concurrently, so your table will be locked against writes. If you're writing a lot of data to the table, or if there's a lot of data to be indexed, you probably want to create it concurrently instead.

The catch is that you can't create concurrent indexes from a transaction, and all Rails migrations run within one. So you'll need to resort to a hack and create your index using raw SQL:

The good news is that Rails will be able to dump that index to a Ruby schema normally (despite the raw SQL).

Cheat sheet

Adding columns

Safe for readonly models

Safe when there are no constraints on the column

Removing columns

Safe for readonly models

Tell AR to ignore the column first

Renaming columns

Not safe

First add a new column, then remove the old one

When the column is used on SQL queries you'll need to split this in three steps

Creating tables

Safe

Removing tables

Safe

Creating indexes

Safe only for readonly models

Otherwise make sure you create indexes concurrently

Removing indexes

Safe

Future

Running migrations with no downtime takes a lot of planning, and work. But programmers are good exactly at abstracting work and turning repetitive tasks like this into something that can be reused. So it seems like we'll naturally see a lot of the work described above abstracted on a level below the application.

ActiveRecord, for instance, could be more resilient to migrations. A naive approach to resolve the problem of dropping columns would be to rescue the database exception saying the column doesn't exist, remove it from the cache and retry. It's hard to do this in a reliable and clean way, but it seems possible.

Going further, as we move from monolithic applications running on a single server to distributed systems, the need for a database that can elegantly support migrations gets much higher. That's certainly part of the motivation behind the NoSQL movement - but I'd expect change in relational databases too. Ideally they would adapt to this new ecosystem by providing tools to make our lives easier, like the ability to alias a column.

But enough speculating.

The reality today is that hot compatibility needs to be addressed on the application level, and that's the best way to avoid maintenance windows or serving errors to your users.