Developer tips

Database Migrations And Java

There’s a mundane problem that every project has – keeping the database structure in sync with the application. That is, when your code relies on a new table or column, that table or column must exist in the database. This is especially a problem when the code gets deployed to multiple environments (production, staging, integration, local), as each of them usually needs additional steps to bring its database schema up to date.

The quickest way to achieve the desired effect is to provide SQL upgrade scripts every time the code needs a new database element (or no longer needs one). But that’s tedious, as those scripts have to be executed manually on each environment the code gets deployed to. A couple of frameworks automate this process so that you don’t have to. Here is a list of options for the Java world.

Why do we actually need such a solution? We aim for frequent, one-click deployments to staging and production environments. We can’t do that without an automated way of syncing the database (otherwise there will be missing columns, or obsolete non-null columns will prevent data insertion). And once in a while we may need to revert a release, and we’d have to rollback the database changes accordingly. While you can just drop and recreate the schema while developing locally, you can’t do that in production.

Rails developers will rightly say “We’ve had that for a long time, out of the box”. Indeed, Rails migrations seem like a perfect implementation of the idea of keeping the application code in sync with the database schema. But I had to pick a technology for a Java project. Actually, it is a Grails project, and even though there is a Grails plugin for that, I picked another option – Liquibase (note: the Grails plugin is built on top of Liquibase). I’ll explain how to get it working with JPA in a separate post, but it seems to have everything Rails offers: a human-readable (XML) format for upgrades, automatic rollback script execution, and tools that can generate upgrade scripts from a database diff.
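To give an idea of the format, here’s a minimal Liquibase changeset as a sketch; the table, column, author name and schema version are made up for illustration:

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd">

    <!-- each changeSet runs exactly once per database,
         tracked in Liquibase's own bookkeeping table -->
    <changeSet id="1" author="jdoe">
        <addColumn tableName="users">
            <column name="email" type="varchar(255)"/>
        </addColumn>
        <!-- the rollback block is what makes reverting a release possible -->
        <rollback>
            <dropColumn tableName="users" columnName="email"/>
        </rollback>
    </changeSet>
</databaseChangeLog>
```

The explicit `<rollback>` element is the part that covers the “revert a release” scenario mentioned above.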

In fact, I’ve never used a Liquibase-like solution for my personal projects. I use Hibernate’s hbm2ddl.auto even in production, and sometimes manually fix stuff. Yes, that’s not a workable solution for a big project, and it has risks: virtually nobody recommends using `hbm2ddl.auto` in production, because it’s not perfect and may mean losing data or downtime. For those unfamiliar with Hibernate, hbm2ddl.auto is a configuration switch that makes Hibernate (an ORM framework) create the tables and columns that correspond to the entity mappings. In a sense, it is a way to automatically sync the application code with the database structure. And it can do that because the expected database structure lives in our annotated POJOs (entities), in a structured way that defines exactly which tables, columns and keys we need. But, as I said, it’s not recommended for production (the recommended value there is “validate”, so that it only verifies whether the database and the code are in sync).
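As a sketch, the switch is just a Hibernate property. The class below is illustrative (the property key `hibernate.hbm2ddl.auto` is real; the helper class itself is made up), showing the “validate” value for production and the riskier “update” for local development:

```java
import java.util.Properties;

public class HibernateSettings {

    public static Properties productionSettings() {
        Properties props = new Properties();
        // "validate" only checks that the mappings and the schema match;
        // it never alters the database.
        props.setProperty("hibernate.hbm2ddl.auto", "validate");
        return props;
    }

    public static Properties localDevSettings() {
        Properties props = new Properties();
        // "update" lets Hibernate create missing tables and columns --
        // convenient locally, risky in production.
        props.setProperty("hibernate.hbm2ddl.auto", "update");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(productionSettings().getProperty("hibernate.hbm2ddl.auto"));
        System.out.println(localDevSettings().getProperty("hibernate.hbm2ddl.auto"));
    }
}
```

The same key can of course live in persistence.xml or a Spring configuration instead.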

But I’m an idealist. I want something even better than a good migrations tool like Liquibase. We have the prerequisite, a structured way for our code to define the schema it requires: the JPA annotations (with some vendor extensions). We just need a tool that works as if it had a manually created SQL upgrade script. It needs to:

- create tables, columns, indexes, foreign keys
- drop tables, columns, indexes, foreign keys
- alter tables, columns, indexes, foreign keys
- fix the data before/after the structural change
- make sure no data is lost

hbm2ddl.auto does only the first, and can partly do the third. It would be relatively easy to implement the “drop” and “alter” functionality. How do we make sure no data is lost? Whenever a column or table is dropped, a new temporary table (e.g. prefixed with _hbm_temp) is created that stores the data from that column/table. If it turns out the data should not have been deleted, you can get it back. Otherwise, you can clean up the temp tables once in a while. I can’t see any obvious problem with that approach, and it would save a lot of time, effort and mistakes.
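A rough sketch of how such a tool could generate “safe drop” statements, renaming or copying instead of dropping. Everything here is hypothetical (the class, the `_hbm_temp` naming, and the assumption that each table has an `id` primary key); exact DDL syntax also varies by database:

```java
public class SafeDropGenerator {

    private static final String TEMP_PREFIX = "_hbm_temp_";

    // Instead of "DROP TABLE x", rename the table so its data survives
    // until someone decides it is really safe to delete.
    public static String safeDropTable(String table) {
        return "ALTER TABLE " + table + " RENAME TO " + TEMP_PREFIX + table;
    }

    // For a dropped column, copy its values into a backup table first,
    // keyed by the (assumed) id column, so the data can be restored.
    public static String backupColumn(String table, String column) {
        String tempTable = TEMP_PREFIX + table + "_" + column;
        return "CREATE TABLE " + tempTable + " AS SELECT id, " + column
                + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(safeDropTable("orders"));
        System.out.println(backupColumn("users", "legacy_flag"));
    }
}
```

A real implementation would emit the backup statement, then the actual `DROP`, as one migration step.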

While on the topic, I can’t skip the main reason for this complexity – the fact that relational databases have a rigid schema. NoSQL databases, on the other hand, are mostly schema-less. That eliminates the exceptions arising from missing columns or tables – they automatically “appear” in the database. However, unused fields stay there and pollute the database, and you have to clean them up manually. And the lack of database-imposed constraints means they have to be handled in the application code. So there’s a tradeoff (as usual).

Overall, avoid having to execute SQL scripts manually in order to fix the database. Always aim for the application to be the main point, and make everything else (the schema) follow. This can’t always happen (e.g. if you need database triggers and stored procedures, or multiple applications use the same database without being able to share a codebase), but at least put these scripts in the same repository as the code and make quick deployments to any environment possible, with as few manual tasks involved as possible.

PS: You advise avoiding manual execution of SQL scripts (via shell, SQL admin tools, etc.). You mentioned libraries that handle migrations, but I didn’t find a tool that directly executes an up/down migration. In Rails/Symfony there’s a powerful CLI tool handling lots of different tasks, including migrations. How about such a tool for Java – would you use Maven for this?

The Grails plugin uses Liquibase. It exposes all of the Liquibase functionality, but also adds Grails and GORM integration (e.g. the dbm-gorm-diff script). You can use Liquibase XML if you want, or the Groovy DSL – your choice.

Have you heard about the OneWebSQL framework (http://onewebsql.com/homepage)? It was designed to keep the application in sync with database modifications. It’s not a popular tool, but we use it in my company instead of Hibernate, and it’s really enjoyable to write applications with it.

For the .NET people out there, EF has also just added migration functionality. If you aren’t using that, however, take a look at FluentMigrator. I’m using it for a production app, and it makes deployments a breeze. It has integration with NAnt, so I can easily update the database when I push new app versions out.

Besides, are you working with one database vendor? Our software must support Oracle, Microsoft SQL Server and Postgres (at least). And I can’t see how to do that with Liquibase (and yes, we’re a Spring + Hibernate application).

Right now we’re using dbdeploy which isn’t perfect for us, but it works.

I know, but what do we do with the different database types (e.g. nvarchar for MSSQL, varchar2 for Oracle and varchar for Postgres)? To make things even more interesting, it would be nice to reuse the Hibernate dialect in Liquibase XML migration scripts (I know it has some integration with Hibernate, but I don’t see a way to use it in our case).
