Database Source Control Revisited

A while back I blogged about strategies for putting your database artifacts under source control. At the time, I thought that a schema synchronization approach (implemented by Visual Studio Database project) was the way to go. Among other things, it seemed more DBA friendly.

Well, we gave VS Database project the good old college try. We used it (1) to create a local database for development and automated testing and (2) to generate production deployment scripts.

The results were not great. While we were able to make it work for development/testing needs by automatically creating a local version of our database, scripts for production deployments proved to be a bridge too far. Getting a reliable deployment script consistently turned out to be too difficult.

The end result was that we migrated to a schema migrations approach (pardon the pun).

All or Nothing

The basic problem with Visual Studio Database project (aside from the unruly name) is that it requires perfection before it can be truly useful. Allow me to explain.

We have a fairly large number of interdependent databases. Of those, only a handful are directly relevant to our application and actually change during development. Therefore, it is perfectly sufficient to get only those databases under source control. The rest of them would be nice of course, but not absolutely necessary.

Yet Visual Studio requires all of them to be under source control or else. Every time you build, it attempts to recreate and validate the database schema. Any reference to another database it doesn’t know about causes the build to fail. So, in order to get it to build, you have to either drop those references or put other databases under Visual Studio’s source control as well.

In theory, this requirement should be dealt with by putting all your databases under source control. In practice, that’s just not practical (again, pardon the pun). It takes too long, it’s painful, etc. And so, you take short cuts.

It Triggered A Problem

We took such a short cut with cross database triggers with bad results.

Our database has triggers which write CRUD operations on specific tables to a different database. Among other things, this means that an update to 1 record in such a table results in more than 1 update in the database.

Unfortunately, NHibernate (our ORM) doesn’t like this. It expects a specific number of records to get updated. If it sees more (or less) than that, it freaks out and throws an “Unexpected row count” error (see here for more on this).

Fortunately, there is a simple fix for this: just modify the trigger to not return updated row counts. All you need is for this trigger to be under source control. Simple, right?

Well, putting a trigger that references another database into source control required us to put that other database under source control as well. Which takes time and energy. Time and energy that could be used somewhere else.

We didn’t know about the NHibernate issue when we originally put the database under source control. At the time, dropping these triggers from source control didn’t seem like a bad idea. The trade-off was justifiable.

And so we did. Our database compiled and our app worked fine. Development continued on, oblivious to this issue until pretty late in the testing cycle.

Now, I realize that this wouldn’t have happened if we did things the “right way” instead of taking short cuts. And while that may well be true, by the time we got to these triggers, we had already spent lots of time trying to wrestle at least the main databases under source control. Spending yet more time on databases we didn’t even need was not an appealing proposition.

Which brings me back to my original point. If the tool you use makes achieving usefulness too onerous, then perhaps it’s not the right tool.

In that sense, database migrations is much more forgiving. As long as you can generate a baseline schema that will compile in the database, it doesn’t care about anything else. Your references don’t have to actually exist, etc.

Again, this is by no means a perfect solution and it will probably not expose certain types of problems automatically. However, at least you can move forward. And so, we did.

This post got 2 comments so far. Care to add yours?

I’m biased, but I agree that a migration-style approach is a more effective way to deploy databases.

I had a similar experience with cross-database interdependencies with a BPM tool implementation at my last employer: we had stored procs and views in our in-house Contract Management system that needed to retrieve data from its tables, which were managed by the BPM tool. So VSDB obviously wasn’t a good fit.

Another reason migrations can be better is in the area of refactoring: VSDB projects can only do a State A to State B type transition for each deployment event, whereas in migrations you can go State A to Z if you want. For example say you’re splitting a column called “FullName” into “FirstName” and “LastName”, and then remove “FullName”. With VSDB you’d need to split that into 2 separate releases:

Release 1: Add new Columns to Table definition, populate new Columns with data from old column (e.g. using a post-deployment script).
Release 2: Remove the old Column.

If you tried to do it in one release you’d drop the column before you had a chance to migrate the data.

Realistically what you’d probably resort to is writing a script outside your VSDB project that would do it all in one go and asking your DBA nicely to run it for you.

Of course these examples are edge cases, but as they say if you have enough edge cases you’ll eventually get boxed in!