Technical content about Microsoft data technologies. All opinions expressed are purely my own and do not reflect positions of my employer or associates.

One Database or Ten?

In my career to date I’ve worked as a DBA for mainly “buy don’t build” scenarios. One advantage - and hair-graying problem, if I am honest - has been that I think I’ve seen upward of one or two hundred different ISV-provided applications’ SQL Server databases. It’s a great learning opportunity, as I can see successes and failures in a large variety of designs, and sort of “fast-forward” to version five or eight of a given design pattern to see things like whether it scales, or has storage problems, has security flaws or DR issues. Today I want to talk about an often repeated blunder that I’ve seen the whole time: designing around the wrong number of databases in SQL Server.

Implementing an application based on the wrong number of databases, architecturally, can have major consequences. As you can imagine, that type of design issue is very, very hard to untangle in version 2 or 5, after the application is out in the wild. So to me it makes sense to consider very carefully, and get this right if you are lucky enough to be at the beginning of the process, or at a point where a redesign is possible.

The Most Important Consideration: The Log(s)

The transaction log doesn’t seem to get a lot of love in the development process – it’s just that workhorse file that becomes a nuisance when it grows to fill its disk. But here’s the thing: whenever anything goes wrong in SQL Server, from a simple failed row insert to a server crash or DR event, the log is the key thing that saves your bacon. It makes rollback possible, and recovery of the data to a consistent state possible. It makes atomic transactions possible. It makes online full backup possible, and point-in-time recovery. I’d like to repeat that: the log is a vital element in taking a full backup while the database is online. We’ll see why that is so important in a minute.

The most important idea about the log is this: the only way to get a transactionally consistent set of data over time is to have either just one stream of journal (log records) for the data, or to have multiple streams of journal that are robustly tied together, lock step. Because each database in SQL Server has its own, independent log file, the first situation is simple to implement – just use one database – while the second is very, very difficult. Splitting data into more than one database implies multiple log files, which in turn implies an independent log sequence. Having separate transaction log sequences means that, while it’s initially possible to perform a transaction across the two databases consistently and correctly, there’s very little guarantee after that transaction, in the space of backup and restore or disaster recovery, especially out in the real world.

The converse is also true – there’s obviously a penalty for co-mingling unrelated information together in one database with a single stream of log records. Centralizing all activity, unnecessarily, in a single log file carries with it overhead, reduced concurrency and other problems. So how can one make this design decision? It turns out there are some fairly simple criteria one can apply, and, when it comes right down to it, only two scenarios.

Two Possible Mistakes

Fortunately there are just two mistakes that we can make:

Separating data that needs to be transactionally consistent into two databases

Combining data that has no need to be transactionally consistent into one database

Here are some questions to ask about a design to see whether it is helpful or dangerous to split it into multiple databases. First, if you are considering multiple databases, are there any places in the design where the databases must be “lock-step,” time-wise. Example: are there any tables in one database that refer by joins and foreign keys to another database, or to a sequence or identity generated in another database? If so, you may be open to profound risk.

To test this, do the following experiment – even as a thought experiment. Imagine that the system is running full-tilt, the application adding data to both databases. Halt the system. Restore one database from backup to one point in time, let’s say five minutes before the time of the halt. Next, restore the other database to a point in time TEN minutes before the halt. If the design implies that that five minute gap in time will cause problems with your data or application, then you are open to severe risk by storing this data under two separate log sequences, which is implied by using two databases.

Second, if your application uses three-part-name, cross-database queries, what are they for? Do they imply that the data across databases really should be part of the same set, or are they legitimately separate? Is there some give and take, in time sequence, between the data in one database and data in the other(s)? I’ve had ISV’s say things like “failover at the instance level is a requirement.” That would be OK, if it were possible. It’s not, not really. “But we would just restore both databases at/to the same time,” is something I’ve heard as a remedy. That isn’t anywhere near good enough, and here are some technical examples showing why:

Backups are Broken

Full backups are not and cannot be time-consistent across multiple databases. That’s because, in order for full backup to work while a database is online, it incorporates some “slice” of the log file, which is replayed when the database is restored from the backup file, in order to make the data consistent in the database relative to changes that were made during the time the full backup was actually taken. The restore will always be as of a point in time at the end of the time it took to execute the full backup.

It’s impossible to run full backups of two databases and make them end at the same time. Imagine: one day database 1 takes 5 minutes to back up and database 2 takes 25 minutes and 5 seconds to back up. Each day that time varies. Starting those at the same time obviously makes no difference. They would have to be made to end at the same time, every time, which is impossible.

Implication: it’s not ever possible to restore multiple databases from full backups to the same point in time.

One might say, then, that it’s just always a requirement that log backups be used, and any restore manually rolled forward to precisely the same time point. I think that would be fine, for the five percent of organizations who could actually make that happen. Few people actually do that out in the real world. Simple restore from full backup files is basically always an expectation, and while it’s technically possible to use log backup files, there’s real risk in the fact that either people won’t expect to have to do that, or they won’t know how to do that. Why even introduce that risk?

DR is Broken

Crunch: disaster. The app just crashed, the database server is down, and we need to fail over.

At the DR site, we have some log backup files, or streams of database mirroring log records – different ones for each database. For one database, the log sequence goes up to 5 minutes before the crash. For another database, the last log backup was much bigger, and it didn’t copy over the WAN in time, so that one is 15 minutes behind. Or that database was behind in sending mirror log records due to some large transaction.

The DBA, as would be reasonable to expect, applies whatever log records are available and brings the DR site copies of the data online, and, because it’s a crisis and she’s busy, she moves on to the 300 other databases under her area of responsibility. The databases are now 10 minutes apart in time. How’s the application look?

The takeaway here: in order for HA/DR, crash recovery and backup features built in to SQL Server to work simply and reliably, your database design must take into account how the log sequence preserves data integrity and consistency in the data, which is easy in the context of one database, and extremely difficult across multiple databases.

Performance and Common Sense Partitioning

So, what about the other side of this issue? There are, of course, performance and administrative advantages to partitioning data out where it’s safe to do so. One basic principle of scalability is to avoid global resources (think one huge table) in favor of partitioned resources (several smaller tables). Here are some cases where it can be safe, and advantageous, to separate sets of data into distinct databases:

True “reference” information that is loosely coupled to the primary database. I have seen this successfully implemented for “lookup data” that may be aggregated together from other areas of an organization, such as other regions or other organizational functions, that is not coupled to an application database in real time. I have also seen it used effectively for small data marts, data warehouses or reporting databases, where reporting information is copied out of a primary database, asynchronously. If such a database is “behind” the main application database, it doesn’t really matter, and whatever interface exists between the two is elastic enough to accommodate that.

Audit or application error log information. There are applications where audit or application log data is produced in abundance, i.e. crazy huge tables. It may be safe, if business policies permit, to store this data in its own database with a separate log, enabling different and better administrative and retention policies. Simple example: let’s re-index the transactional database, but maybe let’s not tie up the main application’s database log by re-indexing tens of gigs of log or audit data, where fragmentation of that data isn’t really a concern. Secondarily, perhaps the transaction context for real activity in the main database can be different than the transaction context of audit or error logging so, for example, logging can’t block “real” transactions.

Multiple ‘tenants’ using the same application. I have some experience (not a huge amount) with multi-tenant scenarios, and in every case I have seen, it’s been destructive to combine multiple tenants literally into a single database, and much more successful to host the tenants side-by-side, each in their own database. There are some administrative challenges to managing a whole collection of databases, but that can be solved with semi-clever code, tools, and rigor around implementation. The scalability problem of one massive DB is much worse. There are several reasons for this:

The natural partitioning that results from splitting the DBs out (one tenant cares only about her own transactions) seems always to provide better query performance and scalability than co-mingling data from different audiences into the same tables. There is some penalty in procedure cache, because each different database will have distinct query plans, but having the data partitioned, on balance, tends to make the queries simpler and faster. Table and index scans, when required, are automatically restricted to a small set of data that matches the audience, rather than scanning a huge structure and discarding much of the data because it’s not relevant to the client who is performing a query.

Locking and associated concurrency issues can be much simpler, and the damage contained. One tenant locks a table? No problem outside their own world, if they have their own database, and the lock is likely to be much shorter in duration.

Tenants frequently want to take data with them when they come and go, and there’s a major advantage in terms of portability and/or archiving if the database for that tenant is physically independent.

Scale-out to multiple commodity servers clearly is simpler.

Backup and database maintenance across many small databases can be more asynchronous and less impactful than the same for one massive database.

The log sequence(s) for each database can be safely and legitimately independent. This makes them smaller, easier to manage, and helps the HA/DR situation.

Conclusion

If you have the chance to make this choice, if it’s not already too late, choose wisely. Consider that disasters do happen, and be certain to choose a single or multiple database design for the right reasons, and in a way that doesn’t introduce risk for your application.

Great article. I'd say that under your "Backups are Broken", point 4, I don't see that as any more difficult of a process than even a full restore and tlog restore process. You simply add a STOPAT and a time. It might require some documentation or process change, but I do it now with some of our vendor based applications that track changes to file level storage... think Autodesk's Vault which has a database to track changes to CAD files. The actual CAD files are stored on a CIFS share on our SAN. This share is snapshotted on a scheduled basis, and I run the database in full recovery model and 15 minute tlog backups. We have tested DR and simply done a restore of the most current snapshot and then a RESTORE LOG WITH STOPAT at that same time.

I will agree with your overall thesis (if I understand it properly), design should take into account how to properly backup, restore, and plan for DR as simply as possible rather than just what's convenient at the Build phase.

I was wondering if VSS file-level backups of the SQL database files help in this scenario? I suspect it would help up to a point in that it's not the end-time of the file copy but is instead the point in time at which the snapshot finished being made.... The key thing is: is the VSS snapshot done at the level of SQL Server as a whole or is a separate snapshot made for each database? The SQL server log file lists separate entries for each database taken as part of a VSS backup, so I suspect the snapshots are done separately but at least the time difference is more narrow.

The best fix, which for non-DBA system administrators is tricky to understand, is to use STOPAT. But as you said, log shipping, mirroring, etc don't play nice with this either.

We're an ISV and push data out to several databases. We're lucky though in that we've got one big metadata database and then all other databases just store BLOB data that we write once and read many. Thus so long as people restore the metadata database to a point in time earlier than any of the other databases we are fine. THere could be data in the BLOB database that has no metadata pointing to it, which makes some BLOBs an orphan, but the slight waste of a few MB of space isn't anything to worry about considering they've had to rollback to an old database due to some disaster in the first place.

Great post, well-thought-out. There are a couple other reasons I've seen multiple databases used:

Data warehouse staging tables - if you're loading then transforming data in a nightly process that's easy to restart, then it usually makes sense to put this data in a separate database altogether. This avoids changing pages in the main database, which keeps the size of your differential backups down. The key is having an easy-to-restart process - if you lose the database, you just restart the nightly process.

Reporting tables - if you have aggregate tables that are easily regenerated from other tables in the database, then you may want to have a different DR strategy for these. I might just back 'em up once a week to preserve the schema in case the source control system bombs. In the event of disaster, we restore a backup, run a job or a T-SQL script to repopulate the data from the main database, and you're off and running.

One technical note about full backups of multiple databases - they actually can be consistent if you use SAN snapshots. The agent will get SQL Server to freeze IO across all databases on the same LUN and get full backups of all. That only works at snapshot time, though - when it comes to the point-in-time recovery and applying the logs, you still have the same challenges. One SAN vendor recommends snapshots every X minutes to accomplish that goal, but I haven't used that approach myself.

Thanks, Brent - all good points. The SAN snapshot thing works, but I worry about designing an application to that, especially one that would be for sale to a broader audience. It's pretty tough to say to prospective clients, "we decided to use multiple databases, so consistent SAN snapshots for backup are a prerequisite for using our product." :-)