The Database Shuffle

As I mentioned in my previous post, I recently moved from my old home in the SQL Server engine to the StreamInsight team. As I continue to ramp up here, I want to take a few posts to talk about what I was working on back in the engine: the contained database.

To illustrate the problem contained databases are intended to solve, let’s consider a simple scenario: I take a database D that lives on server S, detach it, and attach it to server T:

This is one example of database movement, but there are many movement scenarios—recovery, upgrade, load balancing, consolidation, moves to production—and many movement techniques—backup/restore, detach/attach, mirroring. But how well do these techniques satisfy the scenarios?

In a word: poorly. As an illustrative example, let’s take a look at what happens with authentication.

SQL Server security is a little complex, but for our purposes it’s enough to consider two core artifacts: logins and users. Logins are instance-level objects (stored in master) and users are database-level objects (stored in the user database). Each of these are responsible for authorization in their respective domains, i.e., used to grant permissions at the instance and at the database respectively. But only logins are used for authentication. So in order to even log on to the server you have to have a login.

Users and logins are associated with each other through a matching identifier called a SID, and in order for a person to connect to and use a database, he must have a user in the target database and matching login on the instance. And here’s where our problem comes from: while users are stored in and move with the database, logins are not and do not. They’re left behind:

The result of this is that someone who could connect to D when it lived on S may no longer be able to do so when it moves to T because their login is missing. This is reparable, of course: you just need to manually add the required logins to the new instance. It’s also not easily automatable, since the target instance may already have a different login with the same name, which would cause a collision.

Were this only a problem for logins, we’d have one nit to work around. But of course it isn’t. The same applies to any other server-scoped object on which the database depends: agent jobs, error messages, linked servers, etc.

I’m going to blow my punch line here: we solved this the authentication problem with contained databases. But before moving on to the details, I’m going to touch on some more subtle problems. In particular, collations. Stay tuned.