February 9, 2012

It seems like Microsoft has always provided example databases in order to illustrate database concepts.

Pubs: Back when SQL Server 2000 was new, Microsoft provided this sample database and maybe you remember it. Pubs was a really simple database that was designed to represent data that a fictional book publishing company might use. The database had a name (pubs) but I don’t think anyone ever gave the company a name. From here we’re told that it was “used to demonstrate many of the options available” for SQL Server.

AdventureWorks: The Adventureworks database came along with SQL Server 2005 and we were introduced to the bicycle manufacturer Adventure Works Cycle. The database was pretty comprehensive and it touched on a lot more features of SQL Server. I’ve seen countless performance demos that query Sales.SalesOrderDetail. There’s even an Adventureworks Data Warehouse, an Adventureworks OLAP cube and even a light version: AdventureworksLT.

Contoso And lately there’s a new sample database Contoso. It’s another fictional retail company (Contoso Ltd. This time they’re selling electronics and appliances). This database is used for Business Intelligence demos. With Contoso, Microsoft has developed and made a data warehouse and an SSAS cube available.

Presumably, each of these databases is an example of Microsoft putting their best foot forward. Or at least it’s a database which enables them to show off what their database software can do.

Beyond Demos

So those databases are good for demos.

But if we’re looking for other Microsoft-designed databases, we don’t have to stop there. There’s other ones that we can look at. I’m talking about the databases that Microsoft ships along with their products. When looking at these, remember:

There are some good examples, …
We can look at those databases and see how they’re constructed. If you see some interesting db designs, you can use that.

… and there are some not so good examples …
These databases weren’t built as database role models. So there’s bound to be a lot of database design choices that are not ideal. I know that Microsoft has huge pressure to maintain backwards compatibility.

… and that’s okay.
It’s actually a fun exercise to pick out the good designs from the bad. It’s like a smorgasbord, you take the lessons you like and leave the rest.

Some examples of what I’m talking about.

ReportServer

Install and configure an instance of SQL Server Reporting Services. Take a look at the ReportServer. It’s a pretty nice database schema. One thing that was interesting was the choice of identity column data types:

Catalog seems to be one of the main tables here and they’re using uniqueidentifiers (GUIDs) for identity columns. Normally people scoff at that, but I think it makes sense in this case. Catalog is the table that tracks reports, datasources and folder structures. Even at it’s largest, it can’t be that big (I just checked my own installation and the clustered index fits on 1 extent). That means that all the worries about GUIDs for primary keys are mitigated and the choice of using a globally unique identifier might make sense here.

ExecutionLogStorage Contrast that with ReportServer’s largest table, ExecutionLogStorage. Depending on how busy your Reporting Services installation is, this table can grow very large with one row for every report view. When I look at the choice for their primary key, I see that it’s clustered index on a single BIGINT column defined as it’s IDENTITY column. Well done.

Any Database’s Metadata

I like the object oriented influence here:

The relationship between sys.objects –> sys.tables and between sys.objects –> sys.procedures is the subtype relationship class –> subclass. And that shows in the schema, for example, there’s one sys.object row for every sys.table row.

The relationship between sys.foreign_keys –> and sys.foreign_key_columns is the object composition relationship. And that also shows in the schema too. The “primary key” for the table foreign_key_columns is (constraint_object_id, constraint_column_id) and it contains the primary key of its owner.

Another good example is the sys.index_columns table. Look at the schema diagram below. It’s a perfect instance of my favourite DB Design example

Any Sharepoint database

[Sigh] These are mostly counter-examples here, I took a brief glance at the schema once and decided quickly that there are no good lessons. Can anyone tell me if that’s changed?

There are actually lots of good reasons that SharePoint DB’s look the way they do. I took a set of database designers – walked them through an exercise of designing a database that does what SharePoint does (without telling them that) and had them design a database. It looked remarkably like – SharePoint! We can do that exercise sometime over a nice coffee if you like. 🙂

I’m sure there are some really good reasons for those design decisions. (And now I’m suddenly curious about them).

Now that I think of it, that exercise sounds familiar. I was working on a database schema and didn’t like it, so I decided to do a quick proof of concept for a replacement design (What would the design look like if I could start from scratch). After working through some challenges the database design had to support, I ended up with a schema that was very very close to the existing schema. Just the names were different.