Best Practices for Database Schema Design

Iron Speed Designer relies on the table relationships in order to determine how best to provide the quickest and most efficient experience.

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

When designing your application’s database, it is very important to get the schema correct and follow an agreed upon naming convention, regardless of the size of the application. This helps everyone instinctively learn the schema and purpose of every table and field, even if they are not familiar with all aspects of the database. Getting the schema design correct the first time will also reduce (or eliminate) any changes required to the data access code later in the project’s lifespan.

Code Generation

Getting the database schema design correct will save you a lot of time if you are using code generators, such as Iron Speed Designer, to create the underlying data access and business layer code in addition to the front-end graphical user interface, whether your application is web-based, SharePoint-based, or even a Windows form application.

Iron Speed Designer relies on the table relationships in order to determine how best to provide the quickest and most efficient experience. What I mean by that is that when creating pages with bound data you tend to find on the whole that you want to see data that is related to some other data on the page. For example, when creating a SalesOrder page you will probably want to see related data, the obvious one being SalesOrderDetails. Not so obvious would be the customer this order is for and the status of the order. By linking these in the database, Iron Speed Designer will know how best to create the page layout for you using the wizard, it will use dropdown lists of information that saves you a lot of work. If you see what you consider a foreign key item (a link to another table) that is rendered as a text box rather than a dropdown it’s a fair bet you do not have a relationship set up in the database. Go back and create the relationship before trying to move on with Iron Speed Designer is our advice. You can of course create a virtual foreign key relationship in your code generator, a relationship that exists only in the application; no changes are made to your database schema.

Design Practices

Some useful design practices we have come across and use at Dot Net Architect are:

Use well defined names for your tables and fields, i.e. SalesOrder, not order.

Use the singular, i.e. SalesOrder not SalesOrders.

Use leading capitals for name, i.e. SalesOrder not salesorder.

Avoid spaces and other punctuation, i.e. SalesOrder not [sales order] or [tblSales order]. This avoids having to put [] around the name all the time.

Use a single field primary key (PK). Even though you may have other fields that could compound to give you the PK, we find it better not to.

Use the name of the table appended with ID as the name of the PK field where possible, i.e. SalesOrderID.

Use bit fields to store Boolean yes/no values and start the name with Is, i.e. IsEnabled.

Add audit fields so you know who created and when and who edited and when.

Add a sort order field so you can order the items, great for when alphanumeric is not good enough.

Use the database diagrams to visualise your schema. This makes creating relationships a breeze.

Use indexes to speed up queries, with use of the Profile Analyser to make suggestions.

Use unique constraints/indexes. Avoids having duplicate data in the database where it shouldn't be.

Use GUID's (unique identifiers) for key fields. There are many reasons for this, which can be debated at length. We find they work, are globally unique, more secure than integers and avoid errors when all other PKs start at 1 and increment by 1.

Add a DisplayText field that is a computed column that you can define as required. We point them to other fields in the table and always use this field as the display field for dropdowns etc. If we need to change it later, we just change the formula in the computed field in SQL Server.

It is not always possible to delete records, instead use an IsDeleted bit field to mark the record as deleted. Ensure all data is filtered by this.

We also add a couple of extra fields where needed, IsHidden and IsEnabled, that allow the record to be filtered by the admin and the user, i.e. as an admin, I don't want a user to login any more I set the IsEnabled to false, if later want them to login, I simply flip the flag. The IsHidden can be used to filter items that appear in dropdown lists, for example, without deleting it.

Add an autoincrement field, but not for use as the PK. This can be useful when creating invoices, for example, and you need a visual number for the user.

Try to offload any image fields (for documents or images you want to store in the db) in a separate database or table and just have a pointer to them. That way you can reduce server load as you don't always want the image data coming back when selecting data. You can also backup or image the database separately. (We have an existing client that has over 40GB of data in a database, 35GB of that is PDF and word documents. If that was in a separate database, downloading the main database would take only 10 minutes not 10 hours!)

Use triggers only when needed and insure you really understand them.

Use views, we use them a lot with when we just want to show read only data with multiple joins.

Use functions for repeated code, i.e. we have a suite of functions that return 1st day of month, 1st day of quarter, 1st day of year, etc.

Comments and Discussions

Use the name of the table appended with ID as the name of the PK field where possible, i.e. SalesOrderID.

This should be presented as personal preference. It is often difficult to know what is the PK of a table, so, for me, having an Id column on each table makes things easier to understand.

Use GUIDs (unique identifiers) for key fields. There are many reasons forthis, which can be debated at length. We find they work, are globally unique, more secure than integers and avoid errors when all other PKs start at 1 and increment by 1.

DON'T! Using GUIDs made a hell out of our database when we hit a few millon rows. Guids are random and insertion provides infinite fragmentation. If you have no choice, make this a surrogate key (non clustered index) and defrag frequently. Simple integer identity columns makes the trick 99% of the time and avoids the need for an "order" column.

Use nvarchar fields to ensure all languages are catered for.

Storage is cheap, isn't it? Unicode takes double space. For big tables, avoid nvarchar unless you know that you are sure that sometime you will need to support asian, islamic, or other non-latin languages. English and latin dont require unicode. Only use unicode columns for end-user typed fields like 'LastName' or 'Comments'. I have seen people use it for IPAddresses or other text fields where simple text would be enough. Keep your data small, small is fast.

Yes, but avoud Varchar(max) also. Max Sized types are stored at a separate data structure and avoid the 8k page size limit of sql server, but they also disables the capability of online defragmentation (i.e. with no downtime).

If you plan to grow, always try to use the smaller data type possible. Varchar(255) will be more than enough for an email address, a full name, etc...

After seeing and making plenty of mistakes over the decades, I agree with what the author Dean says. Very good write up.

1. ID is a standard that works extremely well. Just "ID" gets confusing when joining multiple tables and with inherited tables. For such an important column, spell it out.

2. GUIDs are a MUST for lookup tables. For transactional data (i.e. your SalesOrder table), integers are fine, but I still shy away from Int. But lookup tables (i.e. Language/Country/State tables)...use GUIDs every time!
* GUIDs allow programmers/report writers/etc hard-code TableID values for static data that should never change.
* Inserting an empty record just to know your ID is ridiculous. I see it done way too many times.
* Whereas GUID values can be created in code or t-sql at runtime...and you know the record ID immediately without inserting or calling your database at all.
* Try using NewSequentialID() for transactional data. Can index well, no fragmentation, etc.

3. Use nvarchar(), for sure. Yes, storage is cheap....Way cheaper than the person hours & downtime required for reworking non-Unicode columns in a 100 million record table.
* Size causing performance problems...That is what table partitioning, file groups, multiple hard drives are for.

4. nvarchar(Value) vs. nvarchar(max). Have to agree there. I go smaller, as no screen, no report, no address label, no printed ID card can ever fit (max). I only use varbinary(max) for images & blobs.

1. I still prefer ID as the name of main column. Its shorter for all cases, even when you have to join, and its simpler and still very clear to type Table1.Id=Table2.Table1Id, instead of Table1.Table1Id=Table2.Table1Id. For multiple, nested joins, you are probably still using table aliases.
For inheritance, my choice is using 'Id' in the parent table and 'ParentId' in the child, so the joins are Parent.Id=Child.ParentId.
Sorry, but I still don't see where is the Table1Id naming better than Table1.Id.

2. I agree with you here for lookup tables, but don´t for for every key field as the author suggested.