Working with hierarchical data in SQL Server databasesLast
updated: October 29th '05 |Best viewed with: All popular
browsers | Best viewed at:1024x768 |Links to external
sites will open in a new window

Note: Information & code samples from this article are tested on SQL Server 2005 RTM (Yukon) and found to be working. Will update the article in case of any compatibility issues.

In this article, I will show you a simple example that traverses a hierarchy and displays the output in indented format. I will also provide you with links and pointers for more advanced and in-depth information on processing hierarchies.

Representing data in the form of a hierarchy or a tree is a common requirement. The most common example I can think of, is an organizational chart that contains all the employees in a given organization and each employee is linked to his/her manager by his/her manager's ID. Since managers are also employees, their details also get stored in the same employees table.

A few more examples:

- Implementation of a permissions hierarchy.

- Implementation of a product catalog for an online store.

All the above examples need data to be stored in the form of hierarchies or trees (also loosely referred to as parent child relationships). But SQL Server is a relational database, not a hierarchical database. So, we have to store the data in normalized, relational tables, but come up with programming techniques to process this data in a hierarchical manner.

Unfortunately, there is no built-in support for processing hierarchies and tree structures in SQL Server's T-SQL implementation. (Oracle's PL/SQL implementation has a CONNECT BY syntax that makes it easier to work with hierarchical data. Let's hope we will see something similar in the next version of SQL Server, Yukon).

C programmers use recursive programming techniques for traversing tree structures. The same can be implemented in T-SQL using recursive stored procedure calls. I will demonstrate this in the following example.

Consider the employee table of an organization, that stores all the employee records. Each employee is linked to his/her manager by a manger ID. This table can be represented using the following CREATE TABLE statement:

Notice that, EmpID is decalred as a primary key, and the MgrID column is declared as a foreign key constraint, that references the EmpID column of the same table, that is, a self referencing table. This is so, because all employees and managers are stored in the same table.

Since EmpID is declared as a primary key, by default it will be implemented as a unique clustered index. Now, let's create a non-clustered index on MgrID column, to improve the query performance:

Notice that the MgrID of President is NULL, that means, he is the super boss and has nobody managing him. As you can see, rest of the employees are linked to their respective managers using the MgrID column.

Now, let's create a stored procedure, that traverses this employee hierarchy recursively and displays the employees in the form of an indented tree structure. The following stored procedure has an input parameter named Root, that takes the ID of the employee (equivalent to the ID of a node in a tree) and displays all the employees managed by him and his sub-ordinates.

While creating the above stored procedure, you will receive the following warning:

Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'ShowHierarchy'. The stored procedure will still be created.

Nothing to worry about. It's a just a warning, as SQL Server is a little confused about the recursive stored procedures.

As you can see, the above stored procedure calls itself recursively. You should be aware of a limitation imposed by SQL Server though. A stored procedure can nest itself upto a maximum of 32 levels. If you exceed this limit, you will receive the following error:

The above is just a simple example of traversing hierarchies. There's more to hierarchies than just traversing, that is, adding to, deleting from and modifying hierarchies. The following links deal with hierarchies in greater detail and propose different methodologies for managing hierarchies efficiently.

Joe Celko proposed the "Nested set model" for representing trees in SQL. He is one of those people, involved in the ANSI SQL standard. Apart from books, he wrote about this nested set model in magazine columns and newsgroups as well. Here are some links that contain good explanation and code samples:

RAC is a native utility for MS SQL Server, designed exclusively for data manipulation. RAC uses a common framework that can be manipulated to summarize data, perform pivoting operations including crosstabulations, character operations including string replacement, splitting, concatenation and processing hierarcharical data.