Introduction

I've always liked tree structures. I have worked with them in many situations, and I have searched efficient ways to manage them in SQL Server. As the limit of 32 nested levels in the SQL Server (both 2000 and 2005 versions) impedes us to attack this issue recursively, we have to allot an overhead in storing and accessing nested structures in database tables. In this article, I will try to explain how to efficiently store a tree structure in a SQL Server table, access it easily from T-SQL and clients, and transform it in to a structure recognized by OLAP Dimensions in Microsoft Analysis Services.

The question is how to efficiently and easily store and access tree structures, and how to transform them as in the pictures below:

Tree structure storage

The well-known method to store a tree in a relational table is to have a link between a node and its parent. So, the ID and P_ID fields are enough for this approach, where P_ID is NULL for root nodes. The access method for this structure could be: storing the absolute path (starting from the root) for a node, storing the left ID and right ID in a different table if the structure is a binary tree, storing once again the ID and P_ID fields values and a LEVEL value in another table, storing left and right index values for every node in the same table etc. In my opinion, the most efficient way is the last one, because the access will be very easy using just a simple SELECT statement and, for large structures, the storage size is minimum. More information about trees can be obtained from the CodeProject article, General trees persisted in relational databases, and the MSDN technical article "Hierarchical Data and Scope Checking".

Using the last technique, we will have two supplementary columns in the table: NLEFT, storing values for the left index, and NRIGHT for right index. We have to carefully set the appropriate values for these columns, because the "left" value for a node must always be between the "left" value and the "right" value of its ancestors. Having these conditions, a simple self join for the table with a filter for a node will return the entire hierarchy:

I have added the NLEVEL column to store the level of the nodes useful in further needs. The level starts from 1 for the root.

The access will not be so difficult, but how can we set the appropriate values for NLEFT and NRIGHT? I have chosen the trigger implementation, to be sure that every time a node is inserted, the hierarchical structure is stored correctly. We could update NLEFT and NRIGHT values on deletion (even that is not necessary), and if the P_ID field value is changed.

I present only the trigger for insert action, but it can be done in the same manner for all update P_IDs and delete as well. The trigger checks if the new inserted node is a root or a descendant node. If it is a root, it generates a new NLEFT value (it allocates a maximum of 10000 values per hierarchy) and sets the NLEVEL to 1. If it is a child node, set values depending on if the node is the first (there are no other children on the same level) or last (there are other children for corresponding levels and the parent) child added:

After the values are stored, the trigger ensures that the corresponding ancestors have appropriate NLEFT and NRIGHT values. This is done from the root to the leaves, in Depth-First Traversal. First, it identifies the root node, and then it applies the recalculation algorithm:

move down the hierarchy, and find the first child (from left to right) of the topmost parent, and give it a NLEFT value of 2;

continue down the hierarchy, setting the NLEFT value with last NLEFT + 1, and if a leaf node is found, count its NLEFT and NRIGHT values (NRIGHT = NLEFT + 1);

continue this numbering for all next possible child nodes at the current level of the hierarchy;

after all leaf children nodes are numbered, go back to their parent, and set its NRIGHT values with the maximum NRIGHT children value + 1;

continue numbering the siblings for the current node;

continue walking the hierarchy until all nodes are numbered;

The algorithm is implemented in two manners: recursive and iterative.

The recursive method is simple, and it is found in the TREE_RECALC_REC stored procedure. The stored procedure checks if the node currently processed has children. If it has, apply itself for all the children, setting the @NRIGHT output parameter with the corresponding value. If the node has no children (it is a leaf node), simply set the NLEFT and NRIGHT values:

As it is known, the 32 nested levels limit forces us to not store more than 32 levels on a hierarchy.

The iterative manner solves the 32 nested levels limit issue. The workaround is to use a "stack" as it is explained in the Expanding Hierarchies MSDN article. The "stack" is a temporary table which stores all the nodes which belong to the same parent, starting with the root. The nodes are processed in a loop using a level variable, and when there is no node with the current level in the "stack", the level is decreased with 1 (the previous parent level is processed). The level variable takes values form 1 to MAX(hierarchy level). The same algorithm may be applied for a child when you want to retrieve its ancestors, but the level variable will start from the child level to 1. When the last leaf node in a sub-hierarchy (the node with the greatest ID value on its level and parent) is processed, the numbering algorithm should be applied over the ancestors. For this, we will need another "stack" table which allows numbering from child to ancestors. The implementation takes much overhead indeed, but the nested levels are limitless. The stored procedure TREE_RECALC_ITER contains the iterative recalculation algorithm:

The last XML output is not so useful for the clients. They probably need the underlying tree structure which SQL Server XML features can't obtain, and not this "fake" ID-parent ID implementation. So, the next idea will be to re-arrange the "fake" tree structure in a XML nested hierarchy. This will be done using the TREE_GET_XML stored procedure, an XML web service, and a general XSL transform stylesheet. Using id and p_id attributes, the stylesheet will put the nodes in a proper tree order. The stylesheet is incredibly simple. It applies a template starting from the root node (//*[not(@p_id)]), and this template applies itself recursively on all the nodes that has the p_id attribute value equal to its id attribute value (//*[@p_id = $id]):

GetDirectTree – returns the XML output as it is gathered by TREE_GET_XML;

GetStructuredTree – returns the XML output gathered by TREE_GET_XML and transformed in an underlying tree structure.

Tree to OLAP transformation

Suppose that this tree structure will serve as a data source for an OLAP cube dimension. The dimensions and the cube can be created easily using Analysis Services. But what happens when the tree structure has changed, having one level more? Is it enough just to refresh the dimensions and the cube? The answer is no, the cube will not reflect the changes.

Let's imagine that we have a data cube which has a TREE_OLAP dimension (doesn't matter what it contains – countries, regions, and cities; departments; geographical areas etc.) and a TIME dimension. A problem that I consider important in OLAP cubes creation and update is, what happens if the main structure of a dimension is changing and I have to add or delete some levels? How easily can I update the dimension and the cube? And, if possible, do that automatically.

TREE_OLAP dimension has as data source the TREE_OLAP table which contains the tree nodes in a totally different structure. The ID-parent ID tree implementation doesn't help the OLAP dimensions to get the data and to process it. They need a more redundant structure to arrange tree nodes on different levels. They need a column in the data source for every dimension level. The structure they need is the one shown in the second picture at the beginning of this article.

The TREE_OLAP table must be recreated every time the TREE hierarchy is changed. The columns in this table are the tree ID taken from the TREE table and, for every level of the hierarchy, a varchar column named "N", and the level value (e.g.: if the tree structure has three levels, the TREE_OLAP table will have 4 columns: IDint, N1varchar(100), N2varchar(100), and N3varchar(100)).

The TREE_OLAP table is re-created and filled with the appropriate values using the TREE_2_OLAP stored procedure. The procedure checks the maximum level of the hierarchy and re-creates the TREE_OLAP table. To populate this table, it must traverse the tree in a similar manner as it performs the TREE_RECALC_ITER stored procedure (with two "stack" temporary tables). For the currently processed node, the procedure inserts a record in the TREE_OLAP table at the corresponding "N" + LEVEL column:

Creating the Sales cube and dimensions

If we add some records in the TREE table, add the daily records for 2005 and 2006 in the TIME_BY_DAY table, and import data from the sales.txt file (which contains the TREE_ID, TIME_ID, and SALES_VALUE fields) into the SALES table, we can create the Sales cube. The sales.txt file contains records related only for 2005.

The time dimension is general, and implemented as it is in the Analysis Services "FoodMart 2000" sample database. The fnIsLeapYear, fnGetDaysNumber functions and the TIME_BY_DAY_GEN stored procedure generates daily records for a specified year in the TIME_BY_DAY table which will serve as the data source for the TIME dimension. The TREE_OLAP dimension is based on a star schema (only one table per dimension). The fact table called SALES has the following structure: IDint, TREE_IDint (foreign key to the ID column in the TREE table), TIME_IDint (foreign key to the ID column in the TIME_BY_DAY table), SALES_VALUESfloat (the measure column).

The cube data visualization will be:

The RefreshSalesCube application

The changes in the dimension and in the cube database can be done manually, using Analysis Services user interfaces, and programmatically using the DSO (Decision Support Objects – interop assembly) namespace for .NET framework 1.1, and AMO namespace for .NET framework 2.0.

The new cube data visualisation will reflect the changes without a manual update:

Conclusion

Using the implementation on the OLTP database, and a few lines of code to re-create and re-process the cube, the structure transformation and data changing is transparent for the reporting clients which consume the cube data. The process can be automated and scheduled using SQL Server jobs, and the time and work performance can be improved.

Share

About the Author

I live and work in Bucharest, Romania. I am programmer since 1998, when I have developed a "good taste" application for a catering company. Now I develop .NET applications (windows and ASP.NET) for large SQL Server database systems, with tens of millions of records.
I like to develop also in other languages like Object Pascal (Delphi), PHP, C++, VB, scripting. I enjoy the XML power, both on client side and server side.

Dear
thanks for nice article.
I need to extend the functionality and want to separate count of left and right child of given parent ID. is it possible to do it in this structure efficiently.
one more question, is this structure suitable to use for binary tree with 50k nodes without any performance penalty.

We're trying to put your idea on our tree structure, but we've the problem that its so slowly when first time assigning the NLEFT & NRIGHT values. Our tree has 30000 leafs in 7 levels and we're inserting on groups of 50 leafs the values from our old tree structure. The process has been running by 20 hours and only has inserted 4000 rows, any idea to make this process quickly?

Thank you for reading the article. Indeed, the performance is very poor when importing large structures. The design is good for reading the data, but it lacks of flexibility when data is bulk updated. Unfortunately, there is no easy way to achieve what you want. The NLEFT and NRIGHT re-calculation is done in the trigger and when you insert a node, multiple records can be updated, hence the poor performance. Having only 7 levels, you probably have chosen the recursive model. I am not sure if you already have indexes on the tree structure table. If not, this might improve the performance. If you use the iterative model, you can change the stored procedure TREE_CALC_ITER to access the intermediate user tables (T, TPC, TCP), instead of creating and dropping the temporary tables #T, #TPC, #TCP. Of course, create indexes on these tables to get better performance.

hello sir,
i am facing a stupid problem while making perspectives in C# for analysis server...the point is how to make perspectives and then to add it to a specific cube using C#.i really need your help.
thanks.

Thank you for reading the article. Regarding connection problem, I guess you can't connect to AS 2005 using DSO interop assembly. Instead, you can use Microsoft.AnalysisServices assembly, provided with SQL Server 2005 installation (on .NET framework 2.0). The RefreshSalesCube application must be updated using .NET framework 2.0 and Microsoft.AnalysisServices.

Good article
The nested sets tree algorithm is nice and I use it fairly often when I need speed in retrieving hierarchies in large treesets.
I can recommend the following newsgroup post that have given me a lot of help during the years.

:-> your examples have good details on tree db -> xml
how about tree xml ?

thank you so much and a great article
it had saved me a lot of time!

xaverine

ps - can i ask you a newbie question?
since there is 32 nested levels limit issue could I calculate the root node level to its deepest leaf node first in order to determine if > 32 levels (or by nleft and nright .. which i still dont understand)
and therefore choosing TREE_RECALC_ITER method as oppose to the more efficient TREE_RECALC_REC method

Appreciate your time for reading my article. It's a good question about sending XML data into tree db. Usually, the nodes in the tree can be stored one by one, but what is hapenning when you still want to insert a branch or a whole tree in the database?
Actually, this task is not so simple, but neither so difficult. If you have an underlying XML tree structure, you have to transform it into an intermediate XML structure which can be parsed by OPENXML rowset provider in SQL Server 2000/2005. You can use a XSL template to create a row by row XML tree:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

In SQL Server 2005, you can store directly XML structure using XML data type. And it's easy to work with, because it use new functions like query, exist, value, modify. As well, you can work with XML provided by FOR XML clause. There are many articles on internet, I recommend this one: http://msdn.microsoft.com/msdnmag/issues/04/02/XMLinYukon/[^]

Regarding next question, you can choose the recursive manner, only if the maximum level is lower then 32. It's simple to check the maximum level:
DECLARE @ROOT_ID INT
SET @ROOT_ID = 1

I think you found a great way to maintain flattened parent/child hierarchies specificaly when relational reporting is needed over such tables.

But my question is : why not using the built in Analysis Services dimension type "ParentChild dimension" ?

This dimension type cope with recursive relational tables like the one you have in the first place and avoid you all the further programming/drop/recreate dimension job ... which is far more simpler no ?
This dimension type resize/redistribute the hierarchy as levels/members appears/disappears...

In fact this built in dimension type has the great advantage to allow you to attach facts at a non leaf member of the hierarchy, thing you cannot do with a flattened tree structure in an Analysis Services OLAP cube. (think of an employee hierarchy each one selling thing from vendors to chief-of-the-chief-etc vendor)
Regards

(3) StockTransactionDetails
CREATE TABLE [dbo].[StockTransactionDetails]
(TransactionID int not null
, ItemID varchar(10)
, Quantity money not null
, Price moeny not null)
I want the result in a hierarchical format with parent child/subparent relationships which calculates the related fields as per their position in the hierarchy.
Here is a simplified form of the view.

I guess your problem will be solved using the same calculation algorithm applied on NLEFT and NRIGHT fields. So, add two supplementary fields NLEFT - int, NRIGHT int for Stock table, and create a trigger similar to TREE_SET_LEFT_RIGHT for this table.
This ensure that you can access a desired hierarchy, based on some root.
After that, a simple select with a few join clauses and a sub-query for the hierarchy, will get your results:
DECLARE @ROOT_ID INT
SET @ROOT_ID = 1

Yes, I added NLEFT and NRIGHT int fields and created the triggers as per your advice. But the result is not in the same way I wanted. I wanted the result to display all nodes in a hierarchical tree and Quantity to be summed up for each node as per its position in the hierarchy.
Thanks and regards.

I have understood what is your problem. Unfortunately, it is not possible to calculate hierarchical sums using a simple select statement. I would suggest to you to create a reporting table which stores the calculated fields you need. These fields should be recalculated using a stored procedure in a similar way as TREE_RECALC_ITER stored procedure does. You will need two "stack" temporary tables, one for navigate in the tree, the other one to go back to the ancestors for the node processed currently, because a change in the calculated field on some branch affects the calculated fields of the ancestors.
Of course, if you choose to create a OLAP cube having as dimension appropriate Items table and as fact table the StockTransactionDetails table using measures quantity and other calculated fields, your problem will be solved automatically.