Search This Blog

Tree in SQL database: The Nested Set Model

Tree structure are very commonly used by all the application who need to manage a large quantity of data. However once have in memory the tree data the application need a way to save them for future use. SQL database can be a good solution but, in this case, we need a way to store data in efficient mode. This post will propose a model called Nested Set.

IMPORTANT NOTE: Please note that I'm not the original author of this article. I found it some years ago in the MySQL web site but after some time the article disappeared and now it seem is very hard to find around. Since I saved it I want to repropose here mainly because I think is a very well written article explaining the model and proposing examples SQL query to use. The only example missing from this article is the query for move a branch with all the child nodes. I found a working query in a newsgroup and I added them at the end of this post. Again, since I think is very important, I repeat the content of this post are not mine. If you are the original author and you want I remove this post or want I add some copyright link or similar just ask.

The Nested Set Model

What I would like to focus on in this article is a different approach, commonly referred to as the Nested Set Model. In the Nested Set Model, we can look at our hierarchy in a new way, not as nodes and lines, but as nested containers. Try picturing our electronics categories this way:

Notice how our hierarchy is still maintained, as parent categories envelop their children.We represent this form of hierarchy in a table through the use of left and right values to represent the nesting of our nodes:

So how do we determine left and right values? We start numbering at the leftmost side of the outer node and continue to the right:

This design can be applied to a typical tree as well:

When working with a tree, we work from left to right, one layer at a time, descending to each node's children before assigning a right-hand number and moving on to the right. This approach is called the modified preorder tree traversal algorithm.

Retrieving a Full Tree

We can retrieve the full tree through the use of a self-join that links parents with nodes on the basis that a node's lft value will always appear between its parent's lft and rgt values:

Unlike our previous examples with the adjacency list model, this query will work regardless of the depth of the tree. We do not concern ourselves with the rgt value of the node in our BETWEEN clause because the rgt value will always fall within the same parent as the lft values.

Finding all the Leaf Nodes

Finding all leaf nodes in the nested set model even simpler than the LEFT JOIN method used in the adjacency list model. If you look at the nested_category table, you may notice that the lft and rgt values for leaf nodes are consecutive numbers. To find the leaf nodes, we look for nodes where rgt = lft + 1:

We have already looked at how to show the entire tree, but what if we want to also show the depth of each node in the tree, to better identify how each node fits in the hierarchy? This can be done by adding a COUNT function and a GROUP BY clause to our existing query for showing the entire tree:

Of course, in a client-side application you will be more likely to use the depth value directly to display your hierarchy. Web developers could loop through the tree, adding <li></li> and <ul></ul> tags as the depth number increases and decreases.

Depth of a Sub-Tree

When we need depth information for a sub-tree, we cannot limit either the node or parent tables in our self-join because it will corrupt our results. Instead, we add a third self-join, along with a sub-query to determine the depth that will be the new starting point for our sub-tree:

This function can be used with any node name, including the root node. The depth values are always relative to the named node.

Find the Immediate Subordinates of a Node

Imagine you are showing a category of electronics products on a retailer web site. When a user clicks on a category, you would want to show the products of that category, as well as list its immediate sub-categories, but not the entire tree of categories beneath it. For this, we need to show the node and its immediate sub-nodes, but no further down the tree. For example, when showing the PORTABLE ELECTRONICS category, we will want to show MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH.

This can be easily accomplished by adding a HAVING clause to our previous query:

This is our typical whole tree query with a COUNT and GROUP BY added, along with a reference to the product table and a join between the node and product table in the WHERE clause. As you can see, there is a count for each category and the count of subcategories is reflected in the parent categories.

Adding New Nodes

Now that we have learned how to query our tree, we should take a look at how to update our tree by adding a new node. Let's look at our nested set diagram again:

If we wanted to add a new node between the TELEVISIONS and PORTABLE ELECTRONICS nodes, the new node would have lft and rgt values of 10 and 11, and all nodes to its right would have their lft and rgt values increased by two. We would then add the new node with the appropriate lft and rgt values. While this can be done with a stored procedure in MySQL 5, I will assume for the moment that most readers are using 4.1, as it is the latest stable version, and I will isolate my queries with a LOCK TABLES statement instead:

In this example we expand everything to the right of the left-hand number of our proud new parent node, then place the node to the right of the left-hand value. As you can see, our new node is now properly nested:

The last basic task involved in working with nested sets is the removal of nodes. The course of action you take when deleting a node depends on the node's position in the hierarchy; deleting leaf nodes is easier than deleting nodes with children because we have to handle the orphaned nodes.

When deleting a leaf node, the process if just the opposite of adding a new node, we delete the node and its width from every node to its right:

The other scenario we have to deal with is the deletion of a parent node but not the children. In some cases you may wish to just change the name to a placeholder until a replacement is presented, such as when a supervisor is fired. In other cases, the child nodes should all be moved up to the level of the deleted parent:

In this case we subtract two from all elements to the right of the node (since without children it would have a width of two), and one from the nodes that are its children (to close the gap created by the loss of the parent's left value). Once again, we can confirm our elements have been promoted:

Other scenarios when deleting nodes would include promoting one of the children to the parent position and moving the child nodes under a sibling of the parent node, but for the sake of space these scenarios will not be covered in this article.

Comments

I explicitly written in the IMPORTANT NOTE on the top of this article that I'm not the original author of this article. I found it some years ago in the MySQL web site but after some time the article disappeared and now it seem is very hard to find around. I didn't know the link you posted. If the blog's administrator is the original author of this article and want I remove it he need to simply ask, as I wrote in the same note.

Move a branch with childsstore procedure in mysql not working,Please explain it, how to use it.And please explain these two var (@my_root CHAR(2), @new_parent CHAR(2)) with a example.What is meaning of rgt = rgt + CASE line.Any help please...

This is not standard SQL code but, I guess, is some specific DB format incuding some additional programming commands. my_root is the root node of the subtree you want to move and new_parent is the, as the name suggest, the parent node you want to "attach" your subtree. "rgt = rgt + CASE" is a command for use the value "returned" by the two following WHEN based to the current value processed. You can not use this code as is, this is only a snippet showing the alghoritm to convert in your programming language code.

Post a Comment

Popular posts from this blog

GPIO mean "General Purpose Input/Output" and is a special pin present in some chip that can be set as input or output and used to move a signal high or low (in output mode) or to get the signal current status (in input mode). Usually these pin are directly managed by kernel modules but there are an easy way to manage these pins also from user space.

If you want to launch an Android app from shell command line terminal there is a simply command allow to do that but you have to know some info about the app you want to execute. The command is named am and is basically a command line interface to the system ActivityManager.

Once finished to develop your application using the great Qt libraries the next step will be release phase. Here you need to provide in addition of the main executable all the dll libraries dependencies required by your application to run. Basically, since the application was developed using Qt, the dependencies will be only Qt dll and some few dll connected to the compiler used.