Hierarchies are sometimes difficult to store in SQL tables...things like trees, threaded forums, org charts and the like...and it's usually even harder to retrieve the hierarchy once you do store it. Here's a method that's easy to understand and maintain, and gives you the full hierarchy (or any piece of it) very quickly and easily.

it seems everytime i post something on the more fun topics rob always comes out with an article a few days later ;-) heh seems like it at least for the last 3 articles :-p he always gives a superior technique that makes me very excited and want to .... never mind ... good artcile because i use hierarchies all over the place ...

Thanks Rob. I too found Joe's stuff a little daunting...I think I like this approach and can't wait for your next installment. Will you post a link here when you're done so we can subscribe "in advance"?

Ta

--I hope that when I die someone will say of me "That guy sure owed me a lot of money"

While this is a usable technique, it still falls down when you need to represent a graph instead of a tree.

The classical example is a bill of materials where you have 4 different parts that use the same bolt (or share a common component. This is something that I've had to do a short while ago.), but you can still run into it with an org chart.

For instance, a little over month ago I was put "on loan" to the IS department. I had one person who was nominally in charge of the programming work I was doing (but was too busy to really look at it, so I had free reign to design and implement), the IS manager, and my normal manager who I still reported to.

Now that I'm back doing my "secretary" work, I still have one immediate supervisor and nominally report to the second shift superintendant also. (at the same time, when my immediate supervisor is gone, I'm in charge of the department, even though I have no authority. lol.)

In those cases, you're still back to an iterative process, and storing the relationships in another table to maintain normalization. (I think that is also an adjacency model, but I could be wrong on the exact terminology.)

Don't get me wrong, there are some nice aspects of this approach, but it's not something that I've been able to use in any of my projects. (I haven't dug into the nested set model for the same reason.)

A-ha! Actually, you can create any number of independent trees with this structure, that's why I spun off the data into a Tree table. You could simply create a "Boss2" column, then run the lineage UPDATE for Boss2 instead of Boss/Boss1. Or, add a new node with the same EmployeeID, with the appropriate parent node and lineage. That's why splitting the employee info from the node info is a good idea; it lets you do things like this.

I'll elaborate on this a little more in Part 2, but oh yeah, you can definitely have two bosses with this setup, you just have to do a little more work with it.

I noticed the independant tree feature, but things still break down for graphs.

I'm going to move back to the bill of materials example.

Part Foo is made up of a Bar and a Thinga. A Thinga is made up of a processed Majob.

You have a part FooX, that's made up of the brand new Barzac, but also still uses the same Thinga.

What happens when you re-engineer the Thinga, and make it out of a processed SomethingMaRether instead? You've got it in several trees so you have to alter each tree.

Granted, that is still relatively easy to do, but my normalization instincts are popping in to tell me it's bad. (Unfortunately, I'm still too new at SQL to know that some denormalization is perfectly acceptable ;)

I agree that the lineage method is probably not as ideal for nested sets/assemblies/BOM structures as Joe's nested set model is. I personally haven't made the mental leap that treats trees and nested sets/BOM as the same thing. The diagrams can be made to look the same, but in reality they do not function the same way.

The bill of materials structure models something a little different from an org chart. Each part is part of a larger assembly, but that part cannot just be wantonly moved or removed from the assembly. You can't take a bolt in an automobile engine and promote it to be the parent assembly of the entire engine.

On the other hand, a person in a corporate organization can be transferred, promoted or removed anywhere in the organization, and may or may not inherit subtrees of that organization. This was what I was concerned with when putting this method together. Since the hierarchy is truly arbitrary, it was intended to support any kind of node manipulation without regard to its current relationship to the hierarchy.

Regarding the example you gave, a "Thinga" is actually a subtree, not a single node. It is a bit harder to manipulate subtrees using lineage vs. nested sets. If subtrees are more important than single nodes, I would recommend nested sets. If nodes are more important, lineage will probably work better for you.

Ahh, but the nodes and the sub-trees are equally important in a bill of materials (As an example, we track the coil steel that we buy, as well as the parts that we stamp out of the steel that later get welded/bolted/sold), and there is promotion and demotion. For instance, we could buy a component from a vendor (a node) and then decide that we can make that component ourselves, and so it would then be a subtree. (or vice versa.) Or, it's possible to re-engineer the production line so that instead of assembling components at another line, they are added to the finished part directly (and thus cutting out middle management? Or at least the intervening compl's)

I think the BOM is conceptually identical to an organizational chart. If I just change the nouns and verbs, it would descibe the same thing.

That said, I did just notice a feature of this approach.

With the multiple independant trees, you don't _have_ to have the same children in each tree. (an example, a supervisor has two different managers. Some of the people he supervises should fall under one manager, while the rest would be under the other.) This has a distinct advantage (IMHO) over an adjacency model (Where you would have to store a "Department" code in the relationship table, and I've thought of a few other pieces of ugliness that this introduces.)

So, my main question would be how to maintain the independant trees, when there _are_ interelations? (In the above example, one of the supervisor's employees falls underneath both manager's departments, or there is another manager who is over everyone.)

>> However (you knew this was coming!) one of the issues I have with nested sets is the complexity required to do relatively simple tasks, like adding, deleting, or moving nodes in the tree. Even finding an employee's immediate supervisor or subordinates requires 3 self-joins AND a subquery! <<

I havea book on nothing but trees in SQL in the works right now. over the years, people have found a bunch of neat tricks with the nested set model.

To insert a new node, G1, under part G. We can insert one node at a time like this:

The idea is to spread the lft and rgt numbers after the youngest child of the parent, G in this case, over by two to make room for the new addition, G1. This procedure will add the new node to the rightmost child position, which helps to preserve the idea of an age order among the siblings.

To convert a nested sets model into an adjacency list model, which is the same as finding the immediate subodinates:

SELECT B.emp AS boss, P.emp FROM OrgChart AS P LEFT OUTER JOIN OrgChart AS B ON B.lft = (SELECT MAX(lft) FROM OrgChart AS S WHERE P.lft > S.lft AND P.lft < S.rgt);

>>does anyone have an example of a good method for converting an adjacency model hierarchy table into a nested-set model table? ... I have an adjacency-model type table with several hundred thousand records and 12 levels of depth . . . a set-based conversion method would be best. <<

I never found a pure set based method. To convert an adjacency list model into a nested set model, use a push down stack algorithm. Assume that we have these tables:

I am trying to work through the allpairs and leftmostpairs views and the TreeMerge procedure described here. Is this the best method for my situation or should I work towards adapting the stack algorithm for multiple roots?

quote:The woods are lovely, dark and deep.But I have promises to keep,And miles to go before I sleep,And miles to go before I sleep.

I thought I might include my SQL Server translation of the adjacency to nested set conversion code as adapted from the Celko article. This seems to deal with a multi-root adjacency model pretty well....

The boss column is the tree 'owner'; that is to say the emp at the root of that particular tree. The lft/rgt values refer to the node position in the boss's tree only.Now, I was not able to make the Page47 tree use the 13/16 lft/rgt values, but boss column seems to square that away.

I would love input from any intereste party.

(Rob, I like your model too and I don't mean to hi-jack this thread . . . hope you understand :) )

It's funny... I used your 'lineage' tip in my little web project: http://yesi.dread.ws/ . This is a MySQL webapp, but in an SQL approach I took the method you explain.The only thing I don't understand is why you keep a 'depth' field in your SQL Tree table: your 'lineage' field contains the way (in terms of node ids) to the root node, so the 'lineage' field already contains the 'depth' information.Anyway nice article !

Thanks! It's a convenience really, you don't need it, but it takes up almost no space (you can make it a tinyint - 1 byte for up to 255 levels!) In case you need to know the depth, or count back or forward in levels, it's handy. You'd have to do something like this to calculate it from the lineage column:

SELECT Len(Lineage)-Len(Replace(Lineage,'/','')) AS Depth FROM Tree

Depending on your SQL product, the Len() and Replace() functions may not exist.