Friday, August 19, 2005

It's been a while since I've been to my employers HQ. Usually, I'm stationed at my client's office, but today, me and a co-worker of mine, demo-ed a portal solution to my client.

Anyway, the presenstation went fine, and I spent the rest of the afternoon telling my colleages about a nifty little trick I picked up at the MySQL Forum. It's a not really a trick, it's actually a very clever datamodel to store hierarchical data in a relational database.

Usually, this is done using the adjency list model, but because this model relies on a recursive relationship, it's a hell of a job to extract all the data relating to an entire brach of the hierarchy. The trick I picked up is the nested set model. You can read all about that over here:

So, what we're seeing here is a typical adjency list implementation, with a recursive relationship defined on the parent_id column. This column optionally references the id of the table. Now there's one extra feature in this table, the position column. The purpose of the position column is to define the order in which sibling rows should be sorted.

Now, part of his application wants to select an entire tree to depict the complete work breakdown structure. Oracle has some built-in features that relieves some of the burdens you encounter when you're working with the 'adjacency list model'. In particular, Oracle supports the CONNECT BY language construct which tells the database to perform a recursive query:

It looks a bit like a JOIN, but what's really odd is that there's only one instance of the product table. So, connect by literally joins a table to itself - not to another instance of itself (SELF or AUTO JOIN). Related constructs are the PRIOR operator and the START WITH clause.

The construct is best explained by assuming that execution starts with the row from the product table for wich id = 1 is true. Then, all other rows are scanned, looking for those rows where the parent_id matches the id from the original row we started out with. Once we're done with that, the level pseudocolumn is incremented by 1, and the cycle is started anew, this time staring out with the id's from the rows we gained this round. A process of cycles is executed untill there are no more rows to match. Query execution is then complete, and the result is returned.

The algoritm is like a breadth first recursive algorithm. That is, each cycle retrieves only the immediate child rows, and only when these are all retrieved, the level pseudocolumn is incremented, and the next cycle is started with the rows we just retrieved in the round. (Note that the rows are not returned in the order of depth. It all remains a relational system, and any order or sorting must be applied explicitly.)

This recursiveness is what really distinguishes CONNECT BY from an ordinary join. It's very powerful in the sense that one need not know how deep the tree is: CONNECT BY will execute and execute until there is no new level to descend to. Unfortunately, the query will throw an exception without retunring any row when there's a circulare reference in the data. As of Oracle

Now, there's a slight problem with this result. It sure does yield the right rows, but what we really want is to have Oracle present the results like, well, like a tree.The result should be something like this:

Well I was really surprised to hear that so far, no one had come up with a query variant that would return the rows from the original query in the desired order. In fact, my co-worker told me that it was IMPOSSIBLE to do it. So, what can I do but feel really challenged? I started of full of optimism, only to discover that everything i tried amounted to nothing. I kept coming back at the same theme: to order the nodes depth-first according to their parent_id and position, you need to access all of the ancestor rows in order to account for the positional information stored in their parent_id and position columns too.

Well, I still hope there's a better way, but finally, I came up with this:

Now, I know the solution is flawed. For example, you can't have more than 9 children per node, and you can't have a deeper tree than 38 (number precision). You could tinker it a bit to allow for 99 children per node with a maximum depth of 19, which seems reasonable for this particular purpose (work breakdown structure)

However, the more I'm familiarizing myself with the nested set model, the more i feel that that is a much, much better solution to handle hierarchy within a relational database.

what i was looking forward to was a column with name "level" with 11 rows in it starting with 1 and incrementing till 11.

I have tried this query on other versions of oracle databases in my workplace and seems to have worked for some of them,but for the one which is installed in my pc it fails to run and gives only "no rows returned"

Is there anything that i need to do like run any scripts before i start doing these things with my hierarchical queries?