Adam Machanic : Hierarchieshttp://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Hierarchies/default.aspxTags: HierarchiesenCommunityServer 2.1 SP2 (Build: 61129.1)Swinging From Tree to Tree Using CTEs, Part 2: Adjacency to Nested Intervalshttp://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/swinging-from-tree-to-tree-using-ctes-part-2-adjacency-to-nested-intervals.aspxThu, 13 Jul 2006 01:49:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:105Adam Machanic4http://www2.sqlblog.com/blogs/adam_machanic/comments/105.aspxhttp://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=105In our <a href="http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/swinging-from-tree-to-tree-using-ctes-part-1-adjacency-to-nested-sets.aspx">previous installment</a>, we saw how to convert Adjacency Lists into Nested Sets using a CTE.<br>
<br>
In this episode, we will convert the Adjacency List into a <a href="http://www.dbazine.com/oracle/or-articles/tropashko4">Nested Intervals</a>
encoding.&nbsp; Specifically, this encoding will make use of the Nested
Intervals with Continued Fractions technique that Tropashko presented
in a <a href="http://arxiv.org/ftp/cs/papers/0402/0402051.pdf">later paper</a>.<br>
<br>
The key to this technique lies in using a slightly different form of
materialized path than was used in the last post.&nbsp; Rather than
materializing the EmployeeIds into a path, the path will be created as
an <i>enumerated</i>
representation, based on sibling ordering.&nbsp; For example, the first
two levels of the AdventureWorks HumanResources.Employee table's tree
look like:<br>
<br>
<blockquote>
EmployeeId 109 (CEO)<br>
<blockquote>EmployeeId 6 (Marketing Manager)<br>
EmployeeId 12 (VP Engineering)<br>
EmployeeId 42 (IS Manager)<br>
EmployeeId 140 (CFO)<br>
EmployeeId 148 (VP Production)<br>
EmployeeId 273 (VP Sales)<br>
<br>
</blockquote>
</blockquote>
Using the materialized path representation from the previous post, the paths to the second-level employees would be:<br>
<br>
<blockquote>6: 109.6<br>
12: 109.12<br>
42: 109.42<br>
140: 109.140<br>
148: 109.148<br>
273: 109.273<br>
<br>
</blockquote>
However, for this post, paths will instead be materialized based on
sibling ordering.&nbsp; We don't know anything about how siblings
should be ordered in the AdventureWorks employee hierarchy (that's
probably a business rule question, if it matters at all).&nbsp; So
ordering will be done by EmployeeId:<br>
<br>
<blockquote>6: 1.1<br>
12: 1.2<br>
42: 1.3<br>
140: 1.4<br>
148: 1.5<br>
273: 1.6<br>
</blockquote>
The significance of this encoding is that for each path a rational
number can be generated, using a Euclidian algorithm (described very
well on <a href="http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/cfINTRO.html">this web site</a>).&nbsp;
The algorithm works by iterating over each element on the path,
building a rational number as it goes.&nbsp; The beauty of this, as
shown by Tropashko in his paper, is that by using these paths we can
determine an interval, beween which all children of a given path will
fall.<br>
<br>
In order to accomplish getting the path, the ROW_NUMBER function
will be used, but slightly differently than last time.&nbsp; Instead of
creating a second CTE to reference the first, thereby getting the row
number for each element of the path, the ROW_NUMBER function will be
embedded within the recursive CTE itself, as in the following example:<br>
<blockquote><font face="Courier New">WITH EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
<br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY e.EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; FROM EmployeeRows x<br>
&nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e ON e.ManagerId = x.EmployeeId<br>
)<br>
SELECT *<br>
FROM EmployeeRows<br>
ORDER BY <br>
&nbsp;&nbsp;&nbsp; ManagerId, <br>
&nbsp;&nbsp;&nbsp; EmployeeId</font><br>
</blockquote>
Interestingly, this example as-is will return <i>exactly the same results</i> as the following, non-recursive CTE example:<br>
<blockquote><font face="Courier New">SELECT<br>
&nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; ManagerId,<br>
&nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (PARTITION BY ManagerId ORDER BY EmployeeId) AS theRow<br>
FROM HumanResources.Employee<br>
ORDER BY <br>
&nbsp;&nbsp;&nbsp; ManagerId, <br>
&nbsp;&nbsp;&nbsp; EmployeeId<br>
</font></blockquote>
So, why do we care?&nbsp; Take a close look at the two examples.&nbsp;
In the CTE example, the ROW_NUMBER function does not use PARTITION
BY.&nbsp; Yet, results are <i>implicitly</i> partitioned.&nbsp; This
gives an interesting view into the inner-workings of CTEs.&nbsp; As it
turns out, the recursive part of the CTE is called once per row
returned by the anchor or previous recursion.&nbsp; This is not how I
originally expected CTEs to behave (I thought the recursive part would
be called once per rowset returned by the anchor or previous
recursion), but it does help us with this particular task!<br>
<br>
The current row number, at any given point in the recursion, represents
the enumeration for that node.&nbsp; But because we're using a
recursive CTE, we also have access to the parent's enumaration.&nbsp;
For instance, to build an enumerated path, the following T-SQL would be
used:<br>
<blockquote><font face="Courier New">WITH EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER (ORDER BY EmployeeId)) AS thePath<br>
&nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
<br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath + '.' +
CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER (ORDER BY e.EmployeeId)) AS
thePath<br>
&nbsp;&nbsp;&nbsp; FROM EmployeeRows x<br>
&nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e ON e.ManagerId = x.EmployeeId<br>
)<br>
SELECT *<br>
FROM EmployeeRows<br>
ORDER BY <br>
&nbsp;&nbsp;&nbsp; thePath<br>
</font></blockquote>
Note that this sample can be made a bit more readable (and more
functional for later) by embeding the ROW_NUMBER in a derived table:<br>
<blockquote><font face="Courier New">WITH EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), y.theRow) AS thePath<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
&nbsp;&nbsp;&nbsp; ) y<br>
<br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.thePath + '.' + CONVERT(VARCHAR(MAX), y.theRow) AS thePath<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.ManagerId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY e.EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM EmployeeRows x<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e ON e.ManagerId = x.EmployeeId<br>
&nbsp;&nbsp;&nbsp; ) y<br>
)<br>
SELECT *<br>
FROM EmployeeRows<br>
ORDER BY <br>
&nbsp;&nbsp;&nbsp; thePath<br>
</font></blockquote>
From here, it's a simple step to implement the Euclidian algorithm.&nbsp; The algorithm is quite simple:<br>
<ol><li>Set parentNumerator &lt;- 1</li><li>Set parentDenominator &lt;- 0</li><li>Set theElement &lt;- first enumeration in the path</li><li>Set currentNumerator &lt;- theElement</li><li>Set currentDenominator &lt;- 1</li><li>Set theElement &lt;- next enumeration in the path</li><li>Set previousParentNumerator &lt;- parentNumerator</li><li>Set previousParentDenominator &lt;- parentDenominator</li><li>Set parentNumerator &lt;- currentNumerator</li><li>Set parentDenominator &lt;- currentDenominator</li><li>Set currentNumerator &lt;- (parentNumerator * theElement) + previousParentNumerator</li><li>Set currentDenominator &lt;- (parentDenominator * theElement) + previousParentDenominator</li><li>If the current element is not the final node in the path, goto 6.</li></ol>
This seems a bit hairy, but I think that looking at the algorithm
and spending a few minutes with our old friends pencil and paper will
make it quite clear.&nbsp; Also, look at the web site linked
above.&nbsp; Here is how I've implemented the algorithm using a CTE:<br>
<blockquote><font face="Courier New">WITH EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), theRow) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 1) AS prevNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 0) AS prevDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, theRow) AS currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 1) AS currDenom<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
&nbsp;&nbsp;&nbsp; ) y<br>
&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.thePath + '.' + CONVERT(VARCHAR(MAX), y.theRow) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; prevNumer = y.currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; prevDenom = y.currDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; (y.currNumer * y.theRow) + y.prevNumer&nbsp; AS currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; (y.currDenom * y.theRow) + y.prevDenom&nbsp; AS currDenom<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT <br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeID,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.currDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.prevNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.prevDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY e.EmployeeID) AS therow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM EmployeeRows x<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e ON e.ManagerId = x.EmployeeId<br>
&nbsp;&nbsp;&nbsp; ) y<br>
)<br>
</font></blockquote>
Note that in this case I didn't require the use
of the temporary variables; the previous anchor/recursive parts act as
temporary storage enough.&nbsp; <br>
<br>
Readers will also hopefully notice that I haven't yet included a SELECT
to get the data from the CTE!&nbsp; This is because I'd like to explain
briefly what it will do.&nbsp; In his paper, Tropashko explains that for
each node, the intervals for the children of that node will fall into
an interval between the encoding of that node (currNumer / currDenom)
and the numerator of the previous node plus the numerator for the
current node, divided by the denominator of the previous node plus the
denominator for the current node ((currNumer + prevNumer) / (currDenom
+ prevDenom)).&nbsp; Quite wordy here.&nbsp; Refer to the paper for a better
explanation and proof.<br>
<blockquote>
</blockquote>
Anyway, the completed query follows:<br>
<blockquote><font face="Courier New">WITH EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), theRow) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 1) AS prevNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 0) AS prevDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, theRow) AS currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(BIGINT, 1) AS currDenom<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY EmployeeId) AS theRow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
&nbsp;&nbsp;&nbsp; ) y<br>
&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; y.thePath + '.' + CONVERT(VARCHAR(MAX), y.theRow) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; prevNumer = y.currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; prevDenom = y.currDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; (y.currNumer * y.theRow) + y.prevNumer&nbsp; AS currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; (y.currDenom * y.theRow) + y.prevDenom&nbsp; AS currDenom<br>
&nbsp;&nbsp;&nbsp; FROM<br>
&nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT <br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeID,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.currNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.currDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.prevNumer,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.prevDenom,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ROW_NUMBER() OVER (ORDER BY e.EmployeeID) AS therow<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM EmployeeRows x<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e ON e.ManagerId = x.EmployeeId<br>
&nbsp;&nbsp;&nbsp; ) y<br>
)<br>
SELECT<br>
&nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; thePath,<br>
&nbsp;&nbsp;&nbsp; currNumer AS startNumer,<br>
&nbsp;&nbsp;&nbsp; currDenom AS startDenom,<br>
&nbsp;&nbsp;&nbsp; currNumer + prevNumer AS endNumer,<br>
&nbsp;&nbsp;&nbsp; currDenom + prevDenom AS endDenom<br>
FROM EmployeeRows<br>
</font></blockquote>
For each node (EmployeeId), you now have an interval (start and end)
within which all children intervals will fall.&nbsp; Note that computation
must be done by the interval, not by the current node's encoding.&nbsp; The
reason becomes apparent when looking at the encodings for 1.1.1 and
1.2.&nbsp; They are the same; however, their intervals do not overlap.&nbsp; As a
matter of fact, encodings will be the same for every next sibling/first
child pair.&nbsp; But the intervals remain nested, and if proper queries are
written there will be no confusion.<br>
<br>
So that's a first step towards using the Nested Intervals Model in SQL
Server 2005.&nbsp; Stay tuned for more... And as always, feel free to post
questions or comments.&nbsp; I know some of this material can be confusing
(at least, it was to me before I wrote this post!)<br><br><img src="http://www2.sqlblog.com/aggbug.aspx?PostID=105" width="1" height="1">HierarchiesT-SQLSwinging From Tree to Tree Using CTEs, Part 1: Adjacency to Nested Setshttp://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/swinging-from-tree-to-tree-using-ctes-part-1-adjacency-to-nested-sets.aspxThu, 13 Jul 2006 01:47:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:103Adam Machanic20http://www2.sqlblog.com/blogs/adam_machanic/comments/103.aspxhttp://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=103I'm not sure how many times over the last several years I've seen the
same tired article titles... "Climbing Trees in SQL," "Climbing Up the
SQL Tree," or maybe, "Naked Coeds Playing in the Trees!" ... Oh
wait, I think that last one might be something else.<br>
<br>
But anyway, the point is, I'm going to adhere to that standard.&nbsp;
But I'm adding a Tarzan-esque theme to this post, because we're not
going to just climb a tree.&nbsp; We're going to swing about between
trees.&nbsp; Different types of tree representations are appropriate
for different scenarios.&nbsp; Which is why, as I pointed out in my <a href="http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1107414,00.html?FromTaxonomy=/pr/301329">recent SearchSQLServer article on recursive Common Table Expressions</a>,
we have so many different ways of representing them.&nbsp; Adjacency
Lists, Materialized Paths, Nested Sets, and Nested Intervals spring to
mind.&nbsp; And there are probably others.<br>
<br>
My article shows how to use the CTEs, in conjunction with a dynamically
generated materialized path, to manipulate an Adjacency List, getting
many of the benefits associated with using the Nested Sets Model.&nbsp;
And that's great.&nbsp; But the Nested Sets Model itself might be
useful in your endeavors.&nbsp; So in this post, I will show how to
extend the CTE discussed in that article.<br>
<br>
The end-result CTE in the article, which can be run in the
AdventureWorks database, looks something like this (renamed for the
sake of this post):<br>
<br>
<blockquote><font face="Courier New">WITH EmployeeLevels AS</font><br>
<font face="Courier New">
(</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; SELECT</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; 1 AS Level</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; FROM HumanResources.Employee</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL</font><br>
<font face="Courier New">
</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; UNION ALL</font><br>
<font face="Courier New">
</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; SELECT</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.Level + 1 AS Level</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; FROM EmployeeLevels x</font><br>
<font face="Courier New">
&nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId</font><br>
<font face="Courier New">
)</font><br>
</blockquote>
thePath is the materialized path, a '.' delimited breadcrumb from the
root to each node.&nbsp; We also end up with a level, which represents
how many nodes away from the root in the hierarchy each employee sits
(very important for those upwardly-mobile junior execs, no doubt!)<br>
<br>
I'm going to assume that readers of this post are familiar with the <a href="http://www.intelligententerprise.com/001020/celko.jhtml?_requestid=235427">Nested Sets Model</a>, as popularized by Joe Celko.&nbsp; If not, read the link.<br>
<br>
After staring at the Nested Sets for a while, I arrived at the following mathematical properties:<br>
<br>
<ul><li>Value of Lft for the root node is 1</li><li>Value of Rgt for the root node is 2 * (Number of nodes)</li><li>Value of Lft for any node is ((Number of nodes visited) * 2) - (Level of current node)</li><li>Value of Rgt for any node is (Lft value) + ((Number of subnodes) * 2) + 1</li></ul>
I think the only factor here that requires further explanation is
"(Number of nodes visited)".&nbsp; By this, I mean the number of
nodes that would have been visited (including the current node) if one
were doing a
preorder traversal of the tree. Luckily, this number is quite easy
to determine.&nbsp; The row number for each row, as determined by
ordering by the materialized path, <i>is</i>
this number.&nbsp; I
encourage readers to validate this with pencil and paper if it doesn't
quite make sense reading on the screen.&nbsp; Draw a simple tree and
traverse, starting from the lefthand side of the root node.<br>
<br>
But how to translate that into T-SQL?&nbsp; Luckily, SQL Server 2005, in addition to CTEs, also includes the very useful <a href="http://blogs.conchango.com/jamiethomson/archive/2005/02/16/1025.aspx">ROW_NUMBER() function</a>.&nbsp;
So to get the row number, we simply need to add another CTE into the
chain (did you know that successive CTEs can use the results of
previous CTEs?):<br>
<br>
<blockquote><font face="Courier New">WITH EmployeeLevels AS</font><br>
<font face="Courier New">(</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; SELECT</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; 1 AS Level</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; FROM HumanResources.Employee</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL</font><br>
<br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; UNION ALL</font><br>
<br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; SELECT</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.Level + 1 AS Level</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; FROM EmployeeLevels x</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId</font><br>
<font face="Courier New">),</font><br>
<font face="Courier New">EmployeeRows AS</font><br>
<font face="Courier New">(</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; SELECT</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;EmployeeLevels.*,</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;ROW_NUMBER() OVER (ORDER BY thePath) AS Row</font><br>
<font face="Courier New">&nbsp;&nbsp;&nbsp; FROM EmployeeLevels</font><br>
<font face="Courier New">)<br>
</font></blockquote>
We now have current level and number of nodes visited, which gives us
the Lft value for each node.&nbsp; But how to determine the number of
subnodes, in order to get the Rgt value?&nbsp; Luckily, the materialized
path also gives us that capability...<br>
<br>
For any given node, number of subnodes can be determined by counting
all nodes whose path value is LIKE the current path value + '.%'.<br>
<br>
The resultant query should be fairly obvious at this point:<br>
<br>
<blockquote><font face="Courier New">WITH EmployeeLevels AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; 1 AS Level<br>
&nbsp;&nbsp;&nbsp; FROM HumanResources.Employee<br>
&nbsp;&nbsp;&nbsp; WHERE ManagerId IS NULL<br>
<br>
&nbsp;&nbsp;&nbsp; UNION ALL<br>
<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; e.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; x.Level + 1 AS Level<br>
&nbsp;&nbsp;&nbsp; FROM EmployeeLevels x<br>
&nbsp;&nbsp;&nbsp; JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId<br>
),<br>
EmployeeRows AS<br>
(<br>
&nbsp;&nbsp;&nbsp; SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;EmployeeLevels.*,<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;ROW_NUMBER() OVER (ORDER BY thePath) AS Row<br>
&nbsp;&nbsp;&nbsp; FROM EmployeeLevels<br>
)<br>
SELECT<br>
&nbsp;&nbsp;&nbsp; &nbsp;ER.EmployeeId,<br>
&nbsp;&nbsp;&nbsp; &nbsp;ER.thePath,<br>
&nbsp;&nbsp;&nbsp; &nbsp;ER.Level,<br>
&nbsp;&nbsp;&nbsp; &nbsp;ER.Row,<br>
&nbsp;&nbsp;&nbsp; &nbsp;(ER.Row * 2) - ER.Level AS Lft,<br>
&nbsp;&nbsp;&nbsp; &nbsp;((ER.Row * 2) - ER.Level) + <br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; (<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; SELECT COUNT(*) * 2<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; FROM EmployeeRows ER2 <br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; WHERE ER2.thePath LIKE ER.thePath + '.%'<br>
&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ) + 1 AS Rgt<br>
FROM EmployeeRows ER<br>
ORDER BY thePath<br>
</font></blockquote>
... And that's it!&nbsp; We have now converted the Adjacency List tree into a Nested Sets tree.<br>
<br>
In the next installment, I will show how to determine the Nested
Intervals encoding from an Adjacency List tree, also using recursive
CTEs.<br>
<br>
Questions?<br>
<br><br><img src="http://www2.sqlblog.com/aggbug.aspx?PostID=103" width="1" height="1">HierarchiesT-SQL