If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Tutorial: SQL Server recursion - Parent child relations

Update on April 2. 2008: This tutorial was written for SQL Server 2000. Recursive queries can be done much easier in SQL Server 2005 using recursive Common Table Expression.

I'm sure that many of the experienced database developers in here already know about this technique, but for the rest of you I though it would be nice to have a tutorial on how you can query multi-level parent-child data, both one to many and many to many relationsships.

Examples of a one to many multi-level structure are for example organisations structures or a threaded discussion forum (like VBForums).

A many to many structure can be a Bill of Material (BOM). A BOM can have many items and an item can be part of many BOMs.

The problem with these kind of relations is how write a query that will search and return the whole tree for a given record, e.g. the whole subtree (or x levels) of a BOM or all replies to a post in a discussion forum.

I will start with an example where we look at an employee structure. An employee has one manager, and a manager can have many employees. We create a table Employee with Employee_Id, First_Name, Last_Name, and Manager_Id.

We have now the created the Employee table and populated it with test data. As you can see, the Employee structure consists of four levels.

The next step is to create a function or stored procedure that can take an Employee_Id as parameter and list all sub-levels in the employee hierarcy. This means that if we execute the query with Employee_Id=1 (si_the_geek), we will get all the records in the table returned.

The script that does the magic is here. Comments in the script explains how it works.

Code:

declare @tree table(TreeId int identity primary key,
Employee_Id int not null,
lvl int, ParentTreeId int,
Path varchar(2000),
Employee_Id_Path varchar(2000))
declare @rowcount int, @lvl int, @delcount int
set nocount on
set @lvl=0
-- Here we insert the top level that will be used to extract all sub leves from.
insert into @tree(Employee_Id,lvl) values (1,@lvl)
-- Get numbers of rows affected by the initial insert
set @rowcount=@@ROWCOUNT
-- Update the first level's Path and Employee_Id_Path. Employee_Id_Path will be used to prevent the recursion from
-- going into an endless loop if for example si_the_geek is defined as both parent and child to mendhak.
update @tree set Path=str(TreeId,10,0) + '.', Employee_Id_Path=cast(Employee_Id as varchar(10)) + '\'
while @rowcount>0
begin
-- Increase level by one.
set @lvl=@lvl+1
-- This line inserts all records from the Employee table that has the previous level's
-- employee_Id as Manager_Id. In other words, it inserts all manager's staff from the previous
-- execution of the loop.
insert into @tree(Employee_Id, lvl, ParentTreeId)
select e.Employee_Id, @lvl, t.TreeId
from Employee e inner join @tree t on e.Manager_Id=t.Employee_Id and t.lvl=@lvl-1
-- Get number of rows affected by the previous insert. If no records were affected by the last
-- statement, then we know that we have reached the bottom of the hierarcy.
set @rowcount=@@ROWCOUNT
-- The following SQL updates the newly inserted records' Path with the
-- the new TreeId + previous levels TreeId. The path is used later to sort
-- the result in a treeview look-a-like way. We also update Employee_Id_Path
-- with the Employee_Id and previous level's Employee_Id_Path. The Employee_Id_Path
-- is used to detect endless loops and therefore we only update Employee_Id_Path
-- where parent Employee_Id_Path does not contain current Employee_Id. This is
-- perhaps a little bit difficult to understand the first time you read the code.
update t1 set t1.Path=t2.Path + str(t1.TreeId,10,0)+'.',
t1.Employee_Id_Path=t2.Employee_Id_Path + cast(t1.Employee_Id as varchar(10))+'\'
from @tree t1 inner join @tree t2 on t1.ParentTreeId=t2.TreeId
where t1.lvl=@lvl and t2.Employee_Id_Path not like '%' + cast(t1.Employee_Id as varchar(10)) + '\%'
-- This statement deletes andy records where Employee_Id_Path is NULL. In other
-- words: We delete records that will cause an endless loop.
delete from @tree where Employee_Id_Path is null
-- We then get the number of records deleted...
set @delcount=@@ROWCOUNT
-- ... and substract this number from the records appended. If @rowcount=0
-- then the WHILE loop will exit.
set @rowcount=@rowcount-@delcount
end
-- Now we can choose to display the results in what way we want to. Path can, as we said before,
-- be used to sort the result in a treeview way.
select replicate(' | ',lvl)+cast(t.Employee_Id as varchar(10)) as tree_level,e.First_Name,lvl
from @tree t inner join Employee e on t.Employee_Id=e.Employee_Id order by Path
GO

You can use this script in a stored procedure or a UDF. Using the script in a UDF will also give you the possibility to use the result as a recordset in other SELECT statements. Read here to see how the UDF header must be defined to act as a recordset.

Execute the script in QA and you will get the following result. As you can see, this is a treeview representation of the hierarcy displayed in the picture above.

This example demonstrates the basic principle, but you can make it much more advanced if you like to. We have used it to return a distinct list of all items in a BOM and the quantity of each item.

Re: Tutorial: Nested iterations through parent-child relations

As an addition, if anyone needs to implement tree structures in a database, I highly suggest taking a look at a book "Joe Celko's Trees and Hierarchies in SQL for Smarties". It covers path enumeration which this example makes use of, and many other implementations, their pros and cons, etc.

Re: Tutorial: SQL Server recursion - Parent child relations

Hi!

Thanks for an outstanding tutorial. However, may I ask a follow up question?
I see a problem in the database design, that is if a parent have many childs, there will be a lot of redundant information stored in this one table. The best(?) way to solve that is to make a second table like this

EMPLOYEE_MANAGER

EmployeeId ManagerId
-----------------------
1 3
2 3
4 3

Then you dont have to store all the employee metadata all over again, all you really need are the key(s) anyway.

Would you perhaps consider writing a tutorial or just post some t-sql that explains how to read this information from a table structure like the one above? I have spent a hour or so in pl/sql trying to make it work, but so far no success. Perhaps you can try in T-SQL?

Re: Tutorial: SQL Server recursion - Parent child relations

This is extremely good code. I liked it. I found a small bug in your code

When you compare t2.Employee_Id_Path not like '%' + cast(t1.Employee_Id as varchar(10)) + '\%', it was comparing "2" with "192\" and it was failing to include child "2". I have just added a white space like Employee_Id_Path=' ' + cast(Employee_Id as varchar(10)) + '\' and t2.Employee_Id_Path not like '% ' + cast(t1.Employee_Id as varchar(10)) + '\%'

Sivakumar

Originally Posted by kaffenils

Update on April 2. 2008: This tutorial was written for SQL Server 2000. Recursive queries can be done much easier in SQL Server 2005 using recursive Common Table Expression.

I'm sure that many of the experienced database developers in here already know about this technique, but for the rest of you I though it would be nice to have a tutorial on how you can query multi-level parent-child data, both one to many and many to many relationsships.

Examples of a one to many multi-level structure are for example organisations structures or a threaded discussion forum (like VBForums).

A many to many structure can be a Bill of Material (BOM). A BOM can have many items and an item can be part of many BOMs.

The problem with these kind of relations is how write a query that will search and return the whole tree for a given record, e.g. the whole subtree (or x levels) of a BOM or all replies to a post in a discussion forum.

I will start with an example where we look at an employee structure. An employee has one manager, and a manager can have many employees. We create a table Employee with Employee_Id, First_Name, Last_Name, and Manager_Id.

We have now the created the Employee table and populated it with test data. As you can see, the Employee structure consists of four levels.

The next step is to create a function or stored procedure that can take an Employee_Id as parameter and list all sub-levels in the employee hierarcy. This means that if we execute the query with Employee_Id=1 (si_the_geek), we will get all the records in the table returned.

The script that does the magic is here. Comments in the script explains how it works.

Code:

declare @tree table(TreeId int identity primary key,
Employee_Id int not null,
lvl int, ParentTreeId int,
Path varchar(2000),
Employee_Id_Path varchar(2000))
declare @rowcount int, @lvl int, @delcount int
set nocount on
set @lvl=0
-- Here we insert the top level that will be used to extract all sub leves from.
insert into @tree(Employee_Id,lvl) values (1,@lvl)
-- Get numbers of rows affected by the initial insert
set @rowcount=@@ROWCOUNT
-- Update the first level's Path and Employee_Id_Path. Employee_Id_Path will be used to prevent the recursion from
-- going into an endless loop if for example si_the_geek is defined as both parent and child to mendhak.
update @tree set Path=str(TreeId,10,0) + '.', Employee_Id_Path=cast(Employee_Id as varchar(10)) + '\'
while @rowcount>0
begin
-- Increase level by one.
set @lvl=@lvl+1
-- This line inserts all records from the Employee table that has the previous level's
-- employee_Id as Manager_Id. In other words, it inserts all manager's staff from the previous
-- execution of the loop.
insert into @tree(Employee_Id, lvl, ParentTreeId)
select e.Employee_Id, @lvl, t.TreeId
from Employee e inner join @tree t on e.Manager_Id=t.Employee_Id and t.lvl=@lvl-1
-- Get number of rows affected by the previous insert. If no records were affected by the last
-- statement, then we know that we have reached the bottom of the hierarcy.
set @rowcount=@@ROWCOUNT
-- The following SQL updates the newly inserted records' Path with the
-- the new TreeId + previous levels TreeId. The path is used later to sort
-- the result in a treeview look-a-like way. We also update Employee_Id_Path
-- with the Employee_Id and previous level's Employee_Id_Path. The Employee_Id_Path
-- is used to detect endless loops and therefore we only update Employee_Id_Path
-- where parent Employee_Id_Path does not contain current Employee_Id. This is
-- perhaps a little bit difficult to understand the first time you read the code.
update t1 set t1.Path=t2.Path + str(t1.TreeId,10,0)+'.',
t1.Employee_Id_Path=t2.Employee_Id_Path + cast(t1.Employee_Id as varchar(10))+'\'
from @tree t1 inner join @tree t2 on t1.ParentTreeId=t2.TreeId
where t1.lvl=@lvl and t2.Employee_Id_Path not like '%' + cast(t1.Employee_Id as varchar(10)) + '\%'
-- This statement deletes andy records where Employee_Id_Path is NULL. In other
-- words: We delete records that will cause an endless loop.
delete from @tree where Employee_Id_Path is null
-- We then get the number of records deleted...
set @delcount=@@ROWCOUNT
-- ... and substract this number from the records appended. If @rowcount=0
-- then the WHILE loop will exit.
set @rowcount=@rowcount-@delcount
end
-- Now we can choose to display the results in what way we want to. Path can, as we said before,
-- be used to sort the result in a treeview way.
select replicate(' | ',lvl)+cast(t.Employee_Id as varchar(10)) as tree_level,e.First_Name,lvl
from @tree t inner join Employee e on t.Employee_Id=e.Employee_Id order by Path
GO

You can use this script in a stored procedure or a UDF. Using the script in a UDF will also give you the possibility to use the result as a recordset in other SELECT statements. Read here to see how the UDF header must be defined to act as a recordset.

Execute the script in QA and you will get the following result. As you can see, this is a treeview representation of the hierarcy displayed in the picture above.

This example demonstrates the basic principle, but you can make it much more advanced if you like to. We have used it to return a distinct list of all items in a BOM and the quantity of each item.