All the Perl that's Practical to Extract and Report

Navigation

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Without JavaScript enabled, you might want to
use the classic discussion system instead. If you login, you can remember this preference.

Please Log In to Continue

I've been doing a bunch of hierarchy work recently in SQL, and the hands-down best way to do the type of typical hierarchy queries that I do in SQL is to store (or at least additionally represent) hiearchies as sets.

See, SQL is a set-operation language. So having hierarchies stored as an adjacency list (e.g. employee table has a 'manager' column) -- although sufficient -- forces you out of SQL and into a procedural language for tree-walking.

Imagine you want to get "this person's management chain" or "everyone in that person's organization" or "all the managers". But you don't want to just get that, you want to mix in all the other cool SQL predicates like "AND pay > 50000 AND state = 'CO'".

Joe Celko describes an excellent approach to this in his _SQL for Smarties_ book. To set this up for the first time, you need to do a recursive pass through the tree -- but you only need to do this once. From then on you can express very powerful hierarchy relationships in SQL very naturally.

Essentially you do a depth-first traversal of the tree, starting at the root, and assigning "left" and "right" numbers along the way from a single monotonically increasing counter. So the root node gets left=1. Then you recurse down to the manager's first subordinate, assign left=2, and recurse down to her subordinates, if any. If at any point a node has no subordinates OR you've already walked them all, assign that node's 'right' to be the next number. That means that the final number will be assigned to the root node's 'right' attribute.

Here's the cool part: Once you're done, here's some examples of what you can do:

(jnoble's left=233 and right=534 for these example)

My managment chain (my boss, bosses' boss, etc. up to the root):

SELECT id FROM employees WHERE left = 233 and right > 534;

While others will have left values less than mine, and right values greater than mine, only my management chain will have both.

My entire organization (me and everyone 'below' me):

SELECT id FROM employees WHERE left >= 233 AND right = 233 AND right = 534 AND state = 'CO';

All company managers (people with other people reporting to them) who are on leave:

SELECT id FROM employees WHERE right = (left+1) AND status = 'Leave';

(Leaf nodes have 'right' values that are one more than their 'left' value -- anyone else must be a manager.)

Combine together with your basic logic statements to grab groups comprised of multiple sub-hierarchies, subtracting out other sub-hierarchies, joining to other tables, etc. etc. It's a very "native" way to work with hierarchies in SQL.

In practice, I keep the adjacency list ('manager' attribute) as my source of record, since that's how HR gives it to me. Plus it's useful to get immediate manager or immediate direct reports. Each time I reload the HR data into my tables, I do a quick traversal in perl to add 'left' and 'right' columns. Then I can do some pretty darn cool hierarchy stuff right in SQL without having to pop into a procedural language each time.

Oh, and: changes within the hierarchy are supported well by this model. See Celko's book for details, among other places. For my purposes, since I'm just a consumer of the hierarchy data that's kept up-to-date on a separate system, I haven't had to bother. But it's nice to know that I could.