Java, XMPP, Space and pretty much everything else

Category Archives: PostgreSQL

At times you need to be able gather information into Sets where an entry can contain one or more entries or sets, or Nodes where an entry has child nodes each of whom can contain their own children. For me it’s usually a geographical tree (Continent, Country, County, Town) or permission roles (Administrator, Moderator, User). Although this is documented elsewhere on the net, here’s how I do this in PostgreSQL utilising pl/pgsql functions.

The tree of nodes consists of one entry in the table per node, with the root node at the top of the tree. Selects on the table are fast, but updates are expensive. As the data contained in the tree rarely changes, this is perfectly fine for most purposes.

For the first example, I’m going to use a geographical tree, so we will have at the end of the exercise the following tree:

First we need a table to store the data. The following SQL Schema defines the node table and inserts the root. Apart from this instance, you should only use the pl/pgsql functions to modify this table:

CREATE SEQUENCE nodeseq;

CREATE TABLE node (

nodeid BIGINT NOT NULL,

l BIGINT NOT NULL,

r BIGINT NOT NULL,

d INTEGER NOT NULL, — the depth of this node, 0 = root

n name NOT NULL,

PRIMARY KEY (nodeid)

);

CREATE INDEX node_l ON node(l);

CREATE INDEX node_r ON node(r);

CREATE INDEX node_lr ON node(l,r);

CREATE INDEX node_lrd ON node(l,r,d);

CREATE INDEX node_ln ON node(l,n);

CREATE INDEX node_rn ON node(r,n);

CREATE INDEX node_lrn ON node(l,r,n);

CREATE INDEX node_lrdn ON node(l,r,d,n);

– The root node (0) preinitialised for the structure to work.

INSERT INTO node (nodeid,l,r,d,n) VALUES (0,1,2,0,’root’);

Here we have the table initialised with the root node. At first it has the nodeid of 0, l=1, r=2 and d=0. Over time the l and r values will change as nodes are added to the tree.

Now for the first set of pl/pgsql functions. This set of functions handle retrieval of nodes from the tree based on certain criteria. I’ll only show the function definitions for now as I’ll show how to use these later in this article.

This is probably the most important function as it adds a new node as a child to another and as such is the primary means of adding nodes to the tree.

CREATE OR REPLACE FUNCTION node_addchild( parentid BIGINT, name NAME )

RETURNS node

AS $$

DECLARE

parentnode;

child node;

tmp TEXT[];

BEGIN

– Validate name ensuring it contains no index

SELECT regexp_matches( name, E'([^\\]]+)(\\[([^\\[]+)\\]){0,1}’ )

INTO tmp;

IF tmp[2] IS NOT NULL THEN

RAISE EXCEPTION ‘Name % is invalid, it may not contain an index’, name;

END IF;

parent := node_get( parentid );

/* We allow same named children, but this would disallow it

SELECT *

INTO child

FROM node

WHERE l > parent.l AND r < parent.r AND n = name;

IF FOUND THEN

RAISE EXCEPTION ‘The child % already exists under %’, name, parentid;

END IF;

*/

UPDATE node

SET r=r+2

WHERE r >= parent.r;

UPDATE node

SET l=l+2

WHERE l >= parent.r;

INSERT INTO node (nodeid,l,r,d,n)

VALUES (nextval(‘nodeseq’), parent.r, parent.r+1, parent.d+1, name);

child := node_get( currval(‘nodeseq’) );

RETURN child;

END;

$$ LANGUAGE ‘plpgsql';

Now using our example tree, the first thing we need to do is to add Europe to the tree as a child of the root. Now the root always has nodeid 0 so we call node_addchild( 0, ‘Europe’ ) to add a new node representing Europe:

test=# select * from node;

nodeid | l | r | d |n

——–+—+—+—+——

0 | 1 | 2 | 0 | root

(1 row)

test=# select * from node_addchild(0,’Europe’);

nodeid | l | r | d | n

——–+—+—+—+——–

1 | 2 | 3 | 1 | Europe

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—+—+—+——–

0 | 1 | 4 | 0 | root

1 | 2 | 3 | 1 | Europe

(2 rows)

Here you’ll notice that after the call to node_addchild() it returned the new node (nodeid 1) and updated the root node so it’s l and r fields bound those of it’s children.

We’ll now add the other nodes except the towns:

test=# select * from node_addchild(1,’United Kingdom’);

nodeid | l | r | d | n

——–+—+—+—+—————-

2 | 3 | 4 | 2 | United Kingdom

(1 row)

test=# select * from node_addchild(2,’Kent’);

nodeid | l | r | d |n

——–+—+—+—+——

3 | 4 | 5 | 3 | Kent

(1 row)

test=# select * from node_addchild(2,’Sussex’);

nodeid | l | r | d | n

——–+—+—+—+——–

4 | 6 | 7 | 3 | Sussex

(1 row)

Adding a node by it’s path

Apart from node_addchild we can also add a node by it’s path. For Canterbury it’s path from Europe is “United Kingdom/Kent/Canterbury” so instead of having to find Kent, we can use the node_addchildbypath function:

When handling sets, you can simply use the following to return all entries contained by a node – here we will ask for all nodes within the United Kingdom:

test=# select * from node_get(2);

nodeid | l | r| d | n

——–+—+—-+—+—————-

2 | 3 | 16 | 2 | United Kingdom

(1 row)

test=# select * from node where l > 3 and r < 16;

nodeid | l| r| d | n

——–+—-+—-+—+————

6 |7 |8 | 4 | Maidstone

3 |4 | 11 | 3 | Kent

7 |9 | 10 | 4 | Tonbridge

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

5 |5 |6 | 4 | Canterbury

(6 rows)

Now this is fine for sets, but when dealing with a tree we usually need to know those nodes immediately beneath the node, in this case the counties. This is where the node_children function is handy:

test=# select * from node_children(2);

nodeid | l| r| d | n

——–+—-+—-+—+——–

3 |4 | 11 | 3 | Kent

4 | 12 | 15 | 3 | Sussex

(2 rows)

Here’s the definition for node_children – here you’ll notice it’s similar to retrieving the contained entries but it now uses the d column:

CREATE OR REPLACE FUNCTION node_children( parentid BIGINT )

RETURNS SETOF node

AS $$

DECLARE

parent node;

r node%rowtype;

BEGIN

parent := node_get( parentid );

FOR r

IN SELECT *

FROM node

WHERE l BETWEEN parent.l AND parent.r

AND d = parent.d+1

ORDER BY n, nodeid

LOOP

RETURN NEXT r;

END LOOP;

RETURN;

END;

$$ LANGUAGE ‘plpgsql';

Deleting nodes

Deleting a node is not as simple as simply deleting the entry from the table. What we have to do is delete the entry including all of it’s children and then close the gap in the l & r columns where the deleted entries used to be. To handle this we have the node_deleteTree function. For this example we’ll add a new County, town then delete them from the tree:

At times you need to be able gather information into Sets where an entry can contain one or more entries or sets, or Nodes where an entry has child nodes each of whom can contain their own children. For me it’s usually a geographical tree (Continent, Country, County, Town) or permission roles (Administrator, Moderator, User). Although this is documented elsewhere on the net, here’s how I do this in PostgreSQL utilising pl/pgsql functions.

The tree of nodes consists of one entry in the table per node, with the root node at the top of the tree. Selects on the table are fast, but updates are expensive. As the data contained in the tree rarely changes, this is perfectly fine for most purposes.

For the first example, I’m going to use a geographical tree, so we will have at the end of the exercise the following tree:

First we need a table to store the data. The following SQL Schema defines the node table and inserts the root. Apart from this instance, you should only use the pl/pgsql functions to modify this table:

CREATE SEQUENCE nodeseq;

CREATE TABLE node (

nodeid BIGINT NOT NULL,

l BIGINT NOT NULL,

r BIGINT NOT NULL,

d INTEGER NOT NULL, — the depth of this node, 0 = root

n name NOT NULL,

PRIMARY KEY (nodeid)

);

CREATE INDEX node_l ON node(l);

CREATE INDEX node_r ON node(r);

CREATE INDEX node_lr ON node(l,r);

CREATE INDEX node_lrd ON node(l,r,d);

CREATE INDEX node_ln ON node(l,n);

CREATE INDEX node_rn ON node(r,n);

CREATE INDEX node_lrn ON node(l,r,n);

CREATE INDEX node_lrdn ON node(l,r,d,n);

– The root node (0) preinitialised for the structure to work.

INSERT INTO node (nodeid,l,r,d,n) VALUES (0,1,2,0,’root’);

Here we have the table initialised with the root node. At first it has the nodeid of 0, l=1, r=2 and d=0. Over time the l and r values will change as nodes are added to the tree.

Now for the first set of pl/pgsql functions. This set of functions handle retrieval of nodes from the tree based on certain criteria. I’ll only show the function definitions for now as I’ll show how to use these later in this article.

This is probably the most important function as it adds a new node as a child to another and as such is the primary means of adding nodes to the tree.

CREATE OR REPLACE FUNCTION node_addchild( parentid BIGINT, name NAME )

RETURNS node

AS $$

DECLARE

parentnode;

child node;

tmp TEXT[];

BEGIN

– Validate name ensuring it contains no index

SELECT regexp_matches( name, E'([^\\]]+)(\\[([^\\[]+)\\]){0,1}’ )

INTO tmp;

IF tmp[2] IS NOT NULL THEN

RAISE EXCEPTION ‘Name % is invalid, it may not contain an index’, name;

END IF;

parent := node_get( parentid );

/* We allow same named children, but this would disallow it

SELECT *

INTO child

FROM node

WHERE l > parent.l AND r < parent.r AND n = name;

IF FOUND THEN

RAISE EXCEPTION ‘The child % already exists under %’, name, parentid;

END IF;

*/

UPDATE node

SET r=r+2

WHERE r >= parent.r;

UPDATE node

SET l=l+2

WHERE l >= parent.r;

INSERT INTO node (nodeid,l,r,d,n)

VALUES (nextval(‘nodeseq’), parent.r, parent.r+1, parent.d+1, name);

child := node_get( currval(‘nodeseq’) );

RETURN child;

END;

$$ LANGUAGE ‘plpgsql';

Now using our example tree, the first thing we need to do is to add Europe to the tree as a child of the root. Now the root always has nodeid 0 so we call node_addchild( 0, ‘Europe’ ) to add a new node representing Europe:

test=# select * from node;

nodeid | l | r | d |n

——–+—+—+—+——

0 | 1 | 2 | 0 | root

(1 row)

test=# select * from node_addchild(0,’Europe’);

nodeid | l | r | d | n

——–+—+—+—+——–

1 | 2 | 3 | 1 | Europe

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—+—+—+——–

0 | 1 | 4 | 0 | root

1 | 2 | 3 | 1 | Europe

(2 rows)

Here you’ll notice that after the call to node_addchild() it returned the new node (nodeid 1) and updated the root node so it’s l and r fields bound those of it’s children.

We’ll now add the other nodes except the towns:

test=# select * from node_addchild(1,’United Kingdom’);

nodeid | l | r | d | n

——–+—+—+—+—————-

2 | 3 | 4 | 2 | United Kingdom

(1 row)

test=# select * from node_addchild(2,’Kent’);

nodeid | l | r | d |n

——–+—+—+—+——

3 | 4 | 5 | 3 | Kent

(1 row)

test=# select * from node_addchild(2,’Sussex’);

nodeid | l | r | d | n

——–+—+—+—+——–

4 | 6 | 7 | 3 | Sussex

(1 row)

Adding a node by it’s path

Apart from node_addchild we can also add a node by it’s path. For Canterbury it’s path from Europe is “United Kingdom/Kent/Canterbury” so instead of having to find Kent, we can use the node_addchildbypath function:

When handling sets, you can simply use the following to return all entries contained by a node – here we will ask for all nodes within the United Kingdom:

test=# select * from node_get(2);

nodeid | l | r| d | n

——–+—+—-+—+—————-

2 | 3 | 16 | 2 | United Kingdom

(1 row)

test=# select * from node where l > 3 and r < 16;

nodeid | l| r| d | n

——–+—-+—-+—+————

6 |7 |8 | 4 | Maidstone

3 |4 | 11 | 3 | Kent

7 |9 | 10 | 4 | Tonbridge

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

5 |5 |6 | 4 | Canterbury

(6 rows)

Now this is fine for sets, but when dealing with a tree we usually need to know those nodes immediately beneath the node, in this case the counties. This is where the node_children function is handy:

test=# select * from node_children(2);

nodeid | l| r| d | n

——–+—-+—-+—+——–

3 |4 | 11 | 3 | Kent

4 | 12 | 15 | 3 | Sussex

(2 rows)

Here’s the definition for node_children – here you’ll notice it’s similar to retrieving the contained entries but it now uses the d column:

CREATE OR REPLACE FUNCTION node_children( parentid BIGINT )

RETURNS SETOF node

AS $$

DECLARE

parent node;

r node%rowtype;

BEGIN

parent := node_get( parentid );

FOR r

IN SELECT *

FROM node

WHERE l BETWEEN parent.l AND parent.r

AND d = parent.d+1

ORDER BY n, nodeid

LOOP

RETURN NEXT r;

END LOOP;

RETURN;

END;

$$ LANGUAGE ‘plpgsql';

Deleting nodes

Deleting a node is not as simple as simply deleting the entry from the table. What we have to do is delete the entry including all of it’s children and then close the gap in the l & r columns where the deleted entries used to be. To handle this we have the node_deleteTree function. For this example we’ll add a new County, town then delete them from the tree: