WITH provides a way to write
auxiliary statements for use in a larger query. These statements,
which are often referred to as Common Table Expressions or
CTEs, can be thought of as
defining temporary tables that exist just for one query. Each
auxiliary statement in a WITH clause can
be a SELECT, INSERT, UPDATE, or
DELETE; and the WITH clause itself is attached to a primary
statement that can also be a SELECT,
INSERT, UPDATE,
or DELETE.

The basic value of SELECT in
WITH is to break down complicated
queries into simpler parts. An example is:

WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

which displays per-product sales totals in only the top
sales regions. The WITH clause defines
two auxiliary statements named regional_sales and top_regions, where the output of regional_sales is used in top_regions and the output of top_regions is used in the primary SELECT query. This example could have been
written without WITH, but we'd have
needed two levels of nested sub-SELECTs. It's a bit easier to follow this
way.

The optional RECURSIVE modifier
changes WITH from a mere syntactic
convenience into a feature that accomplishes things not
otherwise possible in standard SQL. Using RECURSIVE, a WITH query
can refer to its own output. A very simple example is this
query to sum the integers from 1 through 100:

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL),
then a recursive term, where only the
recursive term can contain a reference to the query's own
output. Such a query is executed as follows:

Recursive Query Evaluation

Evaluate the non-recursive term. For UNION (but not UNION
ALL), discard duplicate rows. Include all remaining
rows in the result of the recursive query, and also place
them in a temporary working
table.

So long as the working table is not empty, repeat
these steps:

Evaluate the recursive term, substituting the
current contents of the working table for the
recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and
rows that duplicate any previous result row. Include
all remaining rows in the result of the recursive
query, and also place them in a temporary intermediate table.

Replace the contents of the working table with the
contents of the intermediate table, then empty the
intermediate table.

Note: Strictly speaking, this process is
iteration not recursion, but RECURSIVE is the terminology chosen by the
SQL standards committee.

In the example above, the working table has just a single
row in each step, and it takes on the values from 1 through 100
in successive steps. In the 100th step, there is no output
because of the WHERE clause, and so
the query terminates.

Recursive queries are typically used to deal with
hierarchical or tree-structured data. A useful example is this
query to find all the direct and indirect sub-parts of a
product, given only a table that shows immediate
inclusions:

When working with recursive queries it is important to be
sure that the recursive part of the query will eventually
return no tuples, or else the query will loop indefinitely.
Sometimes, using UNION instead of
UNION ALL can accomplish this by
discarding rows that duplicate previous output rows. However,
often a cycle does not involve output rows that are completely
duplicate: it may be necessary to check just one or a few
fields to see if the same point has been reached before. The
standard method for handling such situations is to compute an
array of the already-visited values. For example, consider the
following query that searches a table graph using a link field:

This query will loop if the link relationships contain cycles. Because
we require a "depth" output, just
changing UNION ALL to UNION would not eliminate the looping. Instead
we need to recognize whether we have reached the same row again
while following a particular path of links. We add two columns
path and cycle to the loop-prone query:

Tip: Omit the ROW()
syntax in the common case where only one field needs to be
checked to recognize a cycle. This allows a simple array
rather than a composite-type array to be used, gaining
efficiency.

Tip: The recursive query evaluation algorithm
produces its output in breadth-first search order. You can
display the results in depth-first search order by making
the outer query ORDER BY a
"path" column constructed in
this way.

A helpful trick for testing queries when you are not certain
if they might loop is to place a LIMIT
in the parent query. For example, this query would loop forever
without the LIMIT:

This works because PostgreSQL's implementation evaluates only
as many rows of a WITH query as are
actually fetched by the parent query. Using this trick in
production is not recommended, because other systems might work
differently. Also, it usually won't work if you make the outer
query sort the recursive query's results or join them to some
other table, because in such cases the outer query will usually
try to fetch all of the WITH query's
output anyway.

A useful property of WITH queries
is that they are evaluated only once per execution of the
parent query, even if they are referred to more than once by
the parent query or sibling WITH
queries. Thus, expensive calculations that are needed in
multiple places can be placed within a WITH query to avoid redundant work. Another
possible application is to prevent unwanted multiple
evaluations of functions with side-effects. However, the other
side of this coin is that the optimizer is less able to push
restrictions from the parent query down into a WITH query than an ordinary sub-query. The
WITH query will generally be evaluated
as written, without suppression of rows that the parent query
might discard afterwards. (But, as mentioned above, evaluation
might stop early if the reference(s) to the query demand only a
limited number of rows.)

The examples above only show WITH
being used with SELECT, but it can be
attached in the same way to INSERT,
UPDATE, or DELETE. In each case it effectively provides
temporary table(s) that can be referred to in the main
command.

This query effectively moves rows from products to products_log. The DELETE in WITH deletes
the specified rows from products,
returning their contents by means of its RETURNING clause; and then the primary query
reads that output and inserts it into products_log.

A fine point of the above example is that the WITH clause is attached to the INSERT, not the sub-SELECT within the INSERT. This is necessary because data-modifying
statements are only allowed in WITH
clauses that are attached to the top-level statement. However,
normal WITH visibility rules apply, so
it is possible to refer to the WITH
statement's output from the sub-SELECT.

Data-modifying statements in WITH
usually have RETURNING clauses, as
seen in the example above. It is the output of the RETURNING clause, not the target table of the
data-modifying statement, that forms the temporary table that
can be referred to by the rest of the query. If a
data-modifying statement in WITH lacks
a RETURNING clause, then it forms no
temporary table and cannot be referred to in the rest of the
query. Such a statement will be executed nonetheless. A
not-particularly-useful example is:

WITH t AS (
DELETE FROM foo
)
DELETE FROM bar;

This example would remove all rows from tables foo and bar. The
number of affected rows reported to the client would only
include rows removed from bar.

Recursive self-references in data-modifying statements are
not allowed. In some cases it is possible to work around this
limitation by referring to the output of a recursive WITH, for example:

WITH RECURSIVE included_parts(sub_part, part) AS (
SELECT sub_part, part FROM parts WHERE part = 'our_product'
UNION ALL
SELECT p.sub_part, p.part
FROM included_parts pr, parts p
WHERE p.part = pr.sub_part
)
DELETE FROM parts
WHERE part IN (SELECT part FROM included_parts);

This query would remove all direct and indirect subparts of
a product.

Data-modifying statements in WITH
are executed exactly once, and always to completion,
independently of whether the primary query reads all (or indeed
any) of their output. Notice that this is different from the
rule for SELECT in WITH: as stated in the previous section,
execution of a SELECT is carried only
as far as the primary query demands its output.

The sub-statements in WITH are
executed concurrently with each other and with the main query.
Therefore, when using data-modifying statements in WITH, the order in which the specified updates
actually happen is unpredictable. All the statements are
executed with the same snapshot (see
Chapter 13), so they cannot
"see" one another's effects on the
target tables. This alleviates the effects of the
unpredictability of the actual order of row updates, and means
that RETURNING data is the only way to
communicate changes between different WITH sub-statements and the main query. An
example of this is that in

Trying to update the same row twice in a single statement is
not supported. Only one of the modifications takes place, but
it is not easy (and sometimes not possible) to reliably predict
which one. This also applies to deleting a row that was already
updated in the same statement: only the update is performed.
Therefore you should generally avoid trying to modify a single
row twice in a single statement. In particular avoid writing
WITH sub-statements that could affect
the same rows changed by the main statement or a sibling
sub-statement. The effects of such a statement will not be
predictable.

At present, any table used as the target of a data-modifying
statement in WITH must not have a
conditional rule, nor an ALSO rule,
nor an INSTEAD rule that expands to
multiple statements.