Menu

MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs)

[Update of April 2017: the features discussed here are now also available in the official release 8.0.1 of MySQL.]

[Note: this is the first post in a series; there is also a second post].

The MySQL development team just published a Labs release of the MySQL Server (available under “MySQL Server 8.0.0 Optimizer”).
A prominent feature of this release, which I developed, is [Recursive] Common Table Expressions, also known as

[recursive] CTE,

[recursive] subquery factoring,

WITH [RECURSIVE] clause.

Three years ago I had shown a way to emulate CTEs in a blog post, but what we have now in MySQL is the real thing, no ersatz!

So this is the first of several blog posts which will tour all details about this new feature.

A derived table is a subquery in the FROM clause, in bold font below:

SELECT … FROM (SELECT …) AS derived_table;

Derived tables exist in MySQL since long. It’s accurate to say that non-recursive CTEs are “improved derived tables”. Here is a first example:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

WITH cte1(txt)AS(SELECT"This "),

cte2(txt)AS(SELECT CONCAT(cte1.txt,"is a ")FROM cte1),

cte3(txt)AS(SELECT"nice query"UNION

SELECT"query that rocks"UNION

SELECT"query"),

cte4(txt)AS(SELECT concat(cte2.txt,cte3.txt)FROM cte2,cte3)

SELECT MAX(txt),MIN(txt)FROM cte4;

+----------------------------+----------------------+

|MAX(txt)|MIN(txt)|

+----------------------------+----------------------+

|Thisisaquery that rocks|Thisisanice query|

+----------------------------+----------------------+

1row inset(0,00sec)

We have a CTE cte4, which is built from other CTEs cte3 and cte2 which itself is built from cte1. As one reader of this post (Vasiliy) commented, the same can be achieved with derived tables:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

SELECT MAX(txt),MIN(txt)

FROM

(

SELECT concat(cte2.txt,cte3.txt)astxt

FROM

(

SELECT CONCAT(cte1.txt,'is a ')astxt

FROM

(

SELECT'This 'astxt

)ascte1

)ascte2,

(

SELECT'nice query'astxt

UNION

SELECT'query that rocks'

UNION

SELECT'query'

)ascte3

)ascte4;

However if you read both queries, the CTE-based one flows more nicely as it defines table after table, “linearly”, and we see the sentences forming as we read from top to bottom, whereas the derived-table-based query looks “tree-like” and “turned inside-out” (see how the first word, “This “, is deeply nested). CTEs help writing a readable query, which is a win for future maintenance.

Due to the limited size of a blog post, I’ll cover non-recursive CTEs more in depth in a next post – I hope the example above has wet your appetite. Today, I’ll rather look at recursive CTEs which, in my humble opinion, are even more innovative to traditional SQL than
non-recursive CTEs are.

A recursive CTE is a set of rows which is built iteratively: from an initial set of rows, a process derives new rows, which grow the set, and those new rows are fed into the process again, producing more rows, and so on, until the process produces no more rows.

the CTE’s definition is included as prefix of SELECT statement SELECT * FROM my_cte

the initial set is SELECT 1 AS n so it’s a row containing “1”. The meta-data of this row is used to define the column of my_cte: therefore, my_cte has one column of type INT (the type of “1”), named n due to AS n.

the process which takes rows in input and produces new rows in output, is described by SELECT 1+n FROM my_cte WHERE n<10
which means: take rows in my_cte which match n<10, and for each of them,
produce a row with an incremented n.

the definition of my_cte is clearly recursive, as it contains a reference to my_cte (in the FROM clause of the second SELECT). That’s how you can distinguish a recursive CTE from a non-recursive one.

So MySQL will take these steps:

iteration 0: create the initial set of rows, noted S0: S0={1},

iteration 1: run process on result of iteration 0 (on S0): this produces new set S1={1+1}={2},

iteration 2: run process on result of iteration 1 (on S1): this produces S2={1+2}={3},

and so on,

iteration 9: run process on result of iteration 8 (on S8): produces S9={1+9}={10},

iteration 10: run process on result of iteration 9 (on S9): no row of S9 matches n<10, so produce nothing, and that triggers the termination of the loop,

the final result in my_cte, which the outer SELECT will see, is the union of S0, S1, … and S9: {1,2,…,9,10}.

The initial set of rows is sometimes referred to as “the non-recursive SELECT”, “the anchor SELECT” or “the seed SELECT”. The SELECT describing the row production process is “the recursive SELECT”; it is “recursive” because it reads my_cte.

Here is a more complete syntax:

1

2

3

4

5

6

7

8

9

10

11

12

13

WITH RECURSIVE cte_name[list of column names]AS

(

SELECT...<--specifies initial set

UNION ALL

SELECT...<--specifies initial set

UNION ALL

...

SELECT...<--specifies how toderive newrows

UNION ALL

SELECT...<--specifies how toderive newrows

...

)

[,any number of other CTE definitions]

Thus:

you can define the initial set as a union of several SELECTs

you can define the process to produce new rows as several SELECTs whose result is
union-ed at every iteration.

this CTE definition can be followed by another CTE definition, which may use the first CTE.

in a single WITH clause you can mix non-recursive CTEs and recursive CTEs (just note that as long as your clause has at least one recursive CTE it must start with the words WITH RECURSIVE).

the CTE’s column names, instead of being specified with AS aliases in the first SELECT, can be put right after the CTE’s name: cte_name(n).

Let’s take our 1-to-10 example, reduce it to 1-to-6 to save screen space, name the column using the my_cte(n) syntax, and use the result of my_cte to create a table:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

USEtest;

CREATE TABLE numbers

WITH RECURSIVE my_cte(n)AS

(

SELECT1

UNION ALL

SELECT1+nFROM my_cte WHEREn<6

)

SELECT *FROM my_cte;

Query OK,6rows affected(0,40sec)

SELECT *FROM numbers;

+------+

|n|

+------+

|1|

|2|

|3|

|4|

|5|

|6|

+------+

6rows inset(0,00sec

The CTE can also be used in INSERT (and REPLACE):

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

INSERT INTO numbers

WITH RECURSIVE my_cte(n)AS

(

SELECT1

UNION ALL

SELECT1+nFROM my_cte WHEREn<6

)

SELECT *FROM my_cte;

Query OK,6rows affected(0,12sec)

SELECT *FROM numbers;

+------+

|n|

+------+

|1|

|2|

|3|

|4|

|5|

|6|

|1|

|2|

|3|

|4|

|5|

|6|

+------+

12rows inset(0,00sec)

In UPDATE (single-table and multi-table):

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

WITH RECURSIVE my_cte(n)AS

(

SELECT1

UNION ALL

SELECT1+nFROM my_cte WHEREn<6

)

UPDATE numbers,my_cte

# Change to 0...

SET numbers.n=0

# ... the numbers which are squares, i.e. 1 and 4

WHERE numbers.n=my_cte.n*my_cte.n;

Query OK,4rows affected(0,01sec)

SELECT *FROM numbers;

+------+

|n|

+------+

|0|

|2|

|3|

|0|

|5|

|6|

|0|

|2|

|3|

|0|

|5|

|6|

+------+

12rows inset(0,00sec)

And in DELETE (single-table and multi-table); notice how the CTE is referenced from a subquery:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

WITH RECURSIVE my_cte(n)AS

(

SELECT1

UNION ALL

SELECT1+nFROM my_cte WHEREn<6

)

DELETE FROM numbers

# delete the numbers greater than the average of 1,...,6 (=3.5)

WHERE numbers.n>(SELECT AVG(n)FROM my_cte);

Query OK,4rows affected(0,01sec)

SELECT *FROM numbers;

+------+

|n|

+------+

|0|

|2|

|3|

|0|

|0|

|2|

|3|

|0|

+------+

8rows inset(0,00sec)

And in a similar DELETE, where the CTE is defined in the subquery itself, not anymore in front of DELETE:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

DELETE FROM numbers

WHERE numbers.n>

(

WITH RECURSIVE my_cte(n)AS

(

SELECT1

UNION ALL

SELECT1+nFROM my_cte WHEREn<6

)

# Half the average is 3.5/2=1.75

SELECT AVG(n)/2FROM my_cte

);

Query OK,4rows affected(0,07sec)

SELECT *FROM numbers;

+------+

|n|

+------+

|0|

|0|

|0|

|0|

+------+

4rows inset(0,00sec)

Pretty flexible, isn’t it?

A word of caution: what would happen if I forgot to type the WHERE n<6 clause? It looks like the query would run forever, as it would always produce new rows (no reason to stop at 6, anymore). But, starting from MySQL 8.0.3, there is an upper bound on the number of allowed iterations (1000), so the query would stop with an error when it has done 1000 iterations; and that bound can be increased or decreased by setting variable @@cte_max_recursion_depth. In older versions, I would have to rely on other means: if using the mysql command-line client, I could stop that by typing Control-C; with another client, I would open another session and use KILL ; or I could put this at the start of my session:

1

SET max_execution_time=10000;

so that the runaway query aborts automatically after 10 seconds, if the WHERE clause wasn’t correct.

That’s all for today. In next posts, I will continue on this topic, but there is already enough here to allow you to experiment with the feature. Before you do, please take note of two things which can save your time:

If you use a recursive CTE to create longer and longer strings (for example by using CONCAT() in the recursive SELECT), remember that the CTE’s column type is determined based on the non-recursive SELECTs only: so, in the non-recursive SELECTs, make the column wide enough with CAST (… AS CHAR(<length>)), or your long strings may not fit.

Inside the recursive CTE definition (the part in AS (…)), some syntax constraints must be respected (the reason for which will become clear in next posts):

a recursive SELECT must reference the CTE only once and only in its
FROM clause, not in any subquery. Of course, it can additionally reference other tables than the CTE and join them with the CTE, which can be very useful to build hierarchies (for example, if we have a table of bosses and employees and want to answer the question “who reports directly or indirectly to Mrs. X?”). If used in a JOIN like this, the CTE must not be on the right side of a LEFT JOIN.

Hi, Guilhem!
Thank you so much for the good article and and the long-awaited feature.
I want to translate your article on Russian. I hope you have no objection.

You write:
“We have a CTE cte4, which is built from other CTEs cte3 and cte2 which itself is built from cte1. Yes, CTEs can be chained like this! Which derived tables cannot do!”

But we can rewrite the example using derived tables:
SELECT MAX(txt), MIN(txt) FROM (SELECT concat(cte2.txt, cte3.txt) as txt FROM (SELECT CONCAT(cte1.txt,”is a “) as txt FROM (SELECT “This ” as txt) as cte1) as cte2, (SELECT “nice query” as txt UNION SELECT “query that rock
s” UNION SELECT “query”) as cte3) as cte4;
+—————————-+———————-+
| MAX(txt) | MIN(txt) |
+—————————-+———————-+
| This is a query that rocks | This is a nice query |
+—————————-+———————-+
1 row in set (0.16 sec)

Hello Vasiliy! Yes, you are welcome to translate it, thanks a lot!
Regarding your remark about the “cte4” query, you’re right; my example is too simple. So I am editing it right now: keeping the same query, and comparing it to the derived table query (which you showed), to show the readability difference. I’ll come back to CTE chaining in a more exact way in a next post dedicated to non-recursive CTEs.