Virtual Auxiliary Table of Numbers

An auxiliary table of numbers is a helper table that contains a sequence of integers from 1 and on. It is a very handy helper table that I use to solve many different types of problems. In case you cannot, or do not want to generate a permanent table, you can produce a virtual one on the fly very efficiently. The trick is to use cross joins. You start with a virtual table with two rows:

WITH

L0AS(SELECT 1 AS c UNION ALL SELECT 1)

Then, you perform a cross join between two instances of this table:

WITH

L0AS(SELECT 1 AS c UNION ALL SELECT 1),

L1AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)

Then, you perform a cross join between two instances of the last table:

WITH

L0AS(SELECT 1 AS c UNION ALL SELECT 1),

L1AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

L2AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)

And after applying such cross joins five time, you have a table with 2^2^5 (4,294,967,296) rows:

WITH

L0AS(SELECT 1 AS c UNION ALL SELECT 1),

L1AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

L2AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

L3AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

L4AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

L5AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)

To produce the actual sequence of numbers, you can use the ROW_NUMBER function with ORDER BY (SELECT NULL), letting the optimizer know that it doesn’t really need to sort the data.

You can create a table function based on this code, and request a certain number of numbers by passing an input parameter. The tricky part is to come up with a solution that stops processing the Cartesian products as soon as the requested number of numbers was produced, and not always try to generate all four billion of those. Until recently, I used the following solution:

This code finishes fast and does stop processing after 10 rows are produced. If you look at the execution plan produced for this query, you will notice a Top operator that is in charge of stopping the processing at the right point. However, I noticed that in certain cases the optimizer removes the Top operator, and actually tries to generate all four billion rows, before the filtering part. Here’s one such example:

WITH Primes(p) AS

(

SELECT 2

UNION ALL SELECT 3

UNION ALL SELECT 5

UNION ALL SELECT 7

)

SELECT*

FROM Primes

CROSS APPLY dbo.GetNums(Primes.p) AS Nums;

Recently I revised the solution to use the TOP option instead of the WHERE filter like so:

Discuss this Article 6

I've tried the recursive approach you suggest, oddsock and as per article #94376, having the OPTION within the function doesn't work (syntax error) - it'd be a huge pain to have to remember to specify the OPTION on all queries that use GetNums.

Hi BG,
Could u give us a example which type of problem need to use Virtual Auxiliary Table of Numbers ?
looking forward your Article
[Reply:
There are many: splitting separated lists of values, generating a sequence of date and time values for a time dimension in a data warehouse, unpivoting data, producing copies of rows, generating histograms, and others.
Check out InstantDoc ID #100657 to see the splitting separated lists of values example (http://www.sqlmag.com/Articles/ArticleID/100657/100657.html?Ad=1)
Cheers,
Itzik]

WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1 /*repeat x16 or even more, use row constructor in 2008!*/)
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L3)
SELECT TOP (@n) n FROM Nums ORDER BY n;
is a lot faster. I'm not sure where the tradoff between constructor and cross join lies, but it is certainly not at 2 records!

Nice idea, but it does seem overly complex. This will do the job just as well:
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH CTE
AS
(SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM CTE
WHERE n < @n)
SELECT n
FROM CTE
OPTION(MAXRECURSION 0)
G0
[Reply: Hi oddsock,
Besides the fact that you cannot specify the MAXRECURSION hint as part of the inner query (as William mentioned), the recursive solution is very slow compared to the nonrecursive one I showed. It took my solution 3 seconds to generate 10,000,000 numbers (with Discard results after execution turned on), while it took your recursive query 172 seconds to do the same.
As for the complexity of the solution, it’s hidden from the user by encapsulating it in a function. As long as you trust that it works correctly, the use is simple.
Cheers,
Itzik]

does not work in an inline function (though no documentation from MS). It took me 3:48 secs to generate the 10,000,000 numbers using your above code, where 1:05 secs using the revised code (using TOP) on the page by Itzik.

From the Blogs

The quest for the Golden Record to achieve a single, accurate and complete version of a customer record is worth the pursuit to attain survivorship. Record matching and consolidation are only the beginning. Melissa Data takes a new approach. Learn how to apply intelligent rules based on reference data to make smarter and better decisions for data cleansing....More

On SQL Servers where Availability Groups (or Mirroring) isn’t in play, I typically recommend keeping a combination of on-box backups along with copying said backups off-box as well. Obviously, keeping databases AND backups on the SAME server is the metaphorical equivalent of putting all of your eggs in one basket – and therefore something you should avoid like the plague....More

One of the biggest strengths of AlwaysOn Availability Groups is that they allow DBAs to address both high availability and disaster recovery concerns from a single set of tooling or interfaces. But, this doesn’t mean that you won’t still need backups....More