Lexical Exercise

Exploding Rows

One of the many uses of tally tables is to blow up some rows. Normally, you make your database reduce rows by joining tables to others and applying criteria. But sometimes you want it to make more rows.

Phone Home

Imagine a simple table containing contact information. It has two places for phone numbers (Phone1 and Phone2). Let’s say you wanted to split those off into a related phone numbers table so you could support many more numbers for a single contact. As part of that process, you’d need to create two rows for each existing contact: one for each phone number.

One way to get two phone numbers is to handle each phone number as a separate process, combining a list of all contacts and their first number with those same contacts and their second number. So you’d SELECT everything from the table once for the first phone number and then UNION it with another SELECT. This works, but if you have a large list of columns or a complex set of JOINs, you end up repeating a lot of code.

Instead, what if you could simply, magically double the rows and then pick and choose the columns you want from each copy? You’d pick the first phone number for the first copy and the second phone number for the second copy. So what’s the magic? The Cartesian product.

Criss-Cross Applesauce

Normally, when you JOIN two tables, you want to correlate them somehow. If you had contacts in one table and phones in another, you’d want to make sure the phone numbers in your final query correlated to the relevant contact. The contact name might be repeated for each phone number, but it’s not just willy-nilly. A Cartesian product is (seemingly) willy-nilly.

If you took a table with “A” and “B” and combined it with another table of “A” and “B”, you’d often want to match those values up. You’d take a query like this:
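As a minimal sketch of such a query, the two tables here are hypothetical and built inline with VALUES (which assumes SQL Server 2008 or later). The aliases l and r and the column name Val are illustrative only:

```sql
-- Two hypothetical two-row tables, each holding 'A' and 'B'.
SELECT l.Val AS LeftVal, r.Val AS RightVal
FROM (VALUES ('A'), ('B')) AS l (Val)
INNER JOIN (VALUES ('A'), ('B')) AS r (Val)
    ON l.Val = r.Val;   -- correlate the two sides on their values
-- Two rows come back: A paired with A, and B paired with B.
```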

That returns only the matching pairs (A with A, B with B) because you’re correlating the values in one table with the other. What if you removed that correlation? Well, you’d get an error if you just removed the JOIN condition altogether, so let’s just fake its removal by making it a tautology like 1 = 1:
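Here is a sketch of that tautology join, again using hypothetical inline tables (aliases l and r and column Val are made up for illustration), followed by the same thing spelled as an explicit CROSS JOIN:

```sql
-- The join condition is always true, so every left row pairs with every right row.
SELECT l.Val AS LeftVal, r.Val AS RightVal
FROM (VALUES ('A'), ('B')) AS l (Val)
INNER JOIN (VALUES ('A'), ('B')) AS r (Val)
    ON 1 = 1;
-- Four rows come back: A-A, A-B, B-A, B-B (row order is not guaranteed).

-- CROSS JOIN asks for the same Cartesian product without the fake condition.
SELECT l.Val AS LeftVal, r.Val AS RightVal
FROM (VALUES ('A'), ('B')) AS l (Val)
CROSS JOIN (VALUES ('A'), ('B')) AS r (Val);
```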

Kaboom!

So you can use a CROSS JOIN to get a list of everything in one table combined with everything in another table, all willy-nilly. If that “other table” is just a derived table of numbers (or a materialized tally table), you get as many copies of your original table as you have rows in your table of numbers. So if we want 2 copies of our table, we can just cross it with a tally table with two rows:
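A minimal sketch of that doubling, assuming a hypothetical @Contact table variable (its column names and sample data are invented here) and a two-row derived tally table aliased t:

```sql
-- Hypothetical contact table; names and numbers are made up for illustration.
DECLARE @Contact TABLE (
    ContactName varchar(50) NOT NULL,
    Phone1      varchar(20) NULL,
    Phone2      varchar(20) NULL
);

INSERT INTO @Contact (ContactName, Phone1, Phone2)
VALUES ('Ann', '555-0101', '555-0102'),
       ('Bob', '555-0103', '555-0104');

-- Cross the contacts with a two-row tally table to get two copies of each row.
SELECT t.N, c.ContactName, c.Phone1, c.Phone2
FROM @Contact AS c
CROSS JOIN (VALUES (1), (2)) AS t (N)
ORDER BY c.ContactName, t.N;
-- Four rows come back: each contact once with N = 1 and once with N = 2.
```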

Just in CASE

For our phone number splitting, we don’t just want a copy, of course. We’re copying with a purpose. We want to fiddle with each copy, grabbing a different phone number each time. When we’re looking at the first copy, we want the first phone number (Phone1); otherwise, we want the second (Phone2).

Let’s demonstrate by populating our simple contact table and then use two different methods to gather up a list of contacts with each phone number in a separate row.
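The following is a sketch of both methods under stated assumptions: the @Contact table variable, its columns, the sample data, and the PhoneSlot output column are all hypothetical. It also uses UNION ALL rather than plain UNION so that no duplicate-removal step gets in the way.

```sql
-- Hypothetical contact table with two phone columns.
DECLARE @Contact TABLE (
    ContactID   int         NOT NULL PRIMARY KEY,
    ContactName varchar(50) NOT NULL,
    Phone1      varchar(20) NULL,
    Phone2      varchar(20) NULL
);

INSERT INTO @Contact (ContactID, ContactName, Phone1, Phone2)
VALUES (1, 'Ann', '555-0101', '555-0102'),
       (2, 'Bob', '555-0103', '555-0104');

-- Method 1: one SELECT per phone column, stitched together.
SELECT c.ContactID, c.ContactName, 1 AS PhoneSlot, c.Phone1 AS Phone
FROM @Contact AS c
UNION ALL
SELECT c.ContactID, c.ContactName, 2, c.Phone2
FROM @Contact AS c
ORDER BY ContactID, PhoneSlot;

-- Method 2: explode the rows with a two-row tally table,
-- then use CASE to pick the phone column that belongs to each copy.
SELECT c.ContactID,
       c.ContactName,
       t.N AS PhoneSlot,
       CASE t.N WHEN 1 THEN c.Phone1 ELSE c.Phone2 END AS Phone
FROM @Contact AS c
CROSS JOIN (VALUES (1), (2)) AS t (N)
ORDER BY c.ContactID, t.N;
```

Both queries should return the same four rows: one row per contact per phone slot.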

Eat that cake!

In this short example, the row explosion method looks about the same size as the UNION method, but it grows by only one CASE branch and one tally row for each additional column to split. The UNION method, by contrast, repeats the full list of columns, and the full FROM clause, for each iteration.

But wait, there’s more. In addition to being more concise code, with less error-prone duplication, it also probably performs better. In this example, if you use SET STATISTICS IO ON, you can see that it uses half as much IO, because the contact table is read once instead of once per phone column.
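One way to check this yourself is sketched below. The #Contact temp table, its columns, and its sample data are hypothetical, and the actual numbers depend on your data and the plan the optimizer chooses, so treat this as a starting point rather than a benchmark. Compare the "logical reads" reported for each query in the Messages output.

```sql
-- Hypothetical temp table standing in for the real contact table.
CREATE TABLE #Contact (
    ContactID   int         NOT NULL PRIMARY KEY,
    ContactName varchar(50) NOT NULL,
    Phone1      varchar(20) NULL,
    Phone2      varchar(20) NULL
);

INSERT INTO #Contact (ContactID, ContactName, Phone1, Phone2)
VALUES (1, 'Ann', '555-0101', '555-0102'),
       (2, 'Bob', '555-0103', '555-0104');

SET STATISTICS IO ON;   -- report per-table reads for each statement

-- UNION ALL method: #Contact is referenced twice.
SELECT c.ContactID, c.Phone1 AS Phone FROM #Contact AS c
UNION ALL
SELECT c.ContactID, c.Phone2 FROM #Contact AS c;

-- Row explosion method: #Contact is referenced once.
SELECT c.ContactID,
       CASE t.N WHEN 1 THEN c.Phone1 ELSE c.Phone2 END AS Phone
FROM #Contact AS c
CROSS JOIN (VALUES (1), (2)) AS t (N);

SET STATISTICS IO OFF;

DROP TABLE #Contact;
```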