Tables on the Fly

You might have heard of Common Table Expressions (CTEs), T-SQL’s tool for building tables on the fly in a single statement. They’re a lot like Derived Tables, in that they’re SELECT statements which can be used by other portions of an encapsulating SELECT statement. But unlike Derived Tables, they can be used multiple times in the same statement. It’s not accurate to think of them as statement-level temporary tables, because they aren’t filled with data as a discrete step. Except for their recursive feature, they are tantamount to repeating the contained SELECT statement multiple times in the main SELECT statement.

For example, this SELECT statement repeats the same Derived Table multiple times.
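A sketch of the comparison, using a hypothetical Orders table: the derived-table version writes the subquery out twice, while the CTE version writes it once and references it twice.

```sql
-- Derived Table version: the same subquery appears twice.
SELECT d1.CustomerID, d1.OrderTotal, d2.OrderTotal
FROM (SELECT CustomerID, SUM(Amount) AS OrderTotal FROM Orders GROUP BY CustomerID) AS d1
JOIN (SELECT CustomerID, SUM(Amount) AS OrderTotal FROM Orders GROUP BY CustomerID) AS d2
    ON d2.OrderTotal > d1.OrderTotal;

-- CTE version: the subquery is written once and used twice.
WITH Totals AS
(
    SELECT CustomerID, SUM(Amount) AS OrderTotal
    FROM Orders
    GROUP BY CustomerID
)
SELECT d1.CustomerID, d1.OrderTotal, d2.OrderTotal
FROM Totals AS d1
JOIN Totals AS d2
    ON d2.OrderTotal > d1.OrderTotal;
```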

They have the same query plan. And there is no separate step in the query plan for the “population” of the supposed temporary table.

So if you’re going to repeatedly use the same *data* multiple times in a query, the Common Table Expression is your tool of choice. But what if you want to use the same logic?

Pump it up.

Unfortunately, there’s no similar syntactic sugar to create a statement-level user-defined function. There’s no direct way to apply the same calculation to different columns of data without creating an actual user-defined function or simply repeating the logic. But there is a neat trick which, in the right circumstances, can accomplish the same goal without costing you any performance.

The goal, as with any function, is not repeating yourself. It’s not just a bunch of extra characters in your code. It’s an accident waiting to happen– two representations of the same logic which are just bound to diverge due to our cursed human imperfections.

But how can you run different columns of data through the same calculation in a query? Turn them into rows!

Calculations are already performed repeatedly for every row in a query. So all you have to do is explode your columns into multiple rows. To apply a calculation to Column1 and Column2, first change each row of the data into two rows, with a new, computed Column3 which holds the original row’s Column1 value first and its Column2 value second.
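In T-SQL, one way to perform that explosion is CROSS APPLY with a VALUES table constructor (the table name MyTable is a stand-in):

```sql
-- One row in, two rows out: Counter records which source column Column3 came from.
SELECT t.Column1, t.Column2, v.Counter, v.Column3
FROM MyTable AS t
CROSS APPLY (VALUES (1, t.Column1), (2, t.Column2)) AS v (Counter, Column3);
```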

In other words, you need to convert this:

| Column1 | Column2 |
|---------|---------|
| ABC     | DEF     |
| GHI     | JKL     |

Into this:

| Column1 | Column2 | Counter | Column3 |
|---------|---------|---------|---------|
| ABC     | DEF     | 1       | ABC     |
| ABC     | DEF     | 2       | DEF     |
| GHI     | JKL     | 1       | GHI     |
| GHI     | JKL     | 2       | JKL     |

Once you have that vertical list of the columns’ values, you can APPLY your calculation to it, once, and it will operate on all of the values.

| Column1 | Column2 | Counter | Column3 – Altered |
|---------|---------|---------|-------------------|
| ABC     | DEF     | 1       | CBA               |
| ABC     | DEF     | 2       | FED               |
| GHI     | JKL     | 1       | IHG               |
| GHI     | JKL     | 2       | LKJ               |

All that’s left is to smoosh those vertically-represented values back into a single horizontal row. You could use PIVOT, but that syntax is a little tricky. Instead, first duplicate the calculated column and then selectively suppress the values you don’t want (by using a CASE expression). You’ll end up with this:

| Column1 | Column2 | Counter | Column1 – Altered | Column2 – Altered |
|---------|---------|---------|-------------------|-------------------|
| ABC     | DEF     | 1       | CBA               | NULL              |
| ABC     | DEF     | 2       | NULL              | FED               |
| GHI     | JKL     | 1       | IHG               | NULL              |
| GHI     | JKL     | 2       | NULL              | LKJ               |

If you take the MAX of each of those calculated columns, you end up with a single row for each one, with your final results:
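Putting the whole pattern together (explode, calculate once, suppress with CASE, collapse with MAX), a sketch might look like the following, with REVERSE standing in for the shared calculation and a hypothetical MyTable:

```sql
SELECT
    t.Column1,
    t.Column2,
    MAX(CASE WHEN v.Counter = 1 THEN calc.Altered END) AS Column1Altered,
    MAX(CASE WHEN v.Counter = 2 THEN calc.Altered END) AS Column2Altered
FROM MyTable AS t
-- Explode each row into two rows, one per source column.
CROSS APPLY (VALUES (1, t.Column1), (2, t.Column2)) AS v (Counter, Column3)
-- The shared calculation, written exactly once.
CROSS APPLY (SELECT Altered = REVERSE(v.Column3)) AS calc
GROUP BY t.Column1, t.Column2;
```

The GROUP BY assumes each (Column1, Column2) pair identifies a row; a real table would group by its key.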

Admittedly, there is more overall code in this simplified example, but imagine that the logic was more complex. And also, remember that it’s not about character count, but rather removing duplication because of the potential for different behaviors if the copied code isn’t perfectly maintained.

Unfortunately, unlike with Common Table Expressions, this is not mere syntactic sugar. This is semantically different T-SQL which makes the engine do more work– first, spreading apart the columns into rows before shoving them through your calculation and then squishing them back down again. But in the right circumstances, it might add only trivial overhead, as in this example.

Still, use this technique with care. It’s almost always better to just make actual inline user-defined functions, which often have identical performance with the benefit of code encapsulation, as they do in this example.

Ceci est une pipe

This isn’t the only way to reduce code duplication in your complex T-SQL statements. If you have two calculations which have similar beginnings but differ in the final steps, break them apart into their separate steps so that you can re-use the first steps of the calculation.

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

Department of Redundancy Department

You are going to have business logic in your database, whether you like it or not. Use these and other techniques to make sure it’s only in there once.

Committed to Diversity

That Conference is also committed to diversity. First and foremost, it’s a family conference. Hundreds of “geeklings” roam the grounds, fueling the imagination of a new generation, filling the halls with energy, and reinforcing the importance of a healthy balance between work and family. Clark routinely encourages young attendees to get involved. Indeed, Katelyn Drew gave her first session as an 11-year-old in 2014, having led an Open Space the year before. The conference dedicates roughly a fourth of overall session floor space to these “Open Spaces”, where anyone can talk about any subject, regardless of their expertise or following.

They also have a code of conduct explicitly calling out harassment based on “gender, sexual orientation, disability, physical appearance, body size, race, or religion”. And they have a separate buffet line at every meal for special dietary needs. They’ve removed some ableist language, such as changing their “law of two feet” (encouraging conference participants to leave an environment if they aren’t gaining or providing value) to a “law of mobility”. This year, they added a Women in Technology panel session which wasn’t just a bunch of dudes. And they make special effort to ensure a balance of women speakers during the selection process, first by removing biographic information from abstracts for the community voting process and then by applying a balancing algorithm to the final selection.

We know why this is important. Seeing yourself in those whose success you admire helps you imagine being in their place. It’s critical motivation.

Falling Short

There’s of course more they could do. For example, they could provide ASL interpreters and/or live captioning (especially for the keynote track) to be more welcoming to those with hearing impairment. Also, providing free alcohol excludes some folks and the money could be better spent. But by and large, they really seem to be trying to create a “collaborative, collegial, diverse, and respectful atmosphere”. (That Conference 2018 guidebook, page 4.) From my perspective, they’ve succeeded. I feel completely welcome and at home. I look around and see people just like me. But my perspective is meaningless; I’m a white dude.

Still, it’s not nearly as bad as some conferences, and one woman speaker spoke favorably of the welcoming environment as compared to another, similar conference. And there’s a bigger problem.

“Shockingly White”

During a feedback session near the end of the conference, a black attendee said that the conference was “shockingly white”. I had given it a passing thought earlier, but I dismissed it as typical of tech conferences. Then again, I’m a white dude.

Based on my quick, visual review of the 164 speakers in 2018, there were 10 who appeared as if they’d identify as a person of color, or about 6%. It would be impossible to understand the composition of the attendees without some sort of survey data, but when I reviewed the admittedly few pictures I’d taken, everybody I saw was white.

Clark admitted it was a problem, but he made it clear they hadn’t taken the concrete, purposeful steps to address it as they’d done to increase the participation of women. And then he made what I have to characterize as a mistake, though it might seem innocent at first. He asked for help– specifically from the individual who’d given him the gift of this observation. Even after they demurred due to their obligations working on this issue with other conferences, he asked again, with a twinge of desperation in his plea.

Diversity is a bit of a chicken and egg problem. The lack of it often makes for a less inclusive environment, which itself harms diversity efforts. But it’s not insurmountable and it’s on the privileged to fix. #MinneDiversity https://t.co/5xj8blVqDI

I understand that the conference is run mostly by volunteers, and it might be natural to expect others to give of their time to improve our overall tech community. And I get that if you offer a criticism, you should be ready to help fix it– “put up or shut up” and all that. But underrepresented and marginalized groups already have to do more work for less recognition. If we are asking for their help, we need to be prepared to pay. This effort deserves focus, attention, and resources.

A Plan

So what should That Conference do? First, officially recognize and announce this as an initiative. Second, invite underrepresented and marginalized group members into the organizing team, and offer to compensate them. Finally, contract with a professional diversity and inclusion expert such as Ashe Dryden who ran the successful AlterConf events. (Or at least take advantage of her list of tips.)

Of course all of this is going to stretch your already limited resources– time and money. With respect to time, you implemented several new ideas this year– such as TVs for collaboration and a maker space in the Open Spaces area. This diversity initiative might have to be the entirety of your “new ideas” for next year (especially with the changes in store due to the convention center construction completion).

Funding might be more challenging, as you’ve described each year as a struggle to make ends meet. You can start by eliminating early bird discounts and cutting the drink tickets for attendees. You can scale back or eliminate speaker gifts (you had an almost 6:1 ratio of talk submissions to slots). You can collect revenue directly by having diversity scholarship opportunities for sponsors. You can explore sliding scale ticket pricing; corporations can and will pay more for tickets than independents and students. And finally, let attendees simply donate toward this effort.

It’s worth it.

On the last morning of the conference I cut myself shaving, and it was a bleeder. I just couldn’t get it to clot. So I had to use a bandage, and the front desk only had a big one. Luckily, it was “flesh”-colored. For white skin anyway. Even then, all day long I was conscious of that thing and I was certain I looked ridiculous. It had a real effect, and I knew I could eventually take it off. It wasn’t my very skin which made me feel different.

When I look around That Conference, I feel like “these are my people”. By that I mean fellow techies– at least at a conscious level. But how much of that is the comfort of seeing people who look just like me?

Unfortunately, as a white male I have plenty of places in the world where I feel welcome. We need to change things up so others feel just as welcome as I do.

Success is a funny thing. Everybody has their own idea of what it means, and often they’re wrong.

Sometimes they think something will make them happy when it won’t; other times they underestimate the costs. And regardless, once you achieve your goals, you just end up making new ones. You’re never done; you’re always pining.

Humans are good at adapting. Eat enough salty food and it doesn’t taste salty anymore. Spend some time around something stinky and eventually your nose stops wrinkling. Get a new phone and marvel at its speed and features, only to take them for granted and complain about its faults in a few weeks.

It’s human nature. The last bite of a candy bar is never as satisfying as the first. Your first few jumps on a trampoline lift your spirits, but eventually you’re exhausted and your head starts to hurt. You rejoice in seeing your kids after being apart but soon you’re wishing they would just go play by themselves for a minute.

Unfortunately, this happens with money, too. A dollar isn’t worth the same to a prince as it is to a pauper, even though it has the same buying power. While this might make wealth transfer an effective tool for increasing societal happiness, it also means that once you start making the money you thought you wanted, you end up wanting more.

That’s why your vision of success is a moving target. As you succeed you keep moving the goal posts because what you have is the new normal.

As you learn, you automatically adapt to your new level. It never feels like an accomplishment because you continually take for granted what you already know. And the more you know, the more you know you don’t know. So you continue to face the same daunting challenge of the unknown, and it feels just like it did at the beginning. It’s demoralizing, and an important part of teaching is motivating students by forcing them to take stock of how far they’ve come.

Yet we seldom apply that philosophy to “success” in general. We don’t look to our mentors to tell us how great we are, but rather to help us be greater. And unfortunately, adulation from others has the same diminishing returns as everything else. Your first Like on a post is a thrill; after a few hundred you’re looking for how to turn those notifications off. A growing following doesn’t provide the validation you seek. There’s always someone with a bigger audience to compare yourself to.

So chasing success is in some ways a losing battle. You’ll never consider yourself to be successful. But it does provide some motivation; it does provide a reason for being, if you don’t have anything more fundamental to drive you. But don’t forget the cost.

Striving for success in any traditional sense— more money, more power, more prestige, more impact— requires work. You have to differentiate yourself. You have to do more. Sure, there’s an element of working smarter rather than harder, but generally hard work is a necessary (*but not sufficient*) element of “success”.

Unfortunately for our cherished idea of a meritocracy, hard work is only a part of the equation. The other part is luck. Luck takes a lot of forms— like the right idea at the right time or a serendipitous confluence of people and opportunities. But sometimes “luck” is about where you happened to be born, what you happen to look like, how healthy you happen to be, what abilities you happen to have, how much your parents happen to help you, or who you happen to know.

And even if you win the humanity jackpot— you’re in a place with stable resources, relative safety, and access to technology, giving you a head start over billions— you still have to work to distinguish yourself. You have to trade your most precious, finite resource— time— for the *chance* to advance toward your goals. And here again, your very nature conspires against you.

As a kid, you have oceans of time. Remember summers? They were endless. Every year was a huge portion of your life, yet still only one of many dozens to come. You could afford to waste time, then. And waste time we did! Remember boredom? You had so much time you didn’t even want it all. You wanted to skip to the good stuff.

And here you are. Is it the good stuff yet? It doesn’t matter, because you’re too busy. No matter what, you want more time. As how much you have left diminishes, each bit matters more. So the irony is that the more time you’ve taken to acquire your skills— the more valuable you’ve made the gift of your time— the more valuable that time is to you.

Are you sure you want to spend it on something as elusive as “success”? And then what? Success has maintenance costs; you have to keep paying time to stay on the top of your game, until your eye catches on the next big thing and you resume your chase. Because once you’re here, you’ll want to go there.

So what’s the moral of all this? Is it all doom and gloom? No. It’s simple, really. Appreciate what you have. But try not to do it by comparing to others who might not have it. Instead, think of what you can share— even your own time— because someone else’s appreciation of it will rekindle its value to you.

If a pile of matchboxes can learn to play Tic-Tac-Toe, then certainly I should be able to code a solution. With my predilection for databases, T-SQL was my tool of choice. But first, I needed to model the game with some sort of data structure. Unfortunately, as I learned, it’s not as simple as mapping some X and O values to a grid.

Data Model

| 1 | 2 | 3 |
|---|---|---|
| 4 | 5 | 6 |
| 7 | 8 | 9 |

Before we can analyze a game, we need to model it. We can visualize a grid of numbers labeling each square in a Tic-Tac-Toe game and we can imagine storing the character displayed (X or O) in each of those blocks, but that misses a critical aspect of the game.

Turn, turn, turn…

You can’t just look at a grid of X and O values and tell who won. Or rather, you can, if the game stops when someone wins. (Otherwise there might be more than one player with three in a row.) But that very concept– a stopping point– requires an understanding of turns.

In addition to the current game state– the values in each cell of our grid– we need to know how we got there.

General Purpose

If I were building a general purpose gaming engine, I’d want a flexible turn management system. I’d want to allow for any number of players making any number of plays. I’d want a big vertical list of every turn made in every game. It would look something like this:

| GameID | Turn | Player | Position | Play |
|--------|------|--------|----------|------|
| 1      | 1    | 1      | 9        | X    |
| 1      | 2    | 2      | 5        | O    |
| 1      | 3    | 1      | 6        | X    |
| 1      | 4    | 2      | 3        | O    |
| 1      | 5    | 1      | 7        | X    |
| 1      | 6    | 2      | 8        | O    |
| 1      | 7    | 1      | 2        | X    |
| 1      | 8    | 2      | 1        | O    |
| 1      | 9    | 1      | 4        | X    |

This can handle a simple game like Tic-Tac-Toe, of course, but it can expand to any other type of game which has a concept of positions and sequential actions. It even allows the same player to play multiple times in a row, players to put a value in the same space more than once, and play values other than X and O. And therein lies one major weakness. It’s just a ledger; it has no inherent intelligence.

New rule!

We could give it some smarts. For example, we know that in our game, we can only choose X or O, so we could put a data constraint on the Play column. And we know that in our game, you can’t play on the same space more than once, so we could put a unique constraint on the combination of GameID and Position. You could even be really creative and enforce our game’s alternating player moves by putting a data constraint on the Player column such that it equaled 1 when (Turn Modulo 2) equaled 1 and 2 otherwise. (Really it wouldn’t need to be a data column at that point, just a calculated column.)
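Those rules might be sketched as constraints on a hypothetical Plays table (all of the names here are assumptions):

```sql
-- Only X or O may be played.
ALTER TABLE Plays ADD CONSTRAINT CK_Plays_Play CHECK (Play IN ('X', 'O'));

-- A position can be used only once per game.
ALTER TABLE Plays ADD CONSTRAINT UQ_Plays_GamePosition UNIQUE (GameID, Position);

-- Alternating players: player 1 on odd turns, player 2 on even turns.
ALTER TABLE Plays ADD CONSTRAINT CK_Plays_Player
    CHECK (Player = CASE WHEN Turn % 2 = 1 THEN 1 ELSE 2 END);
```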

But imposing those restrictions robs our data structure of its raison d’être. It’s no longer a general purpose game play storage system; it only works for our game.

With modern SQL Server, you could have this cake and eat some of it too with filtered indexes and more complex data constraints. If you denormalized a bit and copied your GameType column from your Games table, you could enforce your single-use positions by creating a unique index on GameID and Position, filtered to a particular GameType (type = Tic-Tac-Toe). Similarly, your data constraints on Play and Player could be tailored to each GameType. This will make your table definition complex.

Instead of saddling the data storage itself with all of those rules, we could enforce all of the game mechanics through our data interpretation and manipulation logic. When we saved a game move, we could make sure that an X or O was played and it could check to see whether the specified square was already used. When we analyzed a game to determine a win, all of the criteria could be housed in that consuming query. But this flexible design isn’t done inflicting its complexity on us.

Squishy, squishy.

When your game data is spread across multiple rows, you have to gather all of those rows and intelligently interpret them as a group. This often involves “flattening” them into a single set of data.

For example, X wins if the value X is in positions 1, 2, and 3 *before* O wins. Even just testing whether X is in positions 1, 2, and 3 involves checking for the existence of three different values in three different rows. You’d need to hit the table three times (once for each check), or use clever flattening logic to hit the table once, such as:
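One possible shape for that flattening logic is conditional aggregation over the hypothetical vertical Plays table:

```sql
-- A single pass: count how many of positions 1, 2, and 3 hold an X.
SELECT CASE WHEN COUNT(DISTINCT Position) = 3 THEN 'X wins the top row' END
FROM Plays
WHERE GameID = 1
  AND Play = 'X'
  AND Position IN (1, 2, 3);
```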

And that’s just for 1 of the several ways X can win, and it doesn’t even account for the order of play. That’s Complex (with a capital C). So rather than repeatedly query the set of rows like this, we would probably gather all of the values up into a single “row”, exposing each move as a column. Checking for this same win would look something like this:
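A sketch of that single-row approach, assuming the vertical Plays table from earlier (the derived-table column names follow the move-naming convention described next):

```sql
-- Flatten the nine turns into one row, then test the top-row win in one expression.
SELECT CASE WHEN 1 IN (g.a1, g.a2, g.a3, g.a4, g.a5)
             AND 2 IN (g.a1, g.a2, g.a3, g.a4, g.a5)
             AND 3 IN (g.a1, g.a2, g.a3, g.a4, g.a5)
            THEN 'X wins the top row' END
FROM
(
    SELECT
        a1 = MAX(CASE WHEN Turn = 1 THEN Position END),
        b1 = MAX(CASE WHEN Turn = 2 THEN Position END),
        a2 = MAX(CASE WHEN Turn = 3 THEN Position END),
        b2 = MAX(CASE WHEN Turn = 4 THEN Position END),
        a3 = MAX(CASE WHEN Turn = 5 THEN Position END),
        b3 = MAX(CASE WHEN Turn = 6 THEN Position END),
        a4 = MAX(CASE WHEN Turn = 7 THEN Position END),
        b4 = MAX(CASE WHEN Turn = 8 THEN Position END),
        a5 = MAX(CASE WHEN Turn = 9 THEN Position END)
    FROM Plays
    WHERE GameID = 1
) AS g;
```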

The columns in the derived table represent each successive move. First, player 1 (“a”) makes their first move (“a1”). Then, player 2 (“b”) makes their first move (“b1”). Then player 1 makes their second move (“a2”), etc. The values in the columns are the positions played. Our above game would result in these values when flattened to that single row:

| a1 | b1 | a2 | b2 | a3 | b3 | a4 | b4 | a5 |
|----|----|----|----|----|----|----|----|----|
| 9  | 5  | 6  | 3  | 7  | 8  | 2  | 1  | 4  |

Our “win” calculation involves checking whether all three squares– 1, 2, and 3– are present in any of X’s moves.

The code looks more complex at first blush, but importantly it allows us to check multiple conditions in our CASE statement. We can check for an X win with positions 1, 2, and 3, and then check for a win with positions 3, 4, and 5, etc., all without additional hits to the underlying table.

What’s more, we now have play order information embedded in the data as well; we can check if X’s win came before an O win, because we know when X and O made their respective plays.

Private Pragmatic

Now, I’m not building a general purpose game management system. I’m playing a game about playing a game because it seemed interesting. So let’s skip the elegant, extensible, vertical play table and go right to the single-row game representation. We’ll just store our games with each move represented with a column: a1, b1, a2, b2, etc.

Each move can be one of 9 different squares, and the smallest way to represent 1-9 is a tinyint. We can also add a column to record who won the game. Here’s our table:
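A sketch of that table (the name Games and the key column are assumptions; the move columns and the Winner column come from the text):

```sql
CREATE TABLE Games
(
    GameID int IDENTITY PRIMARY KEY,
    a1 tinyint NOT NULL, b1 tinyint NOT NULL,
    a2 tinyint NOT NULL, b2 tinyint NOT NULL,
    a3 tinyint NOT NULL, b3 tinyint NOT NULL,
    a4 tinyint NOT NULL, b4 tinyint NOT NULL,
    a5 tinyint NOT NULL,
    Winner bit NULL  -- 1 = player 1 (X), 0 = player 2 (O), NULL = draw
);
```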

At first blush, we might think we could store who wins a game as a bit, since either the first or second player will win. Except when they don’t. After all, Tic-Tac-Toe is designed such that perfect play will always result in a draw. So we must be able to represent that third possibility. Luckily, SQL Server allows NULLs, so our single bit column can play triple-duty.

Ever wonder how SQL Server can shove more than a bit-worth of information into a bit-worth of space? They haven’t defied the laws of physics. Rather, the NULL status of a column is stored outside of the column itself, in the NULL bitmap at the row level.

Winning

In order to figure out who won a game, we at least have to be able to recognize a win– three of the same symbols in a row, horizontally, vertically, or diagonally. We saw (part of) one possible calculation above, where we check if one player has played all of the positions across the top row. But there are 8 ways to win, so our CASE statement would be at least 8 times as big. If only there were a way to compare a winning scenario with our actual game to see if there’s a match without several dozen ANDs, ORs, and repeated column lists.

I’m the map!

Fundamentally, a “bitmap” is a collection of bits. These days we think of bitmaps as images but that’s just a special case of the general idea. We can envision the X plays on a Tic-Tac-Toe board as a collection of bits– either an X was played or it wasn’t– in a numbered set of boxes. If we then imagine the boxes as powers of 2, we can see how any set of X plays can be represented as an integer.

If any collection of X plays can be represented as an integer, then it follows that a winning collection of X plays can be represented as an integer. And it just so happens that computers can easily check if all of the bits in one integer appear in another integer, using a concept called bitmasking.
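In T-SQL, the bitwise AND operator (&) performs the containment check: if ANDing the player’s bitmap with the winning bitmap gives back the winning bitmap, every winning bit is present. A small illustration:

```sql
DECLARE @playerBitmap int = 46;  -- squares 1, 2, 3, and 5 (2 + 4 + 8 + 32)
DECLARE @winningBitmap int = 14; -- top row: squares 1, 2, 3 (2 + 4 + 8)

SELECT CASE WHEN @playerBitmap & @winningBitmap = @winningBitmap
            THEN 'contains a win'
            ELSE 'no win' END;
```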

You’re holding it wrong.

Remember that the way we typically represent numbers is “big-endian”, where the more significant digits are to the left. 100 is bigger than 001.

In decimal, the right-most digit is 10 to the 0th power (the ones); the next digit to the left is 10 to the 1st power (the tens); the next digit to the left is 10 to the 2nd power (the hundreds); and so on.

Similarly, we represent binary numbers with the more significant digits to the left. The right-most digit is 2 to the 0th power (the ones); the next to the left is 2 to the 1st power (the twos); the next digit to the left is 2 to the 2nd power (the fours); and so on.

| Digit | 1 | 2 | 3 | 4 |
|-------|---|---|---|---|
| Power | 3 | 2 | 1 | 0 |

(Interestingly, we can’t know the impact of the first digit in our number without knowing how many total digits there are going to be. And the way we represent numbers, right-to-left, flies in the face of our otherwise left-to-right language.)

One way to map our numbered squares to bits is to make the square number equal to the power of 2. So our winning state of the top row, consisting of squares 1, 2, and 3, would equal 2^1 + 2^2 + 2^3 = 14. We’d represent that as “1110”. Because we started counting from 1, we never use that 0th power, the right-most digit; it’s always 0. And because we know we’re going to have 9 total squares (10 total digits with our ignored 0th power), we’d probably write this number as “0000001110”.

Collection2

So we can represent a winning collection of bits with a single integer, and we can check our game against that easily, but there are multiple ways to win, so we need a collection of those collections.

| Winning Game  | Square Number List | Bitmap (Big-endian) | Decimal Value |
|---------------|--------------------|---------------------|---------------|
| Top Row       | 1, 2, 3            | 0000001110          | 14            |
| Middle Row    | 4, 5, 6            | 0001110000          | 112           |
| Bottom Row    | 7, 8, 9            | 1110000000          | 896           |
| Left Column   | 1, 4, 7            | 0010010010          | 146           |
| Center Column | 2, 5, 8            | 0100100100          | 292           |
| Right Column  | 3, 6, 9            | 1001001000          | 584           |
| Backslash     | 1, 5, 9            | 1000100010          | 546           |
| Slash         | 3, 5, 7            | 0010101000          | 168           |

Now a collection of integers sounds like a table to me. But we’re never going to change this data. It’s fundamental to the game; it’s not user editable. So we might as well “hard-code” the values. Let’s make an inline user-defined function which returns a list of all of the winning conditions.
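A minimal sketch of what such an inline table-valued function might look like, hard-coding the decimal values from the table above (the names CalcWinList and WinningBitmap match those used later in CalcTTTWin):

```sql
CREATE FUNCTION CalcWinList ()
RETURNS TABLE
AS
RETURN
(
    -- Top, middle, bottom rows; left, center, right columns; both diagonals.
    SELECT WinningBitmap
    FROM (VALUES (14), (112), (896), (146), (292), (584), (546), (168))
        AS Wins (WinningBitmap)
);
```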

Playing

So now we have a list of the bitmaps of all of the winning states. In order to check a player’s moves to see if they’ve won, we need to map their play bits as well.

Power to the Player

We’re modeling our game by representing each square as a power of 2, and we construct bitmaps by adding together all the powers represented by the played squares. Our data model represents a game as a row with a column for each play, but each player has their own bitmap. And that player’s bitmap evolves through the course of the game. So let’s build a function which computes a bitmap based on the player and the turn.

Importantly, the function only adds up those plays which belong to the specified player– hence the @player check on each play. Also, because we want a sense of time– we want to understand what the game state is as of a particular turn– we only count those plays which were made up to the requested turn.
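A sketch of such a function (the signature is inferred from how CalcTTTWin calls it, so treat the details as assumptions):

```sql
CREATE FUNCTION CalcTTTPlayerBitmap
(
    @player tinyint, @turn tinyint,
    @a1 tinyint, @b1 tinyint, @a2 tinyint, @b2 tinyint, @a3 tinyint,
    @b3 tinyint, @a4 tinyint, @b4 tinyint, @a5 tinyint
)
RETURNS TABLE
AS
RETURN
(
    SELECT PlayBitmap = SUM(POWER(2, Plays.Position))
    FROM (VALUES
        (1, 1, @a1), (0, 1, @b1),
        (1, 2, @a2), (0, 2, @b2),
        (1, 3, @a3), (0, 3, @b3),
        (1, 4, @a4), (0, 4, @b4),
        (1, 5, @a5)
    ) AS Plays (Player, Turn, Position)
    WHERE Plays.Player = @player      -- only this player's plays
      AND Plays.Turn <= @turn         -- only plays made up to the requested turn
      AND Plays.Position IS NOT NULL  -- ignore moves that were never made
);
```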

Check 1 2…

So now we have a tool to understand the bitmap of any player at any turn (CalcTTTPlayerBitmap), and we have a list of all winning bitmaps (CalcWinList). We need to put those together to understand who won a particular game.

We understand that bitwise math allows us to quickly check if one bitmap is completely contained within another bitmap. If a winning bitmap is completely contained in the player’s bitmap at a particular point, they’ve won (provided they’re the first to achieve that).

So order matters here. We need to see who won first, so we need to check each player at each turn. We could do this with a very long, specially constructed CASE statement, repeatedly calculating the player’s bitmap at a particular turn with numerous calls to CalcTTTPlayerBitmap. But any time you find yourself calling the same function multiple times in a T-SQL statement, you should consider exploding some rows.

Naughts and CROSS JOINs

In order to check each player’s bitmap against the winning bitmaps at each stage of the game, we need a list of every play of the game. There are only 9 squares, so there are no more than 9 plays. Since the players take alternating turns, no player can play more than 5 times.

We want a list of turns 1 through 5 for player 1 and player 0 (aka player “O”– the second player). That means we want a row of 1 and a row of 0 for each of the numbers 1 through 5.

| Player | Turn | Name |
|--------|------|------|
| 1      | 1    | a1   |
| 0      | 1    | b1   |
| 1      | 2    | a2   |
| 0      | 2    | b2   |
| 1      | 3    | a3   |
| 0      | 3    | b3   |
| 1      | 4    | a4   |
| 0      | 4    | b4   |
| 1      | 5    | a5   |
| 0      | 5    | b5   |

(We can ignore the last row: it’s impossible for the second player to move a fifth time, because there are no squares left.)

Any time you need to get every possible value in one collection (player table) matched with every possible value in another collection (turn table), you are making a cartesian product. You accomplish this in SQL with a CROSS JOIN.

For each row in that full list, we need to compute whether the player’s bitmap– up to that point– results in a win. As soon as we find a win (order matters), we have a winner (the player who just played).

```sql
CREATE FUNCTION CalcTTTWin
(
    @a1 tinyint, @b1 tinyint, @a2 tinyint, @b2 tinyint, @a3 tinyint,
    @b3 tinyint, @a4 tinyint, @b4 tinyint, @a5 tinyint
)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP 1
        Winner = CONVERT(bit, Players.Player)
    FROM (SELECT 1 UNION ALL SELECT 0) AS Players (Player)
    CROSS JOIN
    (
        /* We could omit 1 and 2 here, as we know no one can win before the third round. */
        SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
    ) AS Turns (Turn)
    CROSS JOIN CalcWinList() AS WinBitmaps
    CROSS APPLY CalcTTTPlayerBitmap
    (
        Players.Player,
        Turns.Turn,
        @a1, @b1, @a2, @b2, @a3, @b3, @a4, @b4, @a5
    ) AS PlayerBitMap
    WHERE PlayerBitMap.PlayBitmap & WinBitmaps.WinningBitmap = WinBitmaps.WinningBitmap
    ORDER BY
        Turns.Turn ASC,
        /*
        Player 1 is X; Player 2 (O) is represented as 0 here.
        Player 1 goes first, which is why this is DESC, because 1 > 0.
        */
        Players.Player DESC
);
```

With our TOP 1, we will only get the first winning play. This simulates play stopping as soon as there is a winner; the additional plays are irrelevant, even if they are present. If there are no winning plays, nothing will be returned. That’s our draw.

So we now have a function which will compute a winner (or lack thereof) at any point of a game. Crucially, even if a game continues beyond the winning play, the correct winner will still be calculated.

Why is that crucial? Because we’re going to play a lot. Like, we’re going to play every possible game.

Brute Force

There are a lot of possible games of Tic-Tac-Toe. If we let gameplay continue until the board is full, regardless of whether a win occurs, there are 362,880 of them. That’s because the first play can go in any of the 9 squares, the second play can go in any of the remaining 8 squares, the third play in any of the remaining 7, etc. The last play only has one option. So that’s 9 options times 8 options times 7 options, etc., or 9! (9 factorial).

We can make SQL Server do the work of creating all those possibilities. First, we make a “table on the fly” (otherwise known as a Common Table Expression or CTE) of our numbers 1 through 9. Then we can smash those tables together over and over– 9 times– using our CROSS JOIN trick. The problem is that this creates impossible games, such as when each player plays in square 1 on every play. We could weed those out at the end, or we could be more careful about how we create the big list in the first place.

Instead of creating all of the rows blindly and then removing the invalid rows with some criteria, we could combine values more selectively along the way. For example, instead of g CROSS JOIN g AS g2, we could have done the filtration during the JOIN process with an ON clause, limiting each successive combination so that none of its values occur in the previous values. For example, g JOIN g AS g2 ON g2.n <> g.n. That gets values 1 through 9 from the first “table” and for each of those values, it grabs 1 through 9 from the second “table”, so long as it doesn’t equal the first table’s value. Scale this up and we can create a T-SQL statement to create every row of our games table.
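The pattern above can be sketched like this. This is a minimal sketch, not necessarily the original statement; it shows only the first three plays, with the full version extending the same JOIN pattern out to a ninth play:

```sql
/* A CTE of the numbers 1-9, self-joined once per play, with each ON
   clause excluding every square already used in earlier plays. */
WITH g (n) AS
(
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
SELECT Play1 = g.n, Play2 = g2.n, Play3 = g3.n
FROM g
JOIN g AS g2 ON g2.n <> g.n
JOIN g AS g3 ON g3.n <> g.n AND g3.n <> g2.n;
/* 9 * 8 * 7 = 504 rows; extended to nine plays, 9! = 362,880 rows. */
```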

So we’re done now?

We’ve built a method of calculating the winner of a game, we’ve derived every possible game, and we’ve calculated the winner for each of those games. While that was a fun and educational exercise, we haven’t really taught the computer how to play Tic-Tac-Toe. In order for the computer to really be playing the game, the computer has to choose a play. We’ve done all of the choosing for it by “choosing” everything possible.

So I think we need to take this further. We need to write something which allows the system to choose what to do next. We need to imbue it with some artificial intelligence. And we’re going to do it with T-SQL.

Stay tuned.

T-SQL Inline User-Defined Functions are Sargable
You might have heard that SQL Server user-defined functions are the devil. That’s because someone got burned. They put a scalar function in a WHERE clause on a big table and fed it a column name. And bad things happened.

In this case, “bad” means slow. Really slow.

Sarga-What-Now

In order for queries to perform well, SQL Server needs to be able to find the data quickly, and that usually means using an index. An index is a list of key data from a table, organized to make finding things faster. Just like in a book, instead of scanning every page for a mention of a word, you can use the index, where the author– or her computer program– has done that work for you as a handy reference.

An index is fastest when you can use it directly. If you’re looking for pages which mention “SQL”, you find “SQL” in the index and it tells you the page numbers to check. But what if you’re looking for all words which end in “SQL”, like “T-SQL”? Now you can’t go right to the part of the index you want. You have to scan through the whole index. You no longer have a single search argument (“SQL”); you have a search expression (“ends in SQL”). The industry term we use is that your search is no longer “sargable” (“Search ARGument ABLE”).

The moral of the story is that if you want an index used most effectively, make sure you're searching on exactly what's being indexed. If you're searching by a date, and you want to find everything in a certain year, don't use WHERE Year([DateColumn]) = 2018. Instead, use WHERE [DateColumn] >= '20180101' AND [DateColumn] < '20190101'. As a human, you can understand these as the same thing, but often the optimizer just sees "arbitrary function on indexed column" and figures it can't use the index the normal way but instead has to slog through the whole thing and run the function to see what it does.

And if the optimizer has this problem with the built-in function "Year", what do you think it's going to do with your scalar "Complex_2000_Line_Monstrosity"? It's going to treat it like the black box it is and run through every entry in the index to test it. And that assumes there's an index. A big user-defined function often needs several different columns' worth of data, some of which isn't in an index. So now you're scanning through the whole table. Grab yourself a cup of joe. You'll be there a while.

You can see right through them.

While scalar functions are opaque to the optimizer, inline functions are transparent. Instead of seeing an inline function as a black box, the query optimizer sees it rather like a parameterized view-- that is, a view which happens to take parameters. Views and inline functions are essentially copied and pasted into the query which calls them, as if they weren't a separate construct at all.
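To make the shape concrete, here is an illustrative inline table-valued function. This is only a sketch; the table and column names are hypothetical, but the key point is real: the whole function body is a single RETURN of one SELECT, which the optimizer can expand into the calling query.

```sql
/* Illustrative only: an inline table-valued function is one RETURN of one
   SELECT, which the optimizer treats much like a view with parameters. */
CREATE FUNCTION dbo.OrdersByStatus (@OrderStatusID int)
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderID, o.OrderStatusID, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.OrderStatusID = @OrderStatusID
);
```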

User-Defined Function Type   | Results      | Multi-Statement | Opaque
Scalar                       | Single Value | Yes             | Yes
Multi-Statement Table-Valued | Table        | Yes             | Yes
Inline Table-Valued          | Table        | No              | No

Now just because your function boundaries fade away in optimizer-land doesn't mean you will find your seek. You don't need a function to destroy sargability. Simply rearranging your criteria logic can handle that on its own. For example, you won't get the benefit of your index on Column1 if you use WHERE 1 = CASE WHEN Column1 = @value THEN 1 ELSE 0 END instead of WHERE Column1 = @value. Again, we humans can see that's the same thing, but SQL Server does not.

But if you use an inline function which directly acts on an indexed field, without an obfuscating calculation, the optimizer will still use an indexed seek. Just how to make it happen feels a little more like art than science.

Pudding

Consider an Orders table with an OrderID column, an OrderStatusID column, and an OrderDate column. The OrderID and OrderDate are irrelevant for our examples except to provide some additional order-related data. The key column here is OrderStatusID. Let's say we have many statuses an order could be in (New, Processing, Shipping, Paid, etc.).

Now let's say we have a business rule which applies in only some statuses. Perhaps an order is handled by a certain department only if it's New or Processing but not when in any other statuses. If you're putting this business logic in the database-- as you should-- your database code will need to understand which statuses are special.

The most performance-efficient way to query the Orders table for the special status orders is to directly list them in each relevant query:

SELECT o.* FROM #Orders o
WHERE o.OrderStatusID IN (3, 8);

With an index on the OrderStatusID field, the optimizer would likely use an Index Seek action and find your rows lickety-split. However, you'll probably need the database to understand that concept in many different places. Rather than repeat this list of "special statuses", you decide you want a function to calculate whether a status is special. That way your business logic is centralized:
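A minimal sketch of such a function, assuming an inline table-valued shape and CASE/WHEN logic over the hypothetical special statuses 3 and 8 (the function name is my own):

```sql
/* Centralized business logic: is this status "special"? */
CREATE FUNCTION dbo.OrderStatusIsSpecial (@OrderStatusID int)
RETURNS TABLE
AS
RETURN
(
    SELECT IsSpecial = CASE WHEN @OrderStatusID IN (3, 8)
                            THEN CONVERT(bit, 1)
                            ELSE CONVERT(bit, 0)
                       END
);

/* Used like this, the predicate wraps the column in a calculation: */
SELECT o.*
FROM #Orders o
CROSS APPLY dbo.OrderStatusIsSpecial(o.OrderStatusID) AS s
WHERE s.IsSpecial = 1;
```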

Unfortunately, you run into the same index scan performance issue. You can't really blame the inline function, though. After all, you get the same poor performance using the function's CASE/WHEN logic directly:
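That direct usage might look like this sketch, with no function involved at all; the calculation on the column still defeats the seek:

```sql
/* The same CASE/WHEN logic inline -- still an index scan, because the
   indexed column is buried inside an expression: */
SELECT o.*
FROM #Orders o
WHERE 1 = CASE WHEN o.OrderStatusID IN (3, 8) THEN 1 ELSE 0 END;
```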

So you have to find a way to use an inline function where the optimizer sees a more direct usage of the indexed field. So you create an inline function which doesn't use any calculation to produce its bit.
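One way to sketch that (function name again my own): the function filters its parameter directly with a WHERE clause, and the bit it returns is a constant, so nothing wraps the column once the function is inlined.

```sql
/* The parameter is tested directly; the returned bit is a constant. */
CREATE FUNCTION dbo.OrderStatusIsSpecialDirect (@OrderStatusID int)
RETURNS TABLE
AS
RETURN
(
    SELECT IsSpecial = CONVERT(bit, 1)
    WHERE @OrderStatusID IN (3, 8)
);

/* After inlining, the optimizer effectively sees
   WHERE o.OrderStatusID IN (3, 8) and can seek: */
SELECT o.*
FROM #Orders o
CROSS APPLY dbo.OrderStatusIsSpecialDirect(o.OrderStatusID) AS s;
```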

And voila! You have your logic encapsulation cake and you get to eat your index seek performance, too.

But that's not the only way to make this work. And this is where it starts to feel a little more like an art. Instead of having the inline function perform a calculation or even limit rows with a WHERE clause with an IN list, you could build a virtual table of special statuses in your inline function.
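A sketch of that variation (names are illustrative): the function returns the special statuses as rows, and the caller joins to them.

```sql
/* The special statuses as a virtual table: */
CREATE FUNCTION dbo.SpecialOrderStatuses ()
RETURNS TABLE
AS
RETURN
(
    SELECT OrderStatusID = 3
    UNION ALL
    SELECT 8
);

/* Joining on the indexed column directly preserves the seek: */
SELECT o.*
FROM #Orders o
JOIN dbo.SpecialOrderStatuses() AS s
    ON s.OrderStatusID = o.OrderStatusID;
```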

This is yet another example of using an inline function without sacrificing performance.

Hide and No Seek

So no, user-defined functions are not the devil. Scalar user-defined functions can cause big problems if misused, but generally inline user-defined functions do not cause problems.

The real rule of thumb is not to avoid functions, but rather to avoid adorning your index fields with logic or functions. Because when you hide your intentions from the optimizer with complex syntax, you risk not getting the better performing index seek.

One of the many uses of tally tables is to blow up some rows. Normally, you make your database reduce rows by joining tables to others and applying criteria. But sometimes you want it to make more rows.

Phone Home

Imagine a simple table containing contact information. It has two places for phone numbers (Phone1 and Phone2). Let’s say you wanted to split those off into a related phone numbers table so you could support many more numbers for a single contact. As part of that process, you’d need to create two rows for each existing contact– one for each phone number.

One way to get two phone numbers is to handle each phone number as a separate process, combining a list of all contacts and their first number with those same contacts and their second number. So you’d SELECT everything from the table once for the first phone number and then UNION it with another SELECT. This works, but if you have a large list of columns or a complex set of JOINs, you end up repeating a lot of code.

Instead, what if you simply, magically double the rows and then pick and choose the columns you wanted from each of the rows? You’d pick the first phone number for the first copy and the second phone number for the second copy. So what’s the magic? The Cartesian product.

Criss-Cross Applesauce

Normally, when you JOIN two tables, you want to correlate them somehow. If you had contacts in one table and phones in another, you’d want to make sure the phone numbers in your final query correlated to the relevant contact. The contact name might be repeated for each phone number, but it’s not just willy-nilly. A Cartesian product is (seemingly) willy-nilly.

If you took a table with “A” and “B” and combined it with another table of “A” and “B”, you’d often want to match those values up. You’d take a query like this:
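Something like this sketch (the derived tables stand in for real ones):

```sql
/* Two single-column "tables", each holding 'A' and 'B', matched on value: */
SELECT t1.Letter, t2.Letter
FROM (SELECT 'A' UNION ALL SELECT 'B') AS t1 (Letter)
JOIN (SELECT 'A' UNION ALL SELECT 'B') AS t2 (Letter)
    ON t2.Letter = t1.Letter;
/* Two rows: A-A and B-B. */
```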

That’s because you’re correlating the values in one table with the other. What if you removed that correlation? Well, you’d get an error if you just removed the JOIN condition altogether, so let’s just fake its removal by making it a tautology like 1 = 1:
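With the correlation replaced by the tautology, every row pairs with every row:

```sql
/* Same query, but the JOIN condition is always true: */
SELECT t1.Letter, t2.Letter
FROM (SELECT 'A' UNION ALL SELECT 'B') AS t1 (Letter)
JOIN (SELECT 'A' UNION ALL SELECT 'B') AS t2 (Letter)
    ON 1 = 1;
/* Four rows: A-A, A-B, B-A, B-B -- a Cartesian product. */
```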

Kaboom!

So you can use a CROSS JOIN to get a list of everything in one table combined with everything in another table, all willy-nilly. If that “other table” is just a derived table of numbers (or a materialized tally table), you get as many copies of your original table as you have rows in your table of numbers. So if we want 2 copies of our table, we can just cross it with a tally table with two rows:
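A sketch, assuming a #Contacts table like the one described above:

```sql
/* Doubling a table with a two-row derived table of numbers: */
SELECT c.*, n.n
FROM #Contacts c
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) AS n (n);
/* Every contact appears twice: once with n = 1, once with n = 2. */
```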

Just in CASE

For our phone number splitting, we don’t just want a copy, of course. We’re copying with a purpose. We want to fiddle with each copy, grabbing a different phone number each time. When we’re looking at the first copy, we want the first phone number (Phone1); for the second copy, we want the second (Phone2).

Let’s demonstrate by populating our simple contact table and then use two different methods to gather up a list of contacts with each phone number in a separate row.
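A sketch of that demonstration (the table shape and sample data are illustrative):

```sql
/* A simple contact table with two phone number slots: */
CREATE TABLE #Contacts
(
    ContactID   int         NOT NULL,
    ContactName varchar(50) NOT NULL,
    Phone1      varchar(20) NULL,
    Phone2      varchar(20) NULL
);
INSERT INTO #Contacts (ContactID, ContactName, Phone1, Phone2)
VALUES (1, 'Pat', '555-0101', '555-0102'),
       (2, 'Sam', '555-0201', '555-0202');

/* Method 1: UNION ALL, repeating the column list and FROM clause: */
SELECT c.ContactID, c.ContactName, PhoneNumber = c.Phone1
FROM #Contacts c
UNION ALL
SELECT c.ContactID, c.ContactName, PhoneNumber = c.Phone2
FROM #Contacts c;

/* Method 2: explode the rows with a CROSS JOIN, then pick the right
   phone number for each copy with CASE: */
SELECT c.ContactID, c.ContactName,
       PhoneNumber = CASE n.n WHEN 1 THEN c.Phone1 ELSE c.Phone2 END
FROM #Contacts c
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) AS n (n);
```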

Eat that cake!

In this short example, the code explosion method looks about the same size as the UNION method, but it scales linearly with additional columns to split, whereas the UNION method would scale geometrically, duplicating the full list of columns– and the full FROM clause– for each iteration.

But wait, there’s more. In addition to being more concise code, with less error-prone duplication, it also probably performs better. In this example, if you use SET STATISTICS IO ON, you can see that it uses half as much IO:

When talking about bitwise math, I needed to show the inverse of a string of ones and zeros. Instead of doing it manually, I wanted to get a computer assist. Since I always have SQL Server Management Studio open, T-SQL is usually the fastest way for me to use some CPU cycles.

Swing Variable

In a typical, procedural programming language, you can’t execute a variable swap in a single statement. You need one step to set your first variable and a second step to set your second variable. But by the time the second step runs to set your second variable, the original value of the first variable is lost.

DECLARE @a char(2) = 'WI', @b char(2) = 'MN';
SET @a = @b; /* At this point, both @a and @b store "MN". "WI" is lost. */
SET @b = @a; /* This is basically a NOOP. @b is already the same as @a. */
SELECT '@a' = @a, '@b' = @b;

Results:

@a | @b
MN | MN

You might think you could be tricky and perform the assignments “at the same time” by putting them on equal footing in a single SELECT statement. You might, Rabbit, you might.

To solve this, you typically introduce a third, “swing” or temporary variable.

DECLARE @a char(2) = 'WI', @b char(2) = 'MN', @c char(2);
SET @c = @a; /* Put @a's value in a safe place. */
SET @a = @b; /* At this point, both @a and @b store "MN". But "WI" is safe in @c. */
SET @b = @c; /* Now we retrieve "WI" from @c. */
SELECT '@a' = @a, '@b' = @b;

Results:

@a | @b
MN | WI

Tabular Derivation

There are math tricks to avoid the extra variable, but generally they’re confusing and probably slower than just using the temporary variable.

But T-SQL affords you some syntactical tricks of its own that allow you to create swing variables– indeed, swing tables– on the fly, without separate variable declarations and assignments. A derived table can expose the original variables’ values to be used in assignment, and they can be referenced even “after” they’ve been assigned (in the course of the same SQL statement).
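A sketch of that pattern: the derived table exposes the original values, so both assignments read pre-swap data.

```sql
/* Swap via a derived "swing table"; no third variable needed. */
DECLARE @a char(2) = 'WI', @b char(2) = 'MN';
SELECT @a = swap.b, @b = swap.a
FROM (SELECT a = @a, b = @b) AS swap;
SELECT '@a' = @a, '@b' = @b; /* MN, WI */
```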

Alternatively, we could use a tally table to break it apart based on the character position. But that’s a lot of code, and then we’d have to put the rows back together into a single string when we were done.

Any of that was going to take more time than just manually swapping characters, so I wanted something even easier. One’s first instinct might be to just replace the ones with zeros and zeroes with ones. But doing that directly, in either order, results in a ruined string of all one type:
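You can see the ruin happen with nested REPLACE calls; whichever replacement runs second wipes out the distinction made by the first:

```sql
DECLARE @bits varchar(10) = '0100011110';
SELECT REPLACE(REPLACE(@bits, '1', '0'), '0', '1'); /* all ones */
SELECT REPLACE(REPLACE(@bits, '0', '1'), '1', '0'); /* all zeros */
```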

This was fundamentally the same problem as our variable swap. It don’t mean a thing if it ain’t got that swing. We need a safe place to store the original data while we make our changes.

Swing Character

When dealing with a single string, the only way to make a safe space for us to store our replaced characters is to use a character we know would otherwise never be used in the string. This is the classic problem of choosing a delimiter. For a complex scenario, using really rare, unprintable characters such as CHAR(1) might be necessary. But here, a simple letter “x” will do.

First, we switch all of one character to our swing character (say, each number one to an x). Then, we switch all of the leftover zeros to ones. Then we retrieve the original ones locations from our safe space (“x”) and change those to zeros.
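In T-SQL, that three-step swing fits in one expression of nested REPLACE calls:

```sql
/* Ones to the swing character, zeros to ones, swing character to zeros: */
DECLARE @bits varchar(10) = '0100011110';
SELECT REPLACE(REPLACE(REPLACE(@bits, '1', 'x'), '0', '1'), 'x', '0');
/* Result: 1011100001 -- every bit flipped. */
```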

Now I could have probably found a complete programmatic solution in short order, but I wanted to try it myself. MPR was having their member drive, so I didn’t bother turning it on for my drive home. (I already have a monthly contribution.) Instead, I couldn’t stop thinking about how to model the game. I realized that it was many orders of magnitude easier than games like Chess, Checkers, and– gasp— Go, so it seemed like something I could tackle.

Model

My initial plan was to “brute force” all of the possible games. But first, I would need a way to represent the physical reality of the game as data. Obviously, I needed some representation of each square and whether or not it held an X, O, or nothing. “Normal” programmers would start thinking about two-dimensional arrays or other memory constructs. Me? I think about data tables and SQL Server.

While the game itself looks like a table, I wasn’t about to have thousands of tables, one for each game state. Even if that were technically feasible, you can’t dynamically access tables without dynamic SQL, and that’s a pain I didn’t want to subject myself to. So I needed a way to put all of the game states in one table.

Au Naturel

One way to store this information would look a lot like the game itself:

GameID | RowNum | Col1 | Col2 | Col3
1      | 1      | X    | O    | X
1      | 2      | O    | X    | O
1      | 3      | O    | X    | O
2      | 1      | X    | O    | O
2      | 2      | O    | X    | X
2      | 3      | O    | X    | O

However, that means any time I want to analyze an entire game, I’d have to load three rows of data into a specialized set of variables. And if we’re going that far, why not dispense with those pesky, fixed columns and make this extensible. We all love the entity-attribute-value (EAV) pattern, right?

EAVy Peavy

GameID | RowNum | ColNum | Value
1      | 1      | 1      | X
1      | 1      | 2      | O
1      | 1      | 3      | X
1      | 2      | 1      | O
1      | 2      | 2      | X
1      | 2      | 3      | O

Now we can support Tic-Tac-Toedecahedron– 4×4, 5×5, 6×6, whatever you have the compute power for. But it also means analyzing a game’s worth of data requires a maze of comparisons across rows to determine whether you have N in a row. And I was not planning on building a maintainable enterprise solution. Rather, I was playing a game– the game of seeing if I could make a game. So rather than representing this data vertically across multiple rows, I thought it might be faster and easier to code if we squished it down to one row per game.

Aside: I actually fully support the entity-attribute-value pattern for sparsely populated, frequently expanded optional data. A great use case is feature flags, where code only executes when an entity has a particular “flag”, defined by the presence of a particular attribute in the EAV table, perhaps with a particular value. One of the great advantages of feature flags is decreasing risks for new features, which is supported by not having schema changes which would be required for an extra column on an entity to store the feature.

This vertical row/column representation is also a good way to store imported data with an arbitrary number of columns (say, when shredding and storing arbitrary text, JSON, or XML data). If you don’t know the columns in advance, you either have to create as many columns as you would ever need, or make it vertical so it can expand and adapt per row, as necessary. You can then pivot the table later for the specific horizontal grouping you’re interested in.

Horizontal

Ok, so if we’re only going to support 3×3, and we want to have everything in one row so we can grab it easily for processing, then we just need 9 columns, one for each cell in a game. And because I’m human, not a computer, I start counting at 1. (Narrator: he should have tried counting like a computer.)

Game board, labeled:

1 | 2 | 3
4 | 5 | 6
7 | 8 | 9

Flat game board table model:

GameID | Cell1 | Cell2 | Cell3 | Cell4 | Cell5 | Cell6 | Cell7 | Cell8 | Cell9
1      | X     | O     | X     | O     | X     | O     | O     | X     | O
2      | X     | O     | O     | O     | X     | X     | O     | X     | O

X = 1

Those of you with classical computer science backgrounds might recognize this as a collection of bits (X or O). There are some other things which are collections of bits– every integer ever. After all, (current) computers are fundamentally bit processors. Everything comes down to true or false. All digital data is just a really, really long series of ones and zeroes. We derive meaning by applying context to all those bits.

When we store numbers in binary format, we’re storing them in base 2. The number 7, for example, is “111”, which means it’s 1 (2 to the 0th power) + 2 (2 to the 1st power) + 4 (2 to the 2nd power). So “101” translates to 5. So any string of 1’s and 0’s can be construed as an integer. The more bits, the bigger the numbers. For our game, we have 9 bits, and every end game state can be represented by a string of ones and zeroes. (You can pick either X or O to represent as 1. We’re picking 1 for X because O looks like 0.)

So we can assign each of our 9 cells to the 9 bits in a 9-bit binary number. Binary numbers are typically written with the less significant digits to the right. So if we labeled each of our bits, it would look like this “987654321”. The first game listed above has an X in the 1st, 3rd, 5th, and 8th cells. So let’s replace those bits with “X”, yielding “9X76X4X2X”. If we assign 1 to X, then our binary number ends up being “010010101”. Now, that works just fine, but then we have to remember that our 1st cell is represented by 2 to the 0th power and our 9th cell is represented by 2 to the 8th power. That seems like a recipe for off-by-one errors (one of the two big problems in computer science). If we’re willing to throw another bit into the mix, we can represent our cells as “9876543210”. We’ll just never put anything in the 0th cell/power. (Maybe we should have counted from 0.)

With that scheme, our first game is encoded as “0100101010”. You can figure out that’s 298 in decimal either using your calculator app or your favorite programming language, like T-SQL:

SELECT POWER(2,1) + POWER(2,3) + POWER(2,5) + POWER(2,8);

So by translating these 9 separate bits into a single integer, we’ve reduced our 9 table columns to 1. But we’ve made something which was visually easy to understand into something inscrutable to humans. And depending on your database engine, you might actually be using more space. For example, in order to have numbers up to “1111111111” (1023 in decimal), we need to use a numeric or smallint, which SQL Server stores with a minimum of 5 or 2 bytes, respectively. 2 bytes is 16 bits, which is more space than our 9 bit fields would have required. (There is some column overhead, but I still think the bit fields would be less space.) So why would we bother?

Zoro!

In order to figure out if someone won our game, we need to check if there are three in a row of the same player’s mark. None of the storage mechanisms we’ve considered have any native understanding of “in a row”. For any game state, we’d end up checking if one particular row and column was X and then another, adjoining row and column was X, and then another, etc. You’d select the winning games with something like “(this and this and this) or (this and this and this) or…”. That gets messy fast.

Imagine if you had a bunch of cardboard cut-outs you could put over your game which would expose only the squares required for a win. The cardboard would “mask” the squares you didn’t care about so you could see just the potentially winning values.

Game:

O | X | O
O | O | X
X | X | X

Cardboard cut-out: the cardboard covers everything but the bottom row.

Result:

X | X | X

It turns out there’s a concept called bitmasking which can work a lot like this cardboard cut-out process. (Props to Dylan Beattie for his quick visual demonstration at NDC Minnesota which drove this point home.) First, you represent your game state with a bunch of bits (“OXOOOXXXX” yields “0100011110” for our example above, remembering that we’re padding that last 0 just to make the powers 1-based instead of 0-based) and then you represent your winning state with a bunch of bits (“0000001110” for our example winning state here). Now you use the magic of “bitwise math” to compare the two.

For our use, we want to find out whether our mask exposes the winning three bits. We want to block everything else out. With bits, to check if both items are true, you use “AND” (0 and 0 is 0; 0 and 1 is 0; 1 and 1 is 1). If we apply that “AND” concept to each bit in our game, it will squash out any values which don’t match. If what we have left matches the mask (fills in all of the space we can see through), then we have a match and a win.

Game to analyze:

0 1 0 0 0 1 1 1 1 0

Winning mask, applied with AND:

0 0 0 0 0 0 1 1 1 0

Result, which matches:

0 0 0 0 0 0 1 1 1 0

Most programming languages have this capability built-in. In T-SQL, the “bitwise AND operator” is the ampersand. If we convert our example binary game state of “0100011110” to decimal, we get 286. For our mask, “0000001110”, we get 14. We see that we get a matching result of 14 when we compare them in T-SQL using an ampersand:

SELECT CASE WHEN 286 & 14 = 14 THEN 'Win!' ELSE 'Not a win' END;

Result: Win!

Tic-Tac-Tie

If we make a small adjustment to our example ending game state, swapping the 2nd and 5th cell contents to yield “0100111010” (decimal 314), we’d have a tie.

Game to analyze:

0 1 0 0 1 1 1 0 1 0

Winning mask, applied with AND:

0 0 0 0 0 0 1 1 1 0

Result, which does not match:

0 0 0 0 0 0 1 0 1 0

We can verify this with T-SQL:

SELECT CASE WHEN 314 & 14 = 14 THEN 'Win!' ELSE 'Not a win.' END;

Result: Not a win.

Hugs or Kisses?

It would be nice if we could just presume O won when X didn’t win, but as we just saw there are those pesky ties, where nobody wins. So we have to specifically check if O won.

Absolutely Nothing

Well, if O won, then when we look through our mask, we should see only O values– that is, all zeros. So when we perform a “bitwise AND” on our game state and our winning mask, we should see all zeros.

Game to analyze:

1 0 1 1 1 0 0 0 0 0

Winning mask, applied with AND:

0 0 0 0 0 0 1 1 1 0

Result, which equals 0:

0 0 0 0 0 0 0 0 0 0

We can also verify this with T-SQL:

SELECT CASE WHEN 736 & 14 = 0 THEN 'Win!' ELSE 'Not a win.' END;

Result: Win!

If we use this same methodology with the tie game above, we can see that applying the winning bitmask not only doesn’t result in a match (indicating an X win), but it also doesn’t result in all zeros (indicating an O win).

Don’t be so negative.

There’s actually another method for checking for an O victory. Instead of looking for all zeros with our match, we could first flip all of the bits, essentially remapping O to 1 and X to O. Then we would apply the same logic as we did for an X victory, where we checked that the bitwise AND with the winning bitmask resulted in a match.

Game to analyze:

1 0 1 1 1 0 0 0 0 0

Flip all the bits, to check O:

0 1 0 0 0 1 1 1 1 1

Winning mask, applied with AND:

0 0 0 0 0 0 1 1 1 0

Result, which matches:

0 0 0 0 0 0 1 1 1 0

It’s pretty easy to do this in T-SQL as well, thanks to the bitwise NOT operator, a tilde (~). We take our original decimal value for our O-winning game, 736, flip its bits with the tilde, and then check to make sure it matches our winning bitmask (decimal 14).

SELECT CASE WHEN (~736) & 14 = 14 THEN 'Win!' ELSE 'Not a win.' END;

Result: Win!

Can’t you see the sign?

Be careful, as this method can get pretty confusing if you dig underneath the simplified veneer. Remember our O-winning game, “1011100000” (decimal 736). If we flipped those bits, we get back to an X-winning game, “0100011111” (decimal 287). (It’s not quite the same as before, because we had set that least significant bit to 0 because we’re ignoring it and it doesn’t affect the bitmask checks. Thanks 1-based counting.) So you’d expect the inverse of 736 to be 287. But it’s not. It’s -737. Wha?

T-SQL deals with numbers which are of certain types. Here, we’re dealing with plain numbers, without specifying a data type. But SQL Server still assigns a data type to work with them. In our case, it’s an integer (SELECT SQL_VARIANT_PROPERTY (736, 'BaseType') results in int). An integer is 4 bytes, which is 32 bits, so our “1011100000” is actually “00000000000000000000001011100000”. Flipping all those bits results in “11111111111111111111110100011111”. You might think that’s a really big positive number, but SQL Server uses that most significant (left-most) bit to store the sign (positivity or negativity) of the number. Ok, so you might think that’s a really big negative number. But wait, there’s more. SQL Server uses the two’s complement method of storing signed numbers in binary. It’s kind of like counting backwards for the negative numbers.

A nice artifact of the choice of two’s complement is that the decimal value of the negative number derived from flipping the bits of a positive number is not affected by the size of the signed variable (smallint, int, bigint). All of those extra leading zeroes for the bigger variable types are still flipped to ones, which would normally make the negative number even bigger, but because of the backwards counting, it’s always the same relatively small number.

So because of two’s complement, all of these return decimal -2:

SELECT ~CONVERT(smallint, 1), ~CONVERT(int, 1), ~CONVERT(bigint, 1);

And using this T-SQL, we can see that the trailing 0 we fudged with (0 originally and then 1 when we manually bit-flipped) has no bearing on the results, because it’s not part of this– or any– winning mask.

SELECT
    ManuallyFlippedBits = 287 & 14,
    OriginalWithTrailing0 = 286 & 14;

O boy.

So to determine if O won a game, we can either use the zero-match method (use AND and look for all zeroes) or the bitwise-NOT method (flip the bits, then use AND and check for a full match as with X).

This way and that.

Wait, all that, and we’ve only checked for one winning condition (as it happens, across the bottom row). But there are actually 8 winning methods– all three rows across, all three columns, and then the two diagonals. So to calculate the overall winner for a game, we need to check the game’s bitmask against each of the eight winning bitmasks for X, and then if none match, we check for O (using whichever method), and then if none match, it’s a tie.
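Those eight winning bitmasks follow directly from the 1-based cell scheme above (cell N contributes POWER(2, N)). A sketch of how they might be generated; which three cells count as a “row” depends on how you orient the board, but the set of eight values is the same:

```sql
/* The eight winning lines: three rows, three columns, two diagonals. */
SELECT WinningBitmap = Cell1 + Cell2 + Cell3
FROM (VALUES
    (POWER(2,1), POWER(2,2), POWER(2,3)),  /* line: cells 1,2,3 ->  14 */
    (POWER(2,4), POWER(2,5), POWER(2,6)),  /* line: cells 4,5,6 -> 112 */
    (POWER(2,7), POWER(2,8), POWER(2,9)),  /* line: cells 7,8,9 -> 896 */
    (POWER(2,1), POWER(2,4), POWER(2,7)),  /* line: cells 1,4,7 -> 146 */
    (POWER(2,2), POWER(2,5), POWER(2,8)),  /* line: cells 2,5,8 -> 292 */
    (POWER(2,3), POWER(2,6), POWER(2,9)),  /* line: cells 3,6,9 -> 584 */
    (POWER(2,1), POWER(2,5), POWER(2,9)),  /* diagonal: 1,5,9   -> 546 */
    (POWER(2,3), POWER(2,5), POWER(2,7))   /* diagonal: 3,5,7   -> 168 */
) AS Wins (Cell1, Cell2, Cell3);
```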

Tic-Tac-D’oh!

Ok, so we think we have a good way to record our game states– as a bitmask where the bits represent “X” when they’re on and “O” when they’re off, and each bit corresponds to one of our labeled squares.

And we think we have a good way to detect whether X or O won the game– we compare the game’s bitmask to the winning bitmask and check if there’s a match for X or an inverse match for O.

So tell me, who won this game?

O | X | X
O | O | O
X | X | X

It’s a trick question. You can’t.

In some sense, it might not even be a valid game state. After all, if we define the game as ending as soon as there are three-in-a-row, then we can have finished games which don’t have all of the squares completed. That’s a big problem for our data model. Because we’re storing bits in a bitmask, they have only two values, true or false. There’s no concept of “not specified” (NULL). We’d have to go back to bit fields for that, since in SQL Server they can have a NULL state. (That’s thanks to some of that extra columnar overhead hinted at earlier, since there’s a NULL bitmask in the row storage mechanism.)

But even if we didn’t say the game ended as soon as there were three in a row, and play continued until every box was filled, you still don’t know who won this game, because you don’t know who got three-in-a-row first. Even if you assume that X always goes first, you can’t assume that X was the first to get three-in-a-row. Yes, it’s very likely, but we can’t presume perfect players when we’re building a system to manage game-play.

So any storage system for this game must include a concept of turns, so that we know how the game looked throughout time and can detect the first point at which a player achieved three-in-a-row. (Ideally, it would also prevent invalid game states from being stored, such as a board with all X or all O plays.)

So what we have built simply won’t do.

But why!?

I do have a solution which I think works, but obviously this isn’t it. I plan to share that in a follow-up post.

So why did I go through all of this explanation if ultimately it was unsuccessful? Several reasons.

First, this is really how I thought this through originally. And I think showing skilled failure helps others realize that failure isn’t a catastrophe but rather a learning experience. (Unless, of course, you’re talking about pacemakers or bridge safety systems, etc.) By “skilled failure”, I don’t mean being sloppy or careless, but rather applying your skills in a reasonable manner and considering each decision. Even by doing everything “right”, in the context of your knowledge at the time, you can still be “wrong”. And that’s ok, because now your knowledge has expanded. Exposing this process shows that even those who have been around the block a few times still make wrong turns. Besides, it’s all about the journey, right?

Second, breaking apart this background from the solution and its explanation allows me to explain some of the concepts (like the storage mechanism choice and bitmasking) in a simpler context. And the turn-based game model is more complex, so the more you understand going into it, the easier it will be to follow.

Third, planting these seeds might cause some ideas to grow in readers’ heads. You might get more benefit thinking about possible solutions without seeing what I came up with.

Fourth, I think having the background and solution all in one post might make it so big that no one would bother reading it. And having this background separate gave me the space to indulge in a little more detail and some asides which might have been overwhelming in an even bigger combined post.

Fifth, this allows me to provide something which has some inherent value before the rest of the solution and explanation is ready for publication. (And I can show some results for the work which has gone into this.)

Finally, breaking things at this point provides a little teaser for the next post where I will show (what I believe to be) a working solution. (It also forces me to write the rest and not abandon the idea, because I’ve made a public commitment.)

Same bat channel…

So the next post on this subject will contain a working game storage mechanism, a function to determine the result of the game (X win, O win, or tie), and even some primitive artificial intelligence and some really simplistic machine learning, all done in T-SQL.

Last week I asked the Microsoft data professional community to give back… and you did. We generated 29 blog articles with inspiring stories of past contributions, advice to those beginning their journey, and plans to do more. A recurring theme was how helping others ultimately helps you– with both personal growth and professional recognition. My overarching goal was to get more people doing more things for the community. I hope the tales and goals shared this month provide the motivation for more action and continued growth of our community.

Topic Selection

I must admit that I almost made a big mistake with my topic selection. I was so blinded by my vision of new bloggers, speakers, and mentors that I didn’t realize the hubris of asking some folks to do even more. I neglected to consider that many T-SQL Tuesday participants are already engaged, active members of our community– MVPs, prolific speakers and bloggers, and even literally legendary Q&A contributors.

I had provided an escape valve– a good topic in its own right– of a favorite SQL Server 2017 feature. That helps everyone realize the benefits of upgrading, which keeps our ecosystem current and generates revenue for Microsoft to continue improving the products we all use. But I’d missed an obviously better choice– something thematically related but tailored to industry veterans.

Instead of asking even more of those already giving of so much, why not give them an opportunity to inspire the next generation, guiding and encouraging them to make their own contributions? I’m glad that I listened to my nagging doubt and updated the post the next morning with that topic tweak. And I’m grateful that many approached this month’s event from that perspective.

Process

Just gathering and compiling a list of everyone’s posts was more work than anticipated. In addition to those who commented on the original post, the #tsql2sday hashtag had a few originals mixed in. I hope I’ve found everyone; if not, please leave a comment and I’ll fix any mistakes. (Side note: I don’t know why, but many bloggers don’t have their full names (and, ideally, Twitter handles) prominently listed.)

Initially, I had planned on a comprehensive grid with dates, sources, titles, topics, and individualized comments. But wide tables look bad and function poorly on mobile. So after building all of the HTML, I decided to pare it down. Rather than more manual editing, I thought I’d leverage our favorite tool, SQL Server, to do my work for me.

HTML is basically XML, especially something as simple as a table, and SQL Server can parse XML. SQL Server can also make XML, which is basically HTML. Put those together and T-SQL is a general purpose HTML table reformatting system. So I copied the HTML straight from the WordPress editor and pasted it into SSMS. Then I encoded single quotes as double single quotes (using find and replace) and put it in a string. The XML nodes() method provided access to each HTML table row as a SQL table row, and the value() method allowed extraction of the various URLs and data elements.

Once I had the data represented in a derived table, I could sort and manipulate it, and then gather it back up into XML using FOR XML. The results can be pasted back into WordPress and saved– instant table in a new format, sorted as desired. I was even able to use SQL Server’s recent AT TIME ZONE syntax to ensure that (almost) everyone posted when it was “Tuesday somewhere”. With some windowed aggregates, we can also see that 22 authors commented while 7 just tweeted. And using my quick categorization of the posts, I verified the need for the alternate take on the topic, since almost 60% (17 of 29) gave some form of advice or personal history.
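The round trip described above can be sketched roughly like this. (The HTML fragment, table shape, and element paths are invented for illustration; the real post’s markup was more involved.)

```sql
-- Hypothetical HTML table pasted in as an XML string.
DECLARE @html xml = N'
<table>
  <tr><td><a href="https://example.com/post-b">Blogger B</a></td><td>Advice</td></tr>
  <tr><td><a href="https://example.com/post-a">Blogger A</a></td><td>Feature</td></tr>
</table>';

-- nodes() shreds each <tr> into a SQL row; value() pulls out the pieces.
SELECT Author = r.value('(td/a/text())[1]', 'nvarchar(100)'),
       Url    = r.value('(td/a/@href)[1]',  'nvarchar(400)'),
       Topic  = r.value('(td[2]/text())[1]','nvarchar(100)')
INTO #posts
FROM @html.nodes('/table/tr') AS t(r);

-- Sort and reassemble as HTML with FOR XML PATH; duplicate "td" aliases
-- become sibling <td> elements inside each <tr>.
SELECT Author AS td, Topic AS td
FROM #posts
ORDER BY Author
FOR XML PATH('tr'), ROOT('table');
```

The final SELECT emits a fresh, sorted HTML table ready to paste back into the editor.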

“Blogging helps you become a better technical person.” Plus it saves time in the long run; write once, read many. Speaking makes you double-down on prep, which cements your learning and makes you a better presenter. Helping others also enhances your personal life.

Our #sqlfamily vlogger talks about moving to video to save on travel costs. He aims for the niche of short, entertaining tips. Making them has been fulfilling and rewarding. Don’t let fear stop you. “If you give it your best then the SQL Server community won’t let you down.”

Cláudio began learning about SQL Server through SQLServerCentral and print magazines. Then he explored other blogs, then user groups and SQLSaturdays. With encouragement, he began speaking and then working with dbatools.

“We all love to give back and help each other.” Her laptop wouldn’t cooperate for her first demo-laden presentation, but she gave it anyway. And then again the next week. She’d been “bitten by the speaking bug” and is now a mentor. She also blogs and is riding the SQL on Linux train.

Peter details the long history of helping in our community and notes that he hasn’t “seen this level of sharing in many other tech communities”. He blogs, speaks, and contributes on StackOverflow and SQL Slack. He encourages everyone else to get involved: “you probably know something that nobody else does”.

A 23-year MVP, Steve got his start in Microsoft Access, answering questions on CompuServe. He eventually migrated to SQL Server and then System Center Configuration Manager. He explains that speaking is fun and benefits your career.

Like so many others, a local user group and SQLSaturday were Björn’s introduction to PASS. He started volunteering at conferences, founded a meetup, and eventually spoke at a SQLSaturday. He also confirms that speaking is fun and helping people makes you happy.

We already knew Aaron was legendary, but Stack Exchange confirmed it for us. He’s been answering questions online since before that was a thing. Then he started blogging for efficiency and reach. He focuses on best practices to “help the helpers”, hoping to spread good habits. Ultimately, as with so many others, he’s “just trying to help people”. And he wants you to know you can, too.

Kathi describes how PASS Summit 2004 changed her life forever, prompting her to immediately sign up to volunteer for PASS. She didn’t realize how much she was helping others or how it would open doors for her. But now it’s clear she’s “making a difference in the world”. She says you’ll be glad when you do, too.

Allen started getting involved in PASS by bringing knowledge back from Summit to his local community. Eventually he led that local group and about a decade later he’s part of the global PASS leadership. He echoes that “the more you give, the more you get in return”.

First time participant Tim explains how he learned about SQLSaturdays at Summit and shortly thereafter attended one, then spoke at one, and then organized one. He continued volunteering, blogging, running a user group, and was hired at SQLskills. Hearing that he’s helped someone makes all the hard work worth it.

Nate blogs to help himself remember, in addition to helping the community. He’s attended SQLSaturdays and hopes to volunteer in the future. He also hosts “lunch and learn” events at his company. He reminds us that having to teach something forces you to learn it really well. Finally he cautions that, while being generous with your time really does make a difference, don’t forget to prioritize yourself and your family.

Andy claims he consumes more than he contributes, which is striking since he’s authored 11 books, built open source SSIS utilities, blogs, and speaks at SQLSaturdays. He’s lived the “virtuous cycle” in the SQL Server community.

Deb plans to volunteer to help first timers at PASS Summit. She’s also considering submitting to speak at a virtual group or GroupBy.org. She emphasizes that it’s ok to help behind the scenes or in little ways. There’s a way for everyone to help.

Kevin is taking mentorship to a new level by running an apprentice program. He’s looking for his next recruit, especially among folks who don’t have the same opportunities as many in the tech industry.

Shane has been blogging weekly, is a contributor to dbatools, and has volunteered at SQLBits. He plans to contribute more to open source tools, submit to speak at SQLSaturdays, and write deeper technical blog posts.

Another first time contributor brings us another new website. Janice rekindled her local user group, organized her first SQLSaturday this March, and began speaking. She’s planning on submitting to more speaking opportunities this year and she will continue her work organizing the PASS Professional Development virtual group.

Originally worried about not having “anything worth sharing”, Jess is now convinced we all have something to teach. She’s committing to 6 more blog posts this year (3 technical), 50 merged pull requests into the dbatools module, and to present at a user group meeting and (hopefully) a conference.

PROJECT SQLImaginarium has officially launched. Todd will use virtual reality to introduce folks to SQL Server. It will provide an immersive experience and will enable users to visualize how the engine works in ways never before possible. But he needs your help to make it work.

Nick was worried about having something to contribute which hadn’t been done before, but then he realized that “[e]veryone has a voice, and my experiences working with SQL Server can’t be the same as everyone else”. He plans to blog at least twice per month, host a lunch and learn at work, and speak at a public event.

Plan

Backwards and Forwards

We’ve heard stories of growth– from answering questions privately, to answering questions publicly, to blogging for the world to see. We saw folks get started by teaching their coworkers, then their local professional community, and then conference audiences small and large. We watched as they built the tools many of us use every day.

We were reminded that everyone has something to share. We all have a unique perspective to offer. And no contribution is too small.

We’ve been shown that we grow as we help others grow. And that, in addition to being fun, helping others helps ourselves.

Many of us have been helping for years, and some of us are just getting started. Already in response to this invitation, we’ve seen two first time contributors, two new websites, a new apprentice program, and a SQL Server virtual reality project. Together, we can really take this community to another dimension. (Get it? 3D?)

I’m excited to see what we can all accomplish together.

Thanks!

Thank you, everyone, both for all you’ve done and will do for each other and our community, and for making this T-SQL Tuesday a success.

Update 2018-05-23: In reviewing my comment spam queue, I noticed Andy Leonard’s entry today and added it to the list. Sorry for the initial omission.

This month I asked the #tsql2sday crowd to make a plan to give back to the Microsoft data platform community. For those who already give so much, I suggested describing how they got started in hopes they would inspire others. I hereby submit my plan.

Their bigger 2500-person conferences have $400k food bills, so they’ve described their 350-person debut here to be more “intimate”. (That’s not how I would have described our similarly sized SQLSaturday.) They believe in the future of this market as they’ve already booked for next year and are counting on growth. At $1000 for two days, they recognize that they’re more expensive than most conferences in our region. But they bring in national and international speakers, and they believe the market will recognize the value they bring.

I went to their introductory NDC Minnesota Micro event last month. I appreciated that it was free and I recognized the diversity of the speakers they chose to highlight. More conferences and technical community growth is a good thing for our state, so I’m pleased to be volunteering to help make their inaugural event here a success.

While there, I plan to meet new people and think about ways I could help them. For example, during the volunteer meeting last night, I spoke with several junior developers. I shared information about our local junior developer group (which many already knew about, having recently graduated from Prime Digital Academy). I also talked about CoderDojo Twin Cities and That Conference. You never know when you will help somebody make a connection which can influence their career development.

Improve Diversity in Tech

I’ve blogged several times and spoken about the challenges we face with improving diversity in our technical community. Big companies like blaming problems on the “pipeline” because spending money on encouraging kids in STEM makes for good PR. But the real problem is the flight from the persistent toxicity in tech toward underrepresented groups. There’s also a negative feedback loop, where more departures lead to more feelings of isolation, which drives more talented people away.

To help combat this, I’m going to continue to work on checking my privilege as a white male in tech. And, because representation matters, I’m going to encourage women and people of color in tech however I can. Specifically, I will work to get at least one new speaker from an underrepresented group to present this year at our user group or SQLSaturday, providing whatever support they might need. (If you’re interested, please contact me!) Furthermore, I will work with our user group board to host a panel on diversity in technology– featuring diverse members, who also are encouraged to give technical talks– at our SQLSaturday on October 6th.

Serve on a User Group Board

This is the final year of my second, two-year term on the PASSMN board. I love being involved and I would happily run for another term, but I also want to make sure that we hear new voices in our leadership circle. So toward the end of the year I will be lobbying our group for a member to step up to serve for the first time. If there’s still room for me, I’ll gladly return. If not, I’ll look into reinvigorating the Rochester PASS chapter.

Blog

I really love speaking– once I’m actually up there and in the groove. It comes through to the audience, who describe my “energy” and “passion”. But honestly I still hate the lead up– the looming deadline to finish slides and preparation. The other problem with speaking is that (except for the great work Brent Ozar is doing with GroupBy.org), it’s so transient. A few dozen people benefit once, but it’s a flash in the pan. A blog or video, though, can have some real legs; years later people can still learn from it.

Most of my writing (like this post) involves community-related topics. They’re important, and in my opinion more interesting to read, but I’d also like to contribute more actual tech. So I’m making a commitment to write at least two instructional blog posts before the end of the year.

You can, too!

That’s all there is to this, folks. Think of a way you want to be more involved, write down a concrete step in that direction, and put a date on it. Eat the elephant one bite at a time.