Sudoku Solver

Today I am putting up something silly, just for fun. I set myself the task a while back to write a Sudoku solver in T-SQL, but with two dumb constraints that I would never follow given a real problem: I didn’t look at any documented techniques for solving Sudoku, and I specifically avoided T-SQL solutions, even though this has been done already many times. (The first thing I do with a real problem is to see who solved it already, and how, since most things have been done already. Not checking is a sure way to get inducted into The International Society of Wheel Re-inventors®.) So this solution is quite naive and probably has some gaps in it. I imagine it does the same thing as some other solvers, but, you’ll have to trust me, that’s a coincidence. Still, it gave me something to do evenings while my wife was doing … erm … Sudoku.

Anyway, the challenge I set for myself was to try to make a solver that:

Is completely set-based, in the spirit of both SQL and Sudoku.

Has as few statements as possible. I actually tried to write a single recursive CTE to solve a whole puzzle, but I didn’t quite get there, because I ran up against a limitation where you cannot reference a recursive member more than once in a CTE, and I’m not quite smart enough to solve that. I still have a hunch that it’s possible. The solution I do have consists of two statements with some flow control around them.

Should be reasonable to compute – that is, it’s not a brute-force method.

At a high level, the algorithm I came up with follows this logic:

The set of digits 1-9 is loaded into a numbers table “allValues.”

The set of all possible “cells” – all the boxes in the puzzle – is loaded into a second numbers-like table, “allCells.” This table just provides the address of each container / cell in the puzzle, to facilitate creating a left-join that will produce all the cells in a whole solution.

A puzzle is loaded into a third table, “puzzle,” which will start with the given values in the puzzle and ultimately hold the solution. This table is constrained with the same rules as the paper puzzle, so constraints enforce that it’s not possible to insert values into cells that break the rules of Sudoku.

A solver script is run against the tables:

While the puzzle is unsolved, repeat:

Compute the set of all possible values that can be inserted into all empty cells in the puzzle without breaking any rule

From that set, select any cells/values that are solvable because either there is only one possible value for the cell OR there is only one possible location for a given digit 0-9 in a row, column, or quadrant. That is, fill in cells that have deterministic solutions based on the existing values in the puzzle. (This usually is all that is required to solve an “easy” level puzzle, but won’t complete for a hard one.) Insert these deterministic values into the puzzle table.

IF there are no results from the prior step, that is, there are no solve-able cells, then:

Branch the solution by making a copy of it, so that there are two versions in the puzzle table. One ultimately will be correct, and one will be unsolvable.

Find one cell within the whole set of possible values that has two possible answers, and put one possibility into the first version, and the second possibility into the second version.

IF the insert of solved cells from step 2 failed, due to a constraint violation, then that implies that one of the branched versions of the puzzle in the table has reached an impasse: that is, it’s incorrect and unsolvable. Delete this version, leaving the correct one, and continue.

Setup

Designing the structure of the puzzle table was a little tricky. As we know, relational tables are not like actual grids because the order of both the rows and columns is undefined. All one can say about the values in a relation is that they are in the same row or in the same column, but there is no relative position as there is in a grid. So as tempting as it seems, we can’t represent the puzzle grid literally, shoving it into a table, because the positions of the cells in the paper grid carry meaning, while the positions of values in a relational table can’t. Basically, in SQL Server there is no “first” row or “first” column, and value cannot be said to be “next to” another value.

So, I arrived at a design that normalizes the data from the puzzle by creating a separate row for each cell in the whole puzzle, basically un-pivoting the grid of cells from the paper grid into a list. Each row in the resulting table stores an address for the position in the paper puzzle as Row, Column and Quadrant (the nine-square regions within the puzzle), with the digit in that cell as “Value.” The quadrant is obviously determined by the (row, column) pair and is deterministic, so that can be calculated using a computed column. The table definition looks like this:

CREATE TABLE dbo.Puzzle(
SolutionID tinyint NOT NULL,
Row tinyint NOT NULL,
Col char(1) NOT NULL,
Quadrant AS (
('Q-' +
case when Row BETWEEN 1 AND 3 then 'A'
when Row BETWEEN 4 AND 6 then 'B'
when Row BETWEEN 7 AND 9 then 'C' end )
+ case when Col BETWEEN 'A' AND 'C' then 'A'
when Col BETWEEN 'D' AND 'F' then 'B'
when Col BETWEEN 'G' AND 'I' then 'C' end ),
Value tinyint NOT NULL
)

Next, I added some rules to that table that make it mimic the “rules” implied by position in the paper version of the puzzle: no two cells in the same row, column, or quadrant can contain the same digit:

The allCells table is similar, but without the need for a SolutionID or Values column – it’s just a list of all the cells in one puzzle grid. The computed column for Quadrant is the same:

CREATE TABLE dbo.allCells(
Row tinyint NOT NULL,
Col char(1) NOT NULL,
Quadrant AS (
('Q-' +
case when Row BETWEEN 1 AND 3 then 'A'
when Row BETWEEN 4 AND 6 then 'B'
when Row BETWEEN 7 AND 9 then 'C' end )
+ case when Col BETWEEN 'A' AND 'C' then 'A'
when Col BETWEEN 'D' AND 'F' then 'B'
when Col BETWEEN 'G' AND 'I' then 'C' end ),
)

Note that the Puzzle table doesn’t contain “empty” cells, only solved cells, and it cannot contain invalid values. It might, however, have an unsolvable puzzle even though each value present doesn’t violate the rules – a fact that will become important later.

Viewing a Puzzle

Next, it’s important (and more fun!) to be able to view the puzzle in a grid, as it looks on paper. I struggled with this a bit until realizing that this is basically a pivot problem, and the pivot operator can be used to rotate the puzzle table back into the grid:

Only the non-null cells in the puzzle are present in the Puzzle table, but that set gets expanded via left-join to the allCells table to make a derived table with all 81 cells. That derived table is then pivoted on the columns to create nine columns by nine rows, and the resulting grid looks like the paper puzzle:

Sudoku Solver

At this point, we can check out the solver. The solver consists basically of two (evil, complicated) statements: one that can insert solved cells into the puzzle table, and a second that can detect and delete an unsolvable version of a puzzle. Those two statements are connected by flow control: a While loop that keeps the script going until the puzzle is solved, and a Try/Catch that handles the scenario where a version of a puzzle becomes unsolvable.

-- Loop while there is no solution having all 81 cells solved:
WHILE NOT EXISTS (
SELECT SolutionID
FROM Puzzle
GROUP BY SolutionID
HAVING COUNT(*) = 81
) BEGIN
BEGIN TRY;
-- Try to insert new solved cells into the puzzle table:
WITH
-- Every possible new value, for every empty cell in every puzzle version,
-- that does not contradict the solved cells already present in the puzzle table:
allPossibleNewValues
AS ( SELECT allSolutions.SolutionID ,
allCells.Row ,
allCells.Col ,
allCells.Quadrant ,
allValues.Value
FROM allValues
CROSS JOIN allCells
CROSS JOIN ( SELECT DISTINCT
SolutionID
FROM dbo.Puzzle
) AS allSolutions
LEFT JOIN dbo.Puzzle ON allSolutions.SolutionID = Puzzle.SolutionID
AND allCells.Row = Puzzle.Row
AND allcells.Col = Puzzle.Col
WHERE -- The cell isn't already solved:
puzzle.Value IS NULL
-- And there's no cell in the same row having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle r
WHERE r.SolutionID = allSolutions.SolutionID
AND r.Row = allcells.Row
AND r.Value = allValues.Value )
-- And there's no cell in the same column having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle c
WHERE c.SolutionID = allSolutions.SolutionID
AND c.Col = allcells.Col
AND c.Value = allValues.Value )
-- And there's no cell in the same quadrant having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle q
WHERE q.SolutionID = allSolutions.SolutionID
AND q.Quadrant = allCells.quadrant
AND q.Value = allValues.Value )
),
-- There are two sets that give us new solved cells to insert into the puzzle:
-- SureThings are values for empty cells, where the existing values already solved
-- in the puzzle imply than only one value can go in the cell (there's literally only
-- one possibility for the unsolved cell, so it has to be correct):
sureThings
AS ( SELECT SolutionID ,
Row ,
Col ,
MIN(Value) Value
FROM allPossibleNewValues
GROUP BY SolutionID ,
Row ,
Col ,
Quadrant
HAVING COUNT(*) = 1
),
-- ForcedDigits are values determined by process of elimination: each digit 1-9 has
-- to find a "home" in every row, every column and every quadrant. That means that
-- if a value appears only one time in the set of all possible values for any row,
-- column or quadrant, then it has to be the solution for that cell:
forcedDigits
AS ( SELECT SolutionID ,
Row ,
Col ,
Value
FROM allPossibleNewValues n
WHERE
-- There is no other location in the same row that can have the given value
-- IOW, the given value must go in a cell, because it cannot go anyplace else
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Value
FROM allPossibleNewValues rows1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Value
FROM dbo.Puzzle rows2
) AS [rows]
WHERE [rows].SolutionID = n.SolutionID
AND [rows].Row = n.Row
AND [rows].Col != n.Col
AND [rows].Value = n.Value )
OR
-- There is no other location in the same column where the given value can go
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Value
FROM allPossibleNewValues cols1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Value
FROM dbo.Puzzle cols2
) AS [cols]
WHERE [cols].SolutionID = n.SolutionID
AND [cols].Row != n.Row
AND [cols].Col = n.Col
AND [cols].Value = n.Value )
OR
-- There is no other cell in the same quadrant where the given value can go
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Quadrant ,
Value
FROM allPossibleNewValues quads1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Quadrant ,
Value
FROM dbo.Puzzle quads2
) AS [quads]
WHERE [quads].SolutionID = n.SolutionID
AND [quads].quadrant = n.quadrant
AND ( [quads].Col != n.Col
OR [quads].Row != n.Row
)
AND [quads].Value = n.Value )
)
INSERT INTO Puzzle
( SolutionID ,
Row ,
Col ,
Value
)
-- New solved cells, if there are any
SELECT SolutionID ,
Row ,
Col ,
Value
FROM (
SELECT SolutionID ,
Row ,
Col ,
Value
FROM sureThings
UNION
SELECT SolutionID ,
Row ,
Col ,
Value
FROM forcedDigits
) as solutions
UNION
-- Otherwise, branch by duplicating the existing solution as (solutionid + 1) plus
-- insert one trial cell value into the existing solution, and another trial value
-- into the new solution:
SELECT SolutionID ,
Row ,
Col ,
Value
FROM (
-- A copy of the existing version of the puzzle:
SELECT ( SELECT MAX( SolutionID ) from dbo.Puzzle ) + 1 as SolutionID,
Row ,
Col ,
Value
FROM dbo.Puzzle
UNION
-- One each of two possible values for a cell - one directed to the existing
-- solution and one to the copy, by Row_Number():
SELECT ( SELECT MIN( SolutionID ) FROM dbo.Puzzle )
+ ( ROW_NUMBER() OVER ( ORDER BY split.Row, split.Col ) - 1 ) as SolutionID ,
split.Row ,
split.Col ,
split.Value
FROM allPossibleNewValues split
INNER JOIN (
SELECT TOP ( 1 ) Row, Col
FROM allPossibleNewValues
GROUP BY Row, Col
HAVING COUNT(*) = 2
ORDER BY Row, Col
) as firstPair on split.Row = firstPair.Row
and split.Col = firstPair.Col
) AS copyPuzzle
-- Only in cases where there were no new solved cells
-- in sureThings or forcedDigits
WHERE ( SELECT COUNT(*) FROM sureThings ) = 0
and ( SELECT COUNT(*) FROM forcedDigits ) = 0;
END TRY
BEGIN CATCH;
-- If the insert above failed due to constraint violation, that means that
-- one version of the puzzle is contradictory/unsolvable.
-- Find and delete the solution at an impasse:
WITH
-- Every possible new value, for every empty cell, that does not contradict the
-- solved cells already present in the puzzle (same logic as above):
allPossibleNewValues
AS ( SELECT allSolutions.SolutionID ,
allCells.Row ,
allCells.Col ,
allCells.Quadrant ,
allValues.Value
FROM allValues
CROSS JOIN allCells
CROSS JOIN ( SELECT DISTINCT
SolutionID
FROM dbo.Puzzle
) AS allSolutions
LEFT JOIN dbo.Puzzle ON allSolutions.SolutionID = Puzzle.SolutionID
AND allCells.Row = Puzzle.Row
AND allcells.Col = Puzzle.Col
WHERE -- The cell isn't already solved:
puzzle.Value IS NULL
-- And there's no cell in the same row having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle r
WHERE r.SolutionID = allSolutions.SolutionID
AND r.Row = allcells.Row
AND r.Value = allValues.Value )
-- And there's no cell in the same column having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle c
WHERE c.SolutionID = allSolutions.SolutionID
AND c.Col = allcells.Col
AND c.Value = allValues.Value )
-- And there's no cell in the same quadrant having the given value
AND NOT EXISTS ( SELECT *
FROM puzzle q
WHERE q.SolutionID = allSolutions.SolutionID
AND q.Quadrant = allCells.quadrant
AND q.Value = allValues.Value )
),
-- ForcedDigits are values determined by process of elimination: each digit 1-9 has
-- to find a "home" in every row (same logic as above):
forcedDigits
AS ( SELECT SolutionID ,
Row ,
Col ,
Quadrant ,
Value
FROM allPossibleNewValues n
WHERE
-- There is no other location in the same row that can have the given value
-- IOW, the given value must go in a cell, because it cannot go anyplace else
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Value
FROM allPossibleNewValues rows1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Value
FROM dbo.Puzzle rows2
) AS [rows]
WHERE [rows].SolutionID = n.SolutionID
AND [rows].Row = n.Row
AND [rows].Col != n.Col
AND [rows].Value = n.Value )
OR
-- There is no other location in the same column where the given value can go
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Value
FROM allPossibleNewValues cols1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Value
FROM dbo.Puzzle cols2
) AS [cols]
WHERE [cols].SolutionID = n.SolutionID
AND [cols].Row != n.Row
AND [cols].Col = n.Col
AND [cols].Value = n.Value )
OR
-- There is no other cell in the same quadrant where the given value can go
NOT EXISTS ( SELECT *
FROM ( SELECT SolutionID ,
Row ,
Col ,
Quadrant ,
Value
FROM allPossibleNewValues quads1
UNION ALL
SELECT SolutionID ,
Row ,
Col ,
Quadrant ,
Value
FROM dbo.Puzzle quads2
) AS [quads]
WHERE [quads].SolutionID = n.SolutionID
AND [quads].quadrant = n.quadrant
AND ( [quads].Col != n.Col
OR [quads].Row != n.Row
)
AND [quads].Value = n.Value )
)
DELETE dbo.Puzzle
-- An unsolvable puzzle version at an impasse will have an impossible
-- result for forcedDigits. That is, two values forced into one cell,
-- or same values forced into the same row, column, or quadrant. Find
-- the unsolvable version of the puzzle to delete:
WHERE SolutionID = (
SELECT TOP ( 1 ) SolutionID FROM (
SELECT SolutionID, Row, Col
FROM forcedDigits
GROUP BY SolutionID, Row, Col
HAVING COUNT(*) > 1 ) as impasse1
)
OR SolutionID = (
SELECT TOP ( 1 ) SolutionID FROM (
SELECT SolutionID, Row, Value
FROM forcedDigits
GROUP BY SolutionID, Row, Value
HAVING COUNT(*) > 1 ) AS impasse2
)
OR SolutionID = (
SELECT TOP ( 1 ) SolutionID FROM (
SELECT SolutionID, Col, Value
FROM forcedDigits
GROUP BY SolutionID, Col, Value
HAVING COUNT(*) > 1 ) as impasse3
)
OR SolutionID = (
SELECT TOP ( 1 ) SolutionID FROM (
SELECT SolutionID, Quadrant, Value
FROM forcedDigits
GROUP BY SolutionID, Quadrant, Value
HAVING COUNT(*) > 1 ) AS impasse4
);
END CATCH;
END;

So far this has solved each puzzle I’ve fed into it. If I were a mathematician, I suppose I would be able to prove whether it can really solve any puzzle or not, but, alas, I am not that smart. Let me know what you think, or if you’ve tried this before what your approach was. And if anyone has the recursive, one-statement version I’d love to see it!

I’m attaching a ZIP with all the code for this solution. If you want to try it out:

Make a database called Sudoku

Run the DB Create.sql script to make the tables

Run one of the provided sample puzzle scripts to load a puzzle (one is an “easy” puzzle, one is an “evil” puzzle)

Run the Viewer.sql script to view the loaded puzzle

Run Solver.sql to solve the puzzle

Run Viewer.sql (perhaps changing the value of SolutionID) to view the solved puzzle

Geri - I would love to see it - can you post a link (looked at the front page of your blog but I'm afraid I don't even see where to navigate :-). You are right that mine is pretty slow - the CTEs do a lot of repetitive work in each iteration. A result of my obsessing about using one SQL statement only...

Love the lofty idea of an entirely set based solution. I like that sort of challenge! Jeff Moden would be proud :)

Just reading your high level algorithm and I don't like the branching idea.

From my experiences of playing Sudoku, I never get to an impasse where there are multiple solutions for a cell. Even if it takes ages to find it, there is always a definate answer for the next step. When I first started sudoku, I used that technique, essentially taking a bet on a cell. It inevitably ends up with you getting it wrong and then trying to wind back to a certain point.

There are about 12 rules for solving cells in sudoku and I think they can be implemented without resorting to the branching.

I think branching breaks your initial rule of no-brute-forcing. It sort of reminds me of using GOTO in procedural languages. I do think that a sudoku solver is not really suitable to a set-based solver, but that was the point of the excercise of course. I guess I can't talk until I try it myself!

I used the branching strategy with no shame whatsoever. In fact, my solution uses a number of GOTO statements. (yep that's valid T-SQL).

The good thing about exercises like this is that we put our own constraints on ourselves. Mine just happened to not match up with Merrill's. These constraints could be anything! Whether it's set-based solutions only or best performance or as Prof mentioned just now: no guessing.

Personally, when I wrote this I was going for a mix of performance and ease of development. Prof, you mentioned 12 rules. I knew of 3 or 4 easy-to-check ones and implemented those. Resorting to guessing (i.e. brute-force) afterwards.