Using your Head is Permitted

July 2008 solution

Part 1

Sudoku is known to be an NP-Complete problem, but reasonable Sudoku is not, and,
in fact, it does have polynomial solutions.

First, consider the main problem of the algorithm presented: it has to handle
too many sets explicitly. Let's begin by considering which sets we actually
need and which can be infered. Consider the following more compact description
of an "intermediate status" of the board. Regarding each of the
N2xN2 positions, we will maintain a list
of the i values we have still not eliminated from appearance in the
position. This set will be denoted xmax for the position
x, with ximax being an indicator
variable whose value is 1 if i is in xmax or
0 otherwise. ximax is a renaming of
what was formerly Ximax for
X={x}.

This new explicit description translates to the following implicit description
in the terminology of the previous algorithm:

For a set A that is a subset of a basic constraint C,
Aimax=0 if A is the
union of positions from which i has been eliminated.
In this case, (C\A)imin=1.
For all other sets, Aimax=1 and
Aimin=0.

There are two forms of deduction methods that can be used to eliminate further
elements from appearance in certain positions:

If A is a subset of a basic constraint, C, such that
|A| equals the number of elements that appear in the
union of xmax over all x in A,
then all positions in C\A cannot
contain any of the elements in this union.

If for any i and any basic constraints C and D, the list
of positions in C where i has not been eliminated is a subset of
D, i can not appear in any other position in D.

Lastly, we begin our path of deduction by reading the board. For every position,
x, known to contain i, we eliminate i from
ymax for every other
position, y, in the same row, column and/or sector as x.

It can easily be shown that putting all these rules together, any reasonable
Sudoku problem can be solved. The rules cover all the rules used by the
brute-force solution. This, indeed, is what is implemented by the following
given program. Its interface is very much
like that of the program given in the question, except its "explain" method
returns different explanations and, for obvious reasons, does not support
queries on sets of positions, but only on single positions. Here is a sample
session with the new program:

>>> from sudoku import *
>>> f=open("board.txt")
>>> board=f.read()
>>> f.close()
>>> p=sudoku()
>>> p.solve(board,False)
876315492
532489617
149276385
425967831
718543926
963128754
384752169
651894273
297631548
>>> p.explain("I4","4")
I4 is 4. This was discovered while checking ['4'] on column I and finding it only on I4.
>>> p.explain("I4","2")
2 is not on I4 . This was discovered while checking row 4 and discovering that E4 is 2.
>>> p.explain("G7","2")
2 is not on G7 . This was discovered while reading the board.
>>> p.explain("G8","2")
2 is not on G8 . This was discovered while checking row 8 and discovering that C8 is 2.

The way the new program works is by re-checking, whenever a possibility,
i, has
been eliminated from an xmax, all basic constraints, C,
that include x, trying to apply the two rules of deduction. It checks
for subsets X of C for which
|X| equals the number of elements that appear in the
union of ymax over y in X, then removes
i from all zmax for z in C\X.
It also checks whether the list of positions
where i has not been eliminated did not become, due to the recent
elimination, a subset of a different basic constraint, D, in which case
i should be eliminated from all zmax for z in
D\C.

These improvements make the new version much faster (it takes roughly 1 second
to solve a Sudoku board) but it's still hyper-exponential. The reason for this
has to do with the need to check every subset X of each C
checked. Though the number of basic constraints is polynomial, the number of
subsets is hyper-exponential in N.

Here is an algorithm description for a simpler-to-analyze version of the same
algorithm (though, admittedly, a slower one in practice):

1. Initialize the state by considering all elements as possible in each position.
2. Read the visible board positions.
3. For each visible board position, x:
3.1. Let i be the element in the position.
3.2. Eliminate i from the list of possibles in each of the three basic constraints that x is part of.
4. RecentChanges := True
5. While RecentChanges is True:
5.1. For each basic constraint, C:
5.1.1. Eliminate i from the possible list of position x in C if there exists a subset X of C such that
a. The union of the possible lists of the positions of X is of size |X|.
b. The union contains i.
c. X does not contain x.
5.1.2. For each i:
5.1.2.1. For each basic constraint D other than C:
5.1.2.1.1. If the list of positions in C that may still contain i is a subset of D:
5.1.2.1.1.1. Eliminate i as a possibility from all other positions in D.
5.2. RecentChanges := the list of possibilities has been updated during the current "while" iteration.

Clearly, the entire algorithm would have become polynomial had we managed to
implement line 5.1.1 in a polynomial manner. The rest of the
lines are polynomially computable, and the "while" loop must perform at least
one elimination every iteration, so the number of iterations is bounded by
the number of elements times the number of positions, which is polynomial.

The problem is that the given implementation of 5.1.1 enumerates over all the
possibilities for X, which is not polynomial. Luckily, there is a
different solution.

Consider the fact that 5.1.1 eliminates i from the possibility list of
x based on the fact that no assignment of elements to positions of
C that includes i in the
x position can be composed solely of elements listed as possible in
their positions, while still having each element exactly once. We can
therefore ask the question directly: is there a matching of the places to the
elements that matches each place with a possible element and has i in
the x place?

In this formulation, this is clearly a question of finding a perfect matching
(or proving none exists) in the bipartite graph with N2
elements on one side and N2 positions on the other, with
an edge between a position and an element indicating that the element is
possible in the position. In order to insure that i is matched with
x we can, for example, erase all other edges to i. Repeating
this for each i and each x (a polynomial number of repetitions),
the algorithm is complete, recalling that finding a perfect matching in a
bipartite graph is a problem with a well-known polynomial solution.

An implementation of a more sophisticated (but still polynomial) version of
this solution is provided here. For
N=3 it is slightly slower than the hyper-exponential solution provided
above, taking roughly 1.5 seconds to the other solution's one second, at least
on my computer. For any larger values of N, the polynomial solution
would, of course, be much faster. Already for N=4, the polynomial
solution takes only a few seconds, compared to several minutes for the
hyper-exponential solution.

Notably, the polynomial solution provided does not have any "explaining"
functionality, nor does it provide much of the board-consistency validation
that is present in the other versions given here.

Part 2

The fact that reasonable Sudoku problems have polynomial solutions while
general Sudoku problems are NP-Complete indicates that, unless NP=P, there
should be non-reasonable Sudoku problems. However, nothing promises that such
cases appear already in N=3.

Though the algorithms presented here for the solution of reasonable Sudoku
problems can not complete any of the missing digits of this board, simple
trial-and-error shows that the solution is unique. Trying to set D3 to anything
but "1" reaches an impasse very quickly. On the other hand, setting it to "1"
makes the problem once again reasonable, yielding the following unique solution: