Unwed Numbers

Computer science has an elaborate hierarchy for classifying problems
according to difficulty, and the question of where Sudoku fits into
this scheme has elicited some controversy and confusion. It is
widely reported that Sudoku belongs in the class NP, a set of
notoriously difficult problems; meanwhile, however, many computer
programs effortlessly solve any order-3 Sudoku puzzle. There is
actually no contradiction in these facts, but there is also not much
help in dispelling the confusion.

Complexity classes such as NP do not measure the difficulty of any
specific problem instance but rather describe the rate at which
difficulty grows as a function of problem size. If we can solve an
order-n Sudoku, how much harder will we have to work to
solve a puzzle of order n + 1? For problems in NP, the
effort needed grows exponentially.

Most discussions of the complexity of Sudoku refer to the work of
Takayuki Yato and Takahiro Seta of the University of Tokyo, whose
analysis relates the task of solving Sudoku to the similar problem
of completing a partially specified Latin square. The latter problem
in turn has been connected with others that are already known to be
in NP. This process of "reduction" from one problem to
another is the standard way of establishing the complexity classes
of computational problems. Yato and Seta employ an unusual form of
reduction that addresses the difficulty of finding an additional
solution after a first solution is already known. In Sudoku, of
course, well-formed puzzles are expected to have only one solution.
Yato and Seta say their result applies nonetheless. I don't quite
follow their reasoning on this point, but the literature of
complexity theory is vast and technical, and the fault is likely my own.

When you lay down your pencil on a completed Sudoku, the thought
that you've just dispatched a problem in the class NP may boost your
psychological wellbeing, but the NP label doesn't say anything about
the relative difficulty of individual Sudoku puzzles. For that, a
different kind of hierarchy is needed.

Many publishers rank their Sudoku on a scale from easy to hard (or
from gentle to diabolical). The criteria for these ratings are not
stated, and it's a common experience to breeze through a "very
hard" puzzle and then get stuck on a "medium."

One easily measured factor that might be expected to influence
difficulty is the number of givens. In general, having fewer cells
specified at the outset ought to make for a harder puzzle. At the
extremes of the range, it's clear that having all the cells
filled in makes a puzzle very easy indeed, and having none filled in
leaves the problem under-specified. What is the minimum number of
givens that can ensure a unique solution? For an order-n
grid, there is a lower bound of n2 - 1. For
example, on an order-3 grid with fewer than eight givens, there must
be at least two numbers that appear nowhere among the givens. With
no constraints on those symbols, there are at least two solutions in
which their roles are interchanged.

Can the n2 - 1 bound be achieved in practice?
For n = 1 the answer is yes. On the order-2 grid there are
uniquely solvable puzzles with four givens but not, I think, with
three. (Finding the arrangements with just four givens is itself a
pleasant puzzle.) For order 3, the minimum number of givens is
unknown. Gordon Royle of the University of Western Australia has
collected more than 24,000 examples of uniquely solvable grids with
17 givens, and he has found none with fewer than 17, but a proof is lacking.

Published puzzles generally have between 25 and 30 givens. Within
this range, the correlation between number of givens and difficulty
rating is weak. In one book, I found that the "gentle"
puzzles averaged 28.3 givens and the "diabolical" ones 28.0.