Why Not Python?, Part 2

This time out, the old C hacker drags himself into the 1990s to solve Sudoku puzzles.

Before I get into the next program, I should mention the Python home page, which offers recent versions of the interpreter and a lot of helpful information about the language. In particular, its tutorial is excellent. Still, I highly recommend getting hold of Learning Python by Mark Lutz and David Ascher (O'Reilly, 1999), which explains things way more thoroughly than the on-line tutorial.
You also may have the tutorial and other Python documentation on your
GNU/Linux box. My laptop has the tutorial located at

/usr/doc/packages/pyth_doc/html/tut/tut.html

as part of the Python documentation package, pyth_doc-1.5.1-11
in my case. Recent distributions may include both PDF and HTML
versions of the documentation, and you also can download them
from the Python Web site or from a mirror.

Speaking of distributions, we need to think about the issue of compatibility.
I'm writing most of this article on a laptop running SuSE's Office'99
distribution, which includes Python 1.5.1.

The on-line tutorial I mention above is much newer; it identifies itself as part of
Python 2.4.2. All the programs in this article have been run on
Python 2.4 (SuSE 9.3) and Python 1.5.1 (SuSE 5.3 - Office'99).

Okay, on to the next program.

Have you ever seen a puzzle that made you want to write a program
to solve it? I had that feeling a few months ago, when I noticed
Sudoku in our local paper.

As you can see from the image above, the puzzle consists of a 9x9
matrix that is divided like a tic-tac-toe board into nine 3x3
sub-matrices. Each sub-matrix, each row and each column must
contain each of the digits 1-9 exactly once.
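The rules can be restated as a check on a finished grid. Here's a minimal sketch in Python 3, assuming the grid is a flat list of 81 digits numbered row by row; the names units and is_solved are hypothetical, and the sample grid comes from a simple shifting pattern, not from the newspaper:

```python
def units():
    """Yield the 27 groups of cell positions: 9 rows, 9 columns, 9 submatrices."""
    for r in range(9):
        yield [r * 9 + c for c in range(9)]
    for c in range(9):
        yield [r * 9 + c for r in range(9)]
    for s in range(9):
        top, left = (s // 3) * 3, (s % 3) * 3
        yield [(top + dr) * 9 + left + dc for dr in range(3) for dc in range(3)]

def is_solved(grid):
    """True if every row, column and submatrix holds each digit 1-9 exactly once."""
    if len(grid) != 81:
        return False
    return all(sorted(grid[p] for p in unit) == list(range(1, 10))
               for unit in units())

# A valid completed grid generated from a shifting pattern.
example = [(r * 3 + r // 3 + c) % 9 + 1 for r in range(9) for c in range(9)]
print(is_solved(example))  # True
```

Treating rows, columns and submatrices uniformly as "units" means the same check covers all three rules.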

Data Structure

Writing a program to address this puzzle shouldn't be all that hard.
First, you need a data structure. You could number the cells from 0-80,
left to right and top to bottom, so that row 0 holds cells 0-8, row 1
holds cells 9-17 and so on.
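To make that numbering concrete, this small Python 3 sketch prints the 0-80 layout nine cells per line; numbering_rows is a hypothetical helper name:

```python
def numbering_rows():
    """The 0-80 cell numbers, split into 9 rows of 9."""
    return [list(range(r * 9, r * 9 + 9)) for r in range(9)]

for row in numbering_rows():
    print(" ".join("%2d" % pos for pos in row))
```

The first line printed is ` 0  1  2  3  4  5  6  7  8` and the last is `72 73 74 75 76 77 78 79 80`.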

Or, you could make an array for each row, with an array containing
all of these row-arrays. You also could do it with columns. Or, you
could group each sub-matrix into a one- or two-dimensional array
and form a one- or two-dimensional array containing these sub-matrices.

All of these options are possible, but because we have to deal with
rows, columns and sub-matrices, I decided to stick with a one-dimensional
array of cells, numbered 0-80. I also decided to use some other data
structures to keep track of the rows, columns and sub-matrices.

For each cell, we need to track:

the digit in the cell, if known

the set of possible digits that could be in the cell, if
the exact digit is unknown

We could have a data structure containing the known digits of all
the cells and another one containing the set of possible digits
of all the cells. Or, we could have a single array of data structures,
one data structure for each cell. Each one would contain all
the information about its cell, regardless of whether the digit is known
or unknown.

Something--call it programmer's intuition--gives me the feeling that
the latter option will make the coding easier.

So we're going to have an array, numbered 0-80. Each element in the
array is a data structure that tells us the cell's digit, if known, or
the set of possible digits if the exact digit is unknown.
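Here's one way the "array of per-cell records" option might look, as a minimal Python 3 sketch rather than the program's actual code. It assumes a dictionary per cell with hypothetical keys value (0 meaning "not known yet") and possibles:

```python
def new_cell():
    """One cell: digit unknown (0), every digit 1-9 still possible."""
    return {"value": 0, "possibles": set(range(1, 10))}

# One independent record per cell, positions 0-80.
cells = [new_cell() for pos in range(81)]
```

The list comprehension matters: writing `[new_cell()] * 81` would put the same dictionary in all 81 slots, so eliminating a digit from one cell would eliminate it from every cell.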

Algorithm

A data structure by itself isn't a program, however; we need to
operate on it. What the program must do, essentially, is:

read in the values of the cells specified and draw some elementary
conclusions about what the blanks will be

fill in the empty cells (that is, solve the puzzle)

print out the answer
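In top-down style, those three steps become a skeleton whose pieces get refined one at a time. This Python 3 sketch is only an illustration; the names read_puzzle, solve, format_grid and main are hypothetical, and each stub raises NotImplementedError until it's written:

```python
def read_puzzle(text):
    """Step 1: read the given cells and draw elementary conclusions."""
    raise NotImplementedError("step 1 not refined yet")

def solve(grid):
    """Step 2: fill in the empty cells."""
    raise NotImplementedError("step 2 not refined yet")

def format_grid(grid):
    """Step 3: turn the answer into printable text."""
    raise NotImplementedError("step 3 not refined yet")

def main(text):
    grid = read_puzzle(text)
    solve(grid)
    return format_grid(grid)
```

Calling main before any step is refined fails loudly instead of silently printing nonsense.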

Back in the 1970s, this new-fangled thing called
"structured programming" talked about top-down design
and stepwise refinement. The idea was to break the big steps
into smaller pieces that could be coded confidently.

I like this idea, but I also like the new new-fangled thing called
"extreme programming", which says to write the test cases before you
write the code. So I decided to practice this idea on step #1. I fed a
puzzle to the code and checked a few of the blanks to make sure we
restricted the possibilities appropriately.

Step 1: Refining, Coding and Testing

Start by reading in the values of the cells specified and drawing some elementary
conclusions about what the blanks will be.
It might be cool if the program could read the newspaper and do
optical character recognition (OCR) to figure out the digits.
But even if it could, the OCR part still needs to communicate the
numbers to the problem-solving part. Let's restrict ourselves to
the problem-solving part and leave the OCR to the person behind
the keyboard.

So, let's have the program read the digits from a file. Rather
than doing anything fancy, let it ignore whitespace. Let each
digit stand for itself, and use - to represent a blank cell.
We also could allow . for blank cells, which is what I've seen
on a Sudoku message board. So the following would be valid input:

All of these input variations mean exactly the same thing. That is,
all of them represent the puzzle shown in the
image scanned from the paper.
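The input rules above are simple enough to sketch directly. This Python 3 version is an assumption-laden illustration, not the article's code: parse_puzzle is a hypothetical name, blanks are stored as 0, and anything other than whitespace, 1-9, "-" or "." is rejected:

```python
def parse_puzzle(text):
    """Turn puzzle text into 81 ints; 0 marks a blank cell."""
    values = []
    for ch in text:
        if ch.isspace():
            continue                      # layout doesn't matter
        if ch in "-.":
            values.append(0)              # blank cell
        elif ch in "123456789":
            values.append(int(ch))        # known digit
        else:
            raise ValueError("unexpected character %r" % ch)
        if len(values) > 81:
            raise ValueError("more than 81 cells in input")
    if len(values) != 81:
        raise ValueError("expected 81 cells, got %d" % len(values))
    return values
```

Because whitespace is skipped entirely, one long line, nine lines of nine, or a spaced-out grid all parse to the same list.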

Now, besides reading the characters, the program should fill in the
appropriate values in the data structures. Actually, before we even
start, we should
initialize each cell's structure to say the value is not known but
could be anything. In other words, the set of possible digits would be
{1,2,3,4,5,6,7,8,9}.

Then, as we read in the values, whenever we find a known one, we
set the "if known" part to that value. We then remove that value
from the set_of_possibles of every other cell in that cell's row,
column and 3x3 sub-matrix.

The rows are numbered 0-8 (range(0,9) in Python terms), with cells 0-8
in row 0, cells 9-17 in row 1 and so on. We can calculate the row number
by taking int(pos/9), where pos is the cell position.

Columns also are numbered 0-8, with cells 0,9,18,27... in column 0;
cells 1,10,19,28... in column 1 and so on. Column numbers
are calculated by taking (pos%9).

Submatrices are numbered 0-8. A cell with a row number from 0-2
and a column number from 0-2 is in submatrix 0; a cell in rows 0-2
with a column number from 3-5 is in submatrix 1 and so on. These
submatrices are laid out as follows:

0 1 2
3 4 5
6 7 8

Therefore, submatrix 0 consists of cells 0,1,2,9,10,11,18,19,20,
and submatrix 1 consists of cells 3,4,5,12,13,14,21,22,23.
To figure out the submatrix number, we don't need the exact
row and column number; we simply need int(myrow/3) and int(mycol/3):

mysub = int(myrow/3) * 3 + int(mycol/3)
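The three formulas translate directly into small helpers. This is a Python 3 sketch with hypothetical function names; it uses `//` (integer division) because in Python 3 `/` on integers returns a float, whereas the 1.5 and 2.4 interpreters truncate, so `int(pos/9)` and `pos // 9` agree for non-negative positions:

```python
def row_of(pos):
    return pos // 9

def col_of(pos):
    return pos % 9

def sub_of(pos):
    # mysub = int(myrow/3) * 3 + int(mycol/3)
    return (row_of(pos) // 3) * 3 + col_of(pos) // 3

def cells_in_sub(sub):
    """All cell positions belonging to one submatrix, in increasing order."""
    return [pos for pos in range(81) if sub_of(pos) == sub]

print(sub_of(8), sub_of(72))  # 2 6
```

cells_in_sub(0) and cells_in_sub(1) reproduce the two cell lists given above.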

To test this portion of the code, we feed in the above example
and use the Python debugger (pdb) to check that:

although cell 8 (upper right-hand corner) isn't yet "known", it could
be 6 or 9.

cell 72 (lower left) can be 3 or 9.

submatrix 2 contains cell 8.

submatrix 6 contains cell 72.

Now, to do the "stepwise refinement" portion of step 1, we
initialize each cell to contain:

value=unknown
set_of_possibles={1,2,3,4,5,6,7,8,9}

We also read in each value from the input file. If the value is
unknown (either "-" or "."), leave the cell alone. If it's known,
set the data structure's digit (the "if known" part) to that
value, then zap the set_of_possibles for that cell.
Finally, remove that value from the set of possible digits in the rest
of that cell's row, column and 3x3 sub-matrix.
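Before bringing in classes, the bookkeeping can be sketched with plain lists, dictionaries and sets. This Python 3 sketch uses hypothetical names (new_grid, peers_of, set_value); "peers" here means the 20 other cells sharing a row, column or submatrix with a given cell:

```python
def sub_of(pos):
    """Submatrix number 0-8 for a cell position 0-80."""
    r, c = divmod(pos, 9)
    return (r // 3) * 3 + c // 3

def peers_of(pos):
    """Every other cell in the same row, column or 3x3 submatrix."""
    return [p for p in range(81)
            if p != pos and (p // 9 == pos // 9 or p % 9 == pos % 9
                             or sub_of(p) == sub_of(pos))]

def new_grid():
    """81 cells: value 0 (unknown) and every digit 1-9 possible."""
    return [{"value": 0, "possibles": set(range(1, 10))} for _ in range(81)]

def set_value(cells, pos, val):
    """Record a known digit and eliminate it from the cell's peers."""
    if val not in range(1, 10):
        raise ValueError("illegal digit %r" % (val,))
    cell = cells[pos]
    if cell["value"] not in (0, val):
        raise ValueError("cell %d is already %d" % (pos, cell["value"]))
    if cell["value"] == 0 and val not in cell["possibles"]:
        raise ValueError("cell %d can't be %d" % (pos, val))
    cell["value"] = val
    cell["possibles"] = set()          # zap: the digit is known now
    for p in peers_of(pos):
        cells[p]["possibles"].discard(val)
```

set.discard is used rather than remove because a peer may have lost that digit already.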

Okay, so let's code it! We have an array (a "list" in Python) of
cells. For each cell, we need a notion of what rows, columns
and submatrices it belongs to.

The Python book talks about "classes", which would give me a chance
to try out this object-oriented programming thing.

All right, after a little puzzling, here is what I came up with.
Note: unlike the "Coconuts" program in
Part 1, which was short and easy, I ran
into a few errors with this one. Details are outlined in Appendix A.

Every instance of the class "Cell" represents one cell in the puzzle
and includes the following attributes:

pos: position in the puzzle: 0 for the upper left-hand corner,
80 for the lower right-hand corner.

value: zero if not yet known; otherwise, a number from 1 to 9
inclusive.

set_of_possibles: a Python list of values that--as far
as we know--the
cell could be. The set_of_possibles is zapped (set to an
empty list) once the value is known.

row: a list of the cells in the same row as this cell; it's
actually a reference to an element of Cell.rows.

col, sub: analogous to "row" but corresponding to the cell's
column and submatrix, respectively.

The row, column and submatrix calculations are carried out in
lines 16-18. This __init__ function assumes that it will be
called exactly once for each value of "pos" in range(0,81).

To set the value of a cell, call the "setvalue" function. It does
some simple checks in lines 34-36 (Is the value legal? Are you setting
the cell to a value we already know it can't be?) and sets the value
itself in line 38. Lines 41-48 find all cells in the same row, column
or submatrix and remove "val" from their set_of_possibles.
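Since the listing itself isn't reproduced here, the following Python 3 sketch shows what a class along those lines might look like. The attribute names follow the description above (pos, value, set_of_possibles, row, col, sub, Cell.rows, Cell.columns, Cell.submatrices); the method bodies are my assumptions, not the original code:

```python
class Cell:
    # Class data shared by every cell: each entry is a list of Cell objects.
    rows = [[] for _ in range(9)]
    columns = [[] for _ in range(9)]
    submatrices = [[] for _ in range(9)]

    def __init__(self, pos):
        # Assumes exactly one Cell is created for each pos in range(0, 81).
        self.pos = pos
        self.value = 0                               # 0 means not yet known
        self.set_of_possibles = list(range(1, 10))
        myrow, mycol = pos // 9, pos % 9
        mysub = (myrow // 3) * 3 + mycol // 3
        self.row = Cell.rows[myrow]
        self.col = Cell.columns[mycol]
        self.sub = Cell.submatrices[mysub]
        for group in (self.row, self.col, self.sub):
            group.append(self)

    def setvalue(self, val):
        if val not in range(1, 10):
            raise ValueError("illegal value %r" % (val,))
        if self.value != 0 and self.value != val:
            raise ValueError("cell %d is already %d" % (self.pos, self.value))
        if self.value == 0 and val not in self.set_of_possibles:
            raise ValueError("cell %d can't be %d" % (self.pos, val))
        self.value = val
        self.set_of_possibles = []                   # zap: value is known
        for group in (self.row, self.col, self.sub):
            for other in group:
                if val in other.set_of_possibles:
                    other.set_of_possibles.remove(val)

    def getvalue(self):
        return self.value

cells = [Cell(pos) for pos in range(81)]
```

Because rows, columns and submatrices are class data, creating a second set of 81 cells in the same run would append to the same lists; that's the reason for the "exactly once per pos" assumption.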

Lines 51-75 describe the "doit" function. My original version
didn't have this as a function, but it's now in a function to make
debugging easier. It initializes the cells[] list and then decides
(lines 57-60) whether to get the puzzle from a file or from stdin.

Lines 66-74 interpret the input and call the "setvalue" function
for the appropriate cell.

Line 78 simply tells Python that when run, it should execute the "doit"
function.
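A long-standing Python idiom avoids having to comment out that last line when debugging: run the top-level call only when the file is executed as a script. Here's a minimal Python 3 sketch; this doit, which takes an iterable of input lines and just collects the non-blank characters, is a hypothetical stand-in for the real one:

```python
import sys

def doit(lines):
    """Everything the program does lives in this function, so importing
    the module runs nothing by itself."""
    chars = []
    for line in lines:
        chars.extend(ch for ch in line if not ch.isspace())
    return chars

if __name__ == "__main__":
    # True only when the file is run as a script, not when it's imported.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            doit(f)
    else:
        print("usage: %s puzzle-file" % sys.argv[0])
```

With the guard in place, you can import the module and pdb in an interactive session and hand pdb.run a statement that calls doit, without editing the file first.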

The next problem is: does the code really work? It turns out that
pdb isn't exactly like, say, gdb; I had to modify the module in
order to debug it. Here's what I mean: you import the module, import
pdb and then execute the pdb.run function. But importing the module
executes everything in it. That's why I put everything into the
"doit" function, which starts at line 51. To debug s0.py, I also
have to comment out line 78, like this:

Having a debugger is better than not having one, but the debugger
requires several manual steps--it's labor intensive. I believe
it was Larry Wall, the inventor of Perl, who listed "laziness" as
an attribute of good programmers. Larry is especially right when it
comes to testing. Tests have to be easy to run; otherwise people
(meaning me in this case, and probably you too) won't run them
correctly, and the results will be wrong, either false positives
or false negatives. So it's better not to have to use the debugger
to test our program.
Therefore, let's code Step 3, "glue" it onto Step 1 and test both
parts together.

Coding Step 3

Now we need to print out the answer. What we should do here is print the digit that's in each cell.
If we run it immediately after Step 1, some cells will be
unknown. Let's
print those out as "-", that is, in a form that this program
can read later.
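Here's a minimal Python 3 sketch of such an output routine, assuming the cell values are available as a flat list of 81 ints with 0 for unknown; format_grid is a hypothetical name:

```python
def format_grid(values):
    """Return 9 lines of 9 characters; unknown cells (0) print as '-'."""
    lines = []
    for r in range(9):
        row = values[r * 9:(r + 1) * 9]
        lines.append("".join(str(v) if v else "-" for v in row))
    return "\n".join(lines)
```

Because blanks come out as "-" and the reader ignores line breaks, the output obeys the same input rules described in Step 1.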

That worked, so I changed the program to print the output
at the end and made the doit function run without intervention.
That way, I don't have to run the debugger every time. The end of the
program now looks like this:

Looking at the above, I realized that my old C-hacker self had
violated one of the principles of object-oriented
programming--information
hiding. The class Cell should completely
encapsulate the internals of the cell's data structure, and users
should use accessor functions (such as setvalue) to access the internals.
Lines 82 and 83 "peek" inside the data structure, so let me add a
"getvalue" accessor function and change the Step 3 code to use
the "known" function. Here are the changed parts, with "*" indicating
a line with new or changed code:

It took a half-dozen tries and changes before this code actually ran. Here's
what happened. When
I wanted to raise an exception, I wrote "throw" instead of
"raise". Although I had read the fine manual, I had forgotten
the magic word, and Python didn't like that:

After referring to the book, I realized that I had to use "Cell.rows"
here. All references to class data need to be qualified by
the class name, that is, "Cell". So after fixing that, as well as
"Cell.columns" and "Cell.submatrices", things were looking better.
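The rule is easy to demonstrate in isolation. In this Python 3 sketch (a stripped-down, hypothetical Cell, not the article's), a bare name inside a method is looked up as a local or global, never in the class body, so the unqualified version raises NameError. Reading through self, as in self.rows, also works, because attribute lookup on an instance falls back to its class:

```python
class Cell:
    rows = [[] for _ in range(9)]          # class data

    def __init__(self, pos):
        self.pos = pos
        Cell.rows[pos // 9].append(self)   # qualified: works

    def broken_row(self):
        return rows[self.pos // 9]         # unqualified: NameError

    def my_row(self):
        return Cell.rows[self.pos // 9]
```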

With the syntax errors under control, there were some other dumb
mistakes to fix, such as this one:

which_cell always tells me how many cells I have
processed so far. So after reading information for the last
cell, number 80 in the array, which_cell will be 81.
It was an off-by-one error, in other words. I changed ">=" to
">", after which I got:

% python s0.6.py 1109.puz
%

Hooray! No complaints. This final version is what appears in the
main part of this article.

Collin Park works for Network Appliance, where he uses Linux on his
desktop and laptop computers. He does data recovery and other
Linux-related stuff at home, where he lives with his wife and their two
teenage daughters. All use Linux to meet their computing needs.
