Solving Ginnow's Sieve Puzzle

Carl Ginnow posted an interesting solitaire game to the
rec.puzzles newsgroup.
Its solution has several instructive points.
The rules are as follows:

1. Start with a list of the whole numbers from 1 to N, say N = 101.

2. Select a number from the list that has at least two factors in the list
(one of the factors may be the number itself).

3. Remove all factors of the selected number from the list.

4. Repeat steps 2 and 3 until every number in the list has
only one factor remaining in the list.

Your score in this game is the sum of the numbers selected in step 2.
Which numbers should you select, and in which order, to maximize this sum?
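
To make the rules concrete, here is a minimal C sketch (my own
illustration, not Ginnow's statement or the program discussed later)
that plays a given move sequence, checking each selection against
step 2, and reports the score. The sequence hard-coded below is the
optimal N=18 line of play quoted further on:

    #include <stdio.h>

    #define N 18

    static int in_list[N + 1];      /* 1 while the number is still listed */

    /* Count the factors of k still in the list (k itself may count). */
    static int factors_in_list(int k)
    {
        int d, count = 0;
        for (d = 1; d <= k; d++)
            if (k % d == 0 && in_list[d]) count++;
        return count;
    }

    /* Steps 2 and 3: select k if legal, then cross out its factors. */
    static int select_number(int k)
    {
        int d;
        if (!in_list[k] || factors_in_list(k) < 2) return 0;
        for (d = 1; d <= k; d++)
            if (k % d == 0) in_list[d] = 0;
        return 1;
    }

    int main(void)
    {
        int moves[] = {17, 9, 15, 10, 14, 18, 12, 16};
        int i, score = 0;
        for (i = 1; i <= N; i++) in_list[i] = 1;
        for (i = 0; i < 8; i++) {
            if (!select_number(moves[i])) return 1;    /* illegal move */
            score += moves[i];
        }
        printf("score = %d\n", score);                 /* prints 111 */
        return 0;
    }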

I call Ginnow's game the ``Ginnow Sieve'' because of a
(very) vague resemblance to the Sieve of Eratosthenes.
(In Eratosthenes's sieve you cross out the multiples of selected
numbers; in Ginnow's sieve you cross out their divisors.
In the former the numbers are selected automatically and are the primes;
in the latter you select them as you wish, hoping for a good final score.)
Ginnow's sieve is a rather fun solitaire,
and I recommend you experiment with it a
bit before reading further.
This solitaire also seems like
it might be a nice recreation for youngsters to practice their multiplication
tables and learn about prime numbers.

For me the biggest fascination was finding a computer algorithm
to compute the optimal move sequence.

Such an algorithm will involve trial-and-error (the ``brute force'' idea
of trying essentially every possibility) or problem ``knowledge''
(heuristics restricting the form a solution might take) or both.
Ginnow's sieve will require both.
A simple example of problem knowledge is the fact that for each
number K selected in step 2, one of the factors still remaining in the
list, and then immediately crossed out in step 3, will be of the form K/p
where p is prime.
This fact is easily confirmed, and can be left as an exercise to make
sure you understand the rules.

In other words, if K is the product of, say, 4 primes,
like K = 60 = 2*2*3*5,
then one of its remaining factors will be the product of 3 primes: call
that the principal factor.
K = 60 has many factors (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60) but
only three possible principal factors (12, 20, 30), and at least one of those
must be present in the list in order for 60 to qualify for selection.
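
As an illustration, a few lines of C (again my own sketch) enumerate
the principal factors K/p by stripping each distinct prime divisor:

    #include <stdio.h>

    /* Print the principal factors of k, i.e. k/p for each distinct
       prime p dividing k.  For k = 60 this prints 30 20 12. */
    static void principal_factors(int k)
    {
        int p, m = k;
        for (p = 2; p <= m; p++)
            if (m % p == 0) {
                printf("%d ", k / p);       /* p is a prime divisor of k */
                while (m % p == 0) m /= p;  /* strip p: report it once */
            }
        printf("\n");
    }

    int main(void)
    {
        principal_factors(60);
        return 0;
    }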

Other examples of knowledge specific to this game are:

The best first move is always the largest prime p <= N.

The starting sequence (a, b, ... c, d, e) will yield the same
optimal score as
the starting sequence (a, b, ... c, e, d) as long as both are legal.

If a < b < c < d are all still on the list, and a | b, b | d, c | d,
then b is always a better selection than d.
(``a | b'' means ``a is a factor of b.'')

After any starting sequence an upper bound can be found for the
final score.
That starting sequence can be rejected unless that bound exceeds the
best previous score.

These heuristics are used to reject portions of the game tree,
and thus reduce the size of the trial-and-error search.
Another heuristic, which enlarges the search tree but reduces execution
time, is to suppress some of the other heuristics (because applying them
is time-consuming) once the search is already deep in the tree.
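
To show where the bound-and-prune idea fits, here is a complete,
deliberately naive, branch-and-bound solver in C. It is only a sketch
of mine: its bound (the sum of every number that is currently
selectable, a quantity that can only shrink as the list does) is much
weaker than the heuristics above, but it is provably valid, and the
skeleton is the same one a real program elaborates:

    #include <stdio.h>
    #include <string.h>

    #define N 18                    /* small enough to finish instantly */

    static int in_list[N + 1];
    static int best_score;

    static int nfactors(int k)
    {
        int d, c = 0;
        for (d = 1; d <= k; d++)
            if (k % d == 0 && in_list[d]) c++;
        return c;
    }

    /* Removing numbers never increases factor counts, so every future
       selection must already have at least two in-list factors now.
       Summing all such numbers is therefore a valid optimistic bound. */
    static int upper_bound(void)
    {
        int k, s = 0;
        for (k = 2; k <= N; k++)
            if (in_list[k] && nfactors(k) >= 2) s += k;
        return s;
    }

    static void search(int score)
    {
        int k, d, any = 0;
        int saved[N + 1];
        if (score + upper_bound() <= best_score) return;   /* prune */
        for (k = N; k >= 2; k--) {
            if (!in_list[k] || nfactors(k) < 2) continue;
            any = 1;
            memcpy(saved, in_list, sizeof saved);          /* save state */
            for (d = 1; d <= k; d++)
                if (k % d == 0) in_list[d] = 0;            /* cross out */
            search(score + k);
            memcpy(in_list, saved, sizeof saved);          /* restore */
        }
        if (!any && score > best_score) best_score = score;
    }

    int main(void)
    {
        int k;
        for (k = 1; k <= N; k++) in_list[k] = 1;
        search(0);
        printf("N=%d: best score %d\n", N, best_score);    /* prints 111 */
        return 0;
    }

Raise N much beyond the teens and this naive version bogs down
quickly, which is exactly the exponential behavior discussed next.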

When this problem was presented on Usenet, another expert programmer
and I had fun competing in pursuit of its solution,
using heuristics like those just given.
But progress was slow.
Because of the exponential nature of the search,
a program that takes only a few seconds when N=50 might take
several minutes when N=65, or many hours when N=80.
Then suddenly I found a totally different approach, one that ran much
faster.
My new approach was to first solve a different game altogether!

I'll call this other game ``The Ginnow-like Game.''
It may not seem ``like'' Ginnow's original game at all, but the similarity
is undeniable: its solution led directly to the solution of Ginnow's
Sieve.
The rules of ``The Ginnow-like Game'' are as follows:
Make a list of ordered pairs (a,b) in which b is always a principal
factor of a. (``Principal factor'' was defined above.)
Each a and b must be drawn from the set {1,2,3,...,N},
and each number in this set may appear at most
once among all the pairs (a,b).
The sum of the a's is your score.

Perhaps an example is in order to demonstrate the relationship
between these two games.
When N=18 the optimal score in Ginnow's Sieve is 111 which is
obtained with the move sequence (17, 9, 15, 10, 14, 18, 12, 16).
With one exception no deviation is permitted in the ordering of
these moves (the exception is 14, which can be delayed).
The optimal solution to ``The Ginnow-like Game'' consists of
the eight ordered pairs (17,1), (9,3), (15,5), (10,2), (14,7), (18,6),
(12,4), (16,8).
I have listed these in the same order as the solution for Ginnow's
Sieve to make the relationship unmistakable, but unlike the Sieve,
this game is defined so the order here is irrelevant.
It is the fact that we need not try different orderings that makes
the second game so rapid to solve.
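
Because order is irrelevant here, checking a candidate answer is
simple set arithmetic. A minimal checker (my own sketch, with invented
names) validates the eight pairs above and confirms the score of 111:

    #include <stdio.h>

    #define N 18

    /* True if b is a principal factor of a, i.e. b = a/p with p prime. */
    static int is_principal(int a, int b)
    {
        int p, d;
        if (b < 1 || a % b != 0) return 0;
        p = a / b;
        if (p < 2) return 0;
        for (d = 2; d * d <= p; d++)
            if (p % d == 0) return 0;
        return 1;
    }

    /* Score a pair list for the Ginnow-like Game, or -1 if a pair is
       illegal or some number is used more than once. */
    static int score_pairs(const int pairs[][2], int n)
    {
        int used[N + 1] = {0};
        int i, s = 0;
        for (i = 0; i < n; i++) {
            int a = pairs[i][0], b = pairs[i][1];
            if (a > N || !is_principal(a, b)) return -1;
            if (used[a] || used[b]) return -1;
            used[a] = used[b] = 1;
            s += a;
        }
        return s;
    }

    int main(void)
    {
        int pairs[][2] = {{17,1},{9,3},{15,5},{10,2},
                          {14,7},{18,6},{12,4},{16,8}};
        printf("score = %d\n", score_pairs(pairs, 8)); /* prints 111 */
        return 0;
    }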

The fact that move ordering is essential in Ginnow's Sieve,
and irrelevant in the other game, makes the two games very different,
and it may seem quite peculiar that they have the same solution.
Let's look at another example, N=21. The optimal move sequence in
the Sieve is (19, 9, 21, 15, 14, 18, 12, 20, 16) for a score of 144.
Again no deviation is permitted in the ordering (except now 15 can be
delayed as long as it precedes 20), and again there is just one way to
map these nine moves to the nine ordered pairs required in the other game.

A simple analysis shows that 145 is an upper bound on the score
in either game when N=21.
Only one of the large primes (11, 13, 17, 19)
can possibly be used: each has 1 as its only principal factor,
and 1 can be crossed out only once.
This leaves 16 numbers, or 8 number/factor pairs. Adding the largest
eight of these numbers (21+20+18+16+15+14+12+10 = 126), plus 19, gives 145.
The solution above, 144, comes within one point of this, the deviation
being that 9 is used as a selection and 10 as a principal factor (for 20).

But wait!!
What about (19,1), (21,7), (14,2), (10,5), (15,3), (20,4), (12,6), (18,9),
(16,8)?
This is a valid solution to ``The Ginnow-like Game'' and achieves the bounding
score of 145.
These moves are shown in an order which almost works for the Ginnow Sieve,
but not quite: 21 must precede 14 since its factor 7 divides 14;
14 must similarly precede 10, and 10 must precede 15.
But 15 must precede 21 since we're using 3 as 15's principal factor and 3
divides 21.
This creates a cycle in the partial ordering: 21 < 14 < 10 < 15 < 21,
which means this solution to the Ginnow-like Game is worthless for
Ginnow's Sieve.

What to make of this? I said I solved the one game by treating it as
the other, but the games are not the same after all!
The workaround is simply to modify the rules of the Ginnow-like Game to
outlaw a factoring cycle like the example just given.
If these factoring cycles were common we'd be back where we started,
needing to worry about move ordering, but it turns out they are comparatively
rare. (In fact N=21 is the smallest N for which the Ginnow-like Game has
an optimal solution different from the optimal Sieve solution.)
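
The cycle test itself is routine. In the sketch below (mine, not the
actual source), pair i must precede pair j whenever b_i divides a_j,
since selecting a_j would cross out b_i while pair i still needs it;
a depth-first search then looks for a cycle in that relation. Fed the
145-point N=21 pairing above, it reports the cycle:

    #include <stdio.h>

    #define MAXP 16

    static int a[MAXP], b[MAXP], n;
    static int color[MAXP];          /* 0 = white, 1 = on stack, 2 = done */

    /* Pair i must precede pair j when b[i] divides a[j]: selecting a[j]
       would remove b[i], which pair i still needs. */
    static int must_precede(int i, int j)
    {
        return i != j && a[j] % b[i] == 0;
    }

    /* DFS cycle test: returns 1 if a cycle is reachable from node i. */
    static int cyclic(int i)
    {
        int j;
        color[i] = 1;
        for (j = 0; j < n; j++)
            if (must_precede(i, j)) {
                if (color[j] == 1) return 1;
                if (color[j] == 0 && cyclic(j)) return 1;
            }
        color[i] = 2;
        return 0;
    }

    int main(void)
    {
        /* The 145-point N=21 pairing from the text. */
        int pairs[][2] = {{19,1},{21,7},{14,2},{10,5},{15,3},
                          {20,4},{12,6},{18,9},{16,8}};
        int i;
        n = 9;
        for (i = 0; i < n; i++) { a[i] = pairs[i][0]; b[i] = pairs[i][1]; }
        for (i = 0; i < n; i++)
            if (color[i] == 0 && cyclic(i)) { printf("cycle!\n"); return 0; }
        printf("acyclic\n");
        return 0;
    }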

I did adopt one technique which means the algorithm's answers cannot
be guaranteed.
Inspecting the results of the exhaustive searches performed earlier,
one sees that all the largish numbers occurred in the
optimal sequences, except when this was clearly impossible.
With this observation in mind, the software reads a threshold,
called Mustt in the source code,
and attempts to force every number above it into the solution.

The above discussion should be adequate to understand the essential
algorithm and to see why I felt this problem was ``neat.''
Now I'll offer some comments on the details of coding.

In the software I maintain various lists, and the partial ordering
relation, as bit maps.
The associated macros are of general utility.
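
The real macros have their own names and details; the following sketch
just conveys the flavor, a bit map kept in 16-bit words:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t Word;
    #define WBITS     16
    #define NWORDS(n) (((n) + WBITS) / WBITS)   /* words covering 1..n */

    #define BIT_SET(map, i)  ((map)[(i) / WBITS] |=  (Word)1 << ((i) % WBITS))
    #define BIT_CLR(map, i)  ((map)[(i) / WBITS] &= ~((Word)1 << ((i) % WBITS)))
    #define BIT_TEST(map, i) (((map)[(i) / WBITS] >> ((i) % WBITS)) & 1)

    int main(void)
    {
        Word remain[NWORDS(101)] = {0};
        BIT_SET(remain, 17);
        printf("%d %d\n", (int)BIT_TEST(remain, 17),
                          (int)BIT_TEST(remain, 16));   /* prints: 1 0 */
        return 0;
    }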

Certain operations on the bit masks are table driven; for example
one megabyte of memory is allocated just so that the sum of the
k largest entries in any 16-bit bitmask can be found quickly.
This sped up the program only a very tiny bit, but was included
anyway since the source code is intended to be a potpourri of
example techniques.
Not too long ago, a one-megabyte table would have been regarded as
preposterously large, but it would often be innocuous on today's systems.
Note that changing a single character near the front of the
source code will change the bitmask word size from 16 bits to 8 bits
(and the 1 megabyte table becomes just 2 kilobytes).
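
The actual table layout may differ from what I show here; one
plausible shape (my assumption) is indexed by the 16-bit mask and by k,
one byte per entry, 65536 x 16 bytes = exactly one megabyte. The
entries below sum bit positions; the real program would add whatever
base value a mask's segment of numbers represents:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* sumk[mask][k-1] = sum of the positions (0..15) of the k largest
       set bits in mask.  65536 * 16 bytes = 1 MB. */
    static uint8_t (*sumk)[16];

    static void build_table(void)
    {
        unsigned mask;
        sumk = malloc(65536 * sizeof *sumk);
        if (!sumk) exit(1);
        for (mask = 0; mask < 65536; mask++) {
            int pos, k = 0, s = 0;
            for (pos = 15; pos >= 0; pos--)     /* largest positions first */
                if (mask & (1u << pos)) {
                    s += pos;
                    sumk[mask][k++] = (uint8_t)s;
                }
            for (; k < 16; k++)                 /* fewer set bits than k */
                sumk[mask][k] = (uint8_t)s;
        }
    }

    int main(void)
    {
        build_table();
        /* bits {3,5,9} set: the 2 largest positions are 9 and 5, sum 14 */
        printf("%d\n", sumk[(1 << 3) | (1 << 5) | (1 << 9)][1]);
        return 0;
    }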

As with most depth-first searches, the search path is maintained in a stack.
It is often convenient to keep that stack implicitly on the
procedure call stack; that hides a little complexity, as long as
each invocation needs to examine at most one prior node.
Here, though, we would have to follow backpointers to print a complete
solution, so it is more convenient to use a static array.
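
In miniature, with invented names, the static-array arrangement looks
like this:

    #include <stdio.h>

    #define MAXMOVES 64

    /* The whole current line of play lives in one flat array, so
       printing a newly found best solution is a simple loop rather
       than a walk back through the call stack. */
    static int play[MAXMOVES];
    static int nplay;

    static void push_move(int k) { play[nplay++] = k; }
    static void pop_move(void)   { nplay--; }

    static void print_solution(void)
    {
        int i;
        for (i = 0; i < nplay; i++) printf("%d ", play[i]);
        printf("\n");
    }

    int main(void)
    {
        push_move(17); push_move(9);
        print_solution();   /* prints: 17 9 */
        pop_move();
        return 0;
    }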

Some moves are made by trial and error and some are forced by heuristic.
I chose to push the play stack only on the trial-and-error moves,
a dubious decision since it added considerable complexity just to save
a little time and memory.
I arbitrarily left room for 11 heuristic moves after each trial move
(sometimes more than 11 heuristic moves are made; a brief search on
the string NUMASS should acquaint you quickly with the details).
I could have made the program simpler by giving each move
its own stack entry, but it isn't so bad as it stands, and performance
often matters as much as code simplicity.

Some of the so-called ``forced moves'' may be rejected by the
ordering rule, and this isn't detected until after we begin to
update the data structure.
Observe that these updates are partitioned so that any move rejection
occurs before any irreversible change is made.
This is a common idea, although here, again, there would have been no
need for care if we simply pushed the play stack on every move.
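
The shape of such a partitioned update, reduced to a toy of my own
(the real checks and fields differ), is:

    #include <stdio.h>

    #define N 101

    static int remain[N + 1];   /* the part of the state being updated */

    /* Partitioned update: every test that can reject the move runs
       before the first write, so a rejected move leaves no trace. */
    static int try_move(int k)
    {
        int d;
        /* phase 1: checks only -- nothing modified yet */
        if (k < 2 || k > N || !remain[k]) return 0;
        /* (an ordering-rule test would also go here, before any writes) */
        /* phase 2: commit -- from here on the move is accepted */
        for (d = 1; d <= k; d++)
            if (k % d == 0) remain[d] = 0;
        return 1;
    }

    int main(void)
    {
        int k, r1, r2;
        for (k = 1; k <= N; k++) remain[k] = 1;
        r1 = try_move(12);
        r2 = try_move(12);          /* 12 already crossed out */
        printf("%d %d\n", r1, r2);  /* prints: 1 0 */
        return 0;
    }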

Another feature of this code, which may seem atypical to some
readers, is that the data structure contains a lot of redundancy:
The fields rfcount, revbef, remain,
and numremain, for example, could all be reconstructed
from other fields.
Such redundant data is a ``foible'' in many kinds of software,
and is a common source of bugs, e.g. in window handling code.
But in this kind of software, where speed is the key, you want the
data in a form ready for immediate use, even if that means
storing it redundantly in two or more forms.

Here, finally, is how I described the new method when I announced it
on Usenet at the time:

``I found a new approach to this problem which improved search speed.
I map each selected number to the required factor *without regard
for move order.* This means redundant equivalent orderings aren't
processed. Another key idea is to restrict consideration of the
factors of K to the form K/p where p is prime.

``Since plays are no longer made in order, the software quickly finds,
for N=101, that 32 and 48 must be sacrificed to get 64 and 96. These
plays will be made near the very end, so an ordinary move-ordered
search might waste time trying to take 32 or 48 earlier.

``Ordinary move-ordered search software might be adapted to support
some of this behavior, but with difficulty. The main problem with my
approach is that an apparently valid move-factor mapping may have
a cycle in its required partial-ordering that invalidates it as a solution.
We keep track of the partial order and reject any selection which causes
such a cycle. Perhaps surprisingly, such cycles turn out to be fairly
uncommon: otherwise this new method would be worthless.

``Whereas speed required introducing complicated devices into the earlier
searches, the new method has no special heuristic or strategy except a
single threshold, M, passed on the command line; the software forces
numbers above M into the solution when it can. At present I cannot
guarantee my solutions when M < N, but experimentally I found that there
always seemed to be relatively small values of M for which the search
completed quickly and found the same solution provided by Hugo's more
exhaustive search program.

``This is 21 points better than the best solution I found using 49, which
uses the same numbers, except (25,45) is replaced with just (49).
This Third Exception is probably the final one, since for larger N
there is always p*p > N/2.

``This optimal N=120 solution was found whenever 55 <= M <= 64. (I
didn't complete the time-consuming runs for larger M.) When M=55,
the search took 53 seconds. Searching from scratch took 143 seconds
when M=63, but only 31 seconds when configured just to confirm
the known solution. For N=82, a challenge with earlier searching,
M=40 was adequate to find the optimal 2187 score, and took less than
a second. I'm sure there is room to improve these speeds.''