But we can now afford the luxury of computing which piece is best to strip of half its orientations (thePieceWhichBreakSymmetry), instead of hardcoding 6.
We also handle the case of northern islands (except on the last two rows; we'll see why later).

For detecting north islands, we change our existing island detection to stop a bit earlier:

Saturday, April 14, 2012

Until now, we have played with low-level bit tricks and used the official ASCII board printing. But these bits are frustrating: we are in Smalltalk, a graphical environment. So we are going to illustrate the algorithm with a poor man's Morph in Squeak. I say poor man's because the morph will be dedicated to visualisation, not interaction, but that will already serve.

First, we craft a morph just for displaying a hexagonal cell. We hardcode the edge length as an integer between 10 and 20 pixels that falls near a whole pixel when multiplied by 60 degreeSin and by 60 degreeCos. Since the cosine is 1/2, the edge must be an even integer; for the sine, we catch the near-whole values with a centered modulo in the interval [-1/2,1/2[.
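The search for a suitable edge length can be sketched in a workspace as follows. This is only an illustration, not the code of the morph: the names centeredFraction, candidates and best are hypothetical, and ranking by distance to the nearest whole pixel is an assumption about what "near" means here.

```smalltalk
| centeredFraction candidates best |
"Centered modulo 1: distance to the nearest integer, in [-1/2,1/2["
centeredFraction := [:x | x - x rounded].
"Only even edges, so that edge * 60 degreeCos is a whole number of pixels"
candidates := (10 to: 20 by: 2) collect: [:edge |
	edge -> (centeredFraction value: edge * 60 degreeSin) abs].
"Keep the edge whose height edge * 60 degreeSin is closest to a whole pixel"
best := (candidates asSortedCollection: [:a :b | a value <= b value]) first key.
best
```

With this ranking, 14 comes first (14 * 60 degreeSin is about 12.12), closely followed by 16.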

We can now instrument our solver to display its own progress. Of course, since the last loop count was more than 800,000, visualizing the whole algorithm would require more than 80,000 seconds (over 22 hours) even if we displayed 10 piece additions per second. But we will stop the animation before the end.

This animation clearly confirms that we do not eliminate every island. For example after this case:

3 more trials are necessary before abandoning the solution:

And the second confirmation is that our idea of turning the board above the 6th row was very clever for minimizing the set-up of possible positions, but not at all optimal for abandoning bad solutions early, because some islands are only rejected in the upper rows once we rotate the board, as illustrated on this snapshot:

So there is room for further improvements, and in the next post I'll be back to a dumber solution. Also, I created this little movie of the first minutes of solving by simply generating the PNG files with instrumented code (and an iFrame instance variable initialized to 0).

Thursday, April 12, 2012

The shootout benchmark is written for VisualWorks, so I tried to port the meteor contest to VW 7.8 Non-Commercial.

Here are the required modifications:

<< and >> are not understood by Integer and must be replaced by proper signed bitShift:

bitsDo:, bitReverse: and bitCount operations are missing and must be added as extensions

SequenceableCollection>>reversed is not understood, the VW version is reverse.
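To give an idea of the first two points, here is a minimal workspace sketch. It is not the set of extensions actually added to the VW image: the bitCount block is a hypothetical stand-in that uses the classic clear-the-lowest-set-bit loop, and the temporary names are made up for the example.

```smalltalk
| shiftedLeft shiftedRight bitCount |
"bitShift: with a positive count replaces <<"
shiftedLeft := 5 bitShift: 3.
"bitShift: with a negative count replaces >>"
shiftedRight := 40 bitShift: -3.
"Count the 1 bits of a non-negative integer by clearing the lowest one each turn"
bitCount := [:n | | count bits |
	count := 0.
	bits := n.
	[bits = 0] whileFalse: [
		bits := bits bitAnd: bits - 1.
		count := count + 1].
	count].
bitCount value: 2r101101
```

The same expressions should be understood by both Squeak and VW, since bitShift: and bitAnd: belong to the common Integer protocol.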

Surprise, VW did not perform better than Cog on my Mac mini: 1.5 seconds instead of 1.3 seconds for solving the board.

Yes, but... Unlike Squeak, a positive SmallInteger in VW doesn't have 30 bits but only 29. The Squeak solution was thus a bit sub-optimal in this context (I like the double meaning of "a bit" here).
Oh, let's just change that! The major generator of 30-bit Integers is the bitReverse: operation, when we turn the board by 180° once the first 6 rows are filled. After the reversal, the boardMask has two rows of barrier toward the north, remember:

And this one tells us we waste a bunch of time in SequenceableCollection>>do:
Ah yes, Squeak MessageTally told that too, but I didn't trust it enough.
Disappointingly, we will have to inline this do: loop ourselves. So we also change the end of the above method:
...snip...
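The modified method itself is not reproduced here. As a generic illustration only of what inlining a do: loop by hand means, with a made-up pieces array and total accumulator, the transformation looks like this:

```smalltalk
| pieces total |
pieces := #(3 5 7).

"Generic form: do: is a real message send that evaluates the block once per element"
total := 0.
pieces do: [:each | total := total + each].

"Hand-inlined form: to:do: with a literal block is compiled as a plain counting loop"
total := 0.
1 to: pieces size do: [:i | total := total + (pieces at: i)].
total
```

Both forms compute the same sum; the second one simply avoids going through SequenceableCollection>>do: and its block activations.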

For 885075 loops, that's less than 1 microsecond per loop, not that bad after all.

For single-core programs, my machine has more or less the same performance as the shootout reference machine, as can be verified with the reference g++ solution (which runs in 80ms as reported on the shootout site).

/usr/bin/g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I'm reluctant to play with all the g++ code generation options, but that makes roughly a factor of 10 between C++ and Smalltalk for this one (not accounting for image startup time).
Sure, I played with very low level bit tricks, but hey, that was the spirit of the game.

Tuesday, April 10, 2012

We have sped up individual loops enough and can now go back to a bit more reduction of combinations.

The next idea is to detect more island cases at pre-processing time. We already removed pieces having two cell groups on the same edge, but what if they have a single cell group?

For example:

...
* * * * 1
* * * 1 *
* * 1 1 *
* * 1 X X

The green Xs indicate filling of the board by previous pieces, and blue 1s is the new piece we want to insert.

It obviously creates an island indicated here by red stars *.

In the above figure, at least 3 stars should hold 1s, or maybe 8, because the number of 1s already on the board is always a multiple of 5, the piece size. So the red stars could well be filled by previous pieces. But if the starred area contains one or more 0 holes, then the blue piece cannot fit in this place.
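The counting argument can be checked with a one-line workspace sketch. The numbers are read from the figure (2 Xs and 13 stars), and the names visibleXs, starCount and possibleFilledStars are hypothetical:

```smalltalk
| visibleXs starCount possibleFilledStars |
visibleXs := 2.     "cells already filled next to the new piece"
starCount := 13.    "cells of the starred area"
"Previous pieces cover a multiple of 5 cells: the 2 Xs plus some of the stars"
possibleFilledStars := (0 to: starCount)
	select: [:k | (visibleXs + k) \\ 5 = 0].
possibleFilledStars
```

This answers 3, 8 and 13, the last value being the case where the starred area is entirely filled and there is no island at all.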

This can be detected early. We first have to modify our filling algorithm, because it must stop at the northernmost row of the piece, which indicates an open sea. For this, we make the fillMask an instance variable.

Saturday, April 7, 2012

In the previous step, we were at about 1,000,000 loops for only 10,000 productive ones.

And we have about a factor 100 to gain compared to C++.

I'm not sure we can be that clever and avoid any false solution.

Yes, sure, we cannot reach speed of C++ yet with our VM, even the Cog one.

But we can do something to reduce a single loop cost.

One thing to notice is that the first part, which generates all piece positions, does a lot of useless work. Until pieces reach the north edge, we repeat the same bit pattern every two rows. That's a clue indicating something is wrong.

Moreover, in a 32-bit Squeak image, we only have 30 bits left for representing a positive SmallInteger.

Since a piece can span 5 rows, and we only consider filling rows 1 and 2, the northern two rows are useless and the board requires only 6 rows, that is 30 bits, a positive SmallInteger...
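The bit budget can be verified with a small workspace sketch, assuming 5 cells per row as the 6 rows = 30 bits arithmetic implies. The names sixRowMask and bitsNeeded are hypothetical:

```smalltalk
| sixRowMask bitsNeeded |
"A mask with every cell of 6 rows of 5 cells set to 1"
sixRowMask := (1 bitShift: 6 * 5) - 1.
"Position of its highest bit, that is the number of bits needed"
bitsNeeded := sixRowMask highBit.
bitsNeeded
```

In a 32-bit Squeak image this mask equals SmallInteger maxVal (1073741823), so any 6-row board stays a SmallInteger; in a 64-bit image SmallInteger is wider and the constraint is looser.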

Good. For the first row, we can use a specially crafted southPossiblePositions.

But how do we handle the north rows above row 6 and prevent pieces from spilling over the north edge?

Well, we don't have to care about the north. We can use the 180° symmetry: as soon as 6 rows are full, we reverse the whole board and start at the south again.
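Why a bit reversal implements the 180° rotation can be sketched as follows, assuming the cell at row r and column c (both counted from 0) is stored at bit index 5 * r + c. The rotation sends it to row 5 - r and column 4 - c, that is to bit index 29 - (5 * r + c). The rotate180 block below is a hypothetical, slow stand-in for bitReverse:, written out only to make the mapping visible:

```smalltalk
| rotate180 firstCell rotated |
"Move every set bit i of a 30-bit board to bit 29 - i"
rotate180 := [:board | | result |
	result := 0.
	0 to: 29 do: [:i |
		(board bitAnd: (1 bitShift: i)) = 0
			ifFalse: [result := result bitOr: (1 bitShift: 29 - i)]].
	result].
firstCell := 1.    "only bit 0 set: one corner of the 6-row board"
rotated := rotate180 value: firstCell.
rotated = (1 bitShift: 29)    "the opposite corner"
```

Applying the block twice gives back the original board, as expected from a half turn.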

Of course, we still have to shift the solution for filling the board. To avoid creating a LargeInteger, we'll store the rowOffset shift in a new ShootoutMeteorPiece object, and use it in possible positions, and in solutions.

One subtle thing is that we already handled the symmetry by removing half of the orientations of piece #6. If we rotate the board, we have to consider the rotated orientations of piece #6. Hmm, keeping this optimization will be a bit technical, but we'll do it.

That is a bit more complex now; that's the price of optimization. The rowStatus management especially cries for a refactoring. But right now, we just focus on CPU efficiency, not source maintainability.

Note that bitReverse: may be absent from a Pharo image. Just pick it from Squeak.

That's 20% more loops than Part 3, but that's plausible: we didn't use the same order for filling the board. In particular, after the rotation we started back at the south-east in a rather less constrained area, and so we delayed the detection of bad boards.

But now, the cost is about 1.7 seconds instead of 7.

Great! This would place Smalltalk at a not so ridiculous rank among those shootout languages.

And we still have room to reduce the number of combinations.

The code is at http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.5.mcz