I like combinatorial problems, but I tend to solve them by brute
force. Around the year 2000, I enumerated the solutions of the
6×10 Pentomino puzzle in a couple of CPU weeks
on some 266 MHz Pentium II workstations.

My solution
in C is a brute-force depth-first traversal method. In 2013, I
made an obvious improvement
to prune those search trees where there exists a gap whose size is not
a multiple of 5. The pruning can pe achieved by flood-filling each
gap in a copy of the current board and determining how big each gap
was. On an Intel Core i5-2520M system at 2.5 GHz, an optimized build
of the C program will cover the whole search space in about 4 minutes.

A C++ version
of my solution makes use of the STL.
On the platform that I tried, the optimized C++ build was slower than
the C program due to the dynamic memory allocation overhead in the
queue management of the flood-fill routine. The C version would use
an array allocated from the stack, while the C++ version would use
std::list.

The performance difference went away in January 2014 when I noticed
that the work queue can more appropriately be represented as a set.
The original implementation of the flood-fill algorithm could dequeue
the same work item multiple times. In this case, the set can be
efficiently represented as a bitmap, more precisely in a 64-bit
integer, just like the 6×10 Pentomino board. The performance improved
by 60% when I exploited the fact that the already filled space can
easily be subtracted from the bitmap. In this way, we will never
dequeue an item that has already been processed. The performance
improved even further (40 seconds to return all 2,339 solutions) when
I eliminated symmetric configurations for the first piece. There were
56 configurations for the I-shaped piece before and 14 after the
optimization.

In December 2016, I refined the C++ implementation to make use of
the C++14 constexpr keyword, so that the bit patterns for
the pieces (up to 8 distinct orientations per piece) can be converted
at compilation time. I experimented with constructing all 2,014
distinct placements of the pieces at compilation time. This turned out
to be a bad idea not only in terms of Kolmogorov
complexity (the placements can be constructed in less than
2,014×64 bits of code) but also because clang hit a resource limit while
initializing the constexpr data.

When compiled with a development snapshot of clang 4.0.0
using -O3 -DNDEBUG, the C++ program finished 10% faster
than the C program. With GCC 6.2.1 and the same switches, the C
program was about as fast as the clang-compiled C++, and the
GCC-compiled C++ program was 25% slower. (Counterintuitively, the C++
would be slightly faster when using std::vector instead
of std::array.)

I also experimented with expanding the 12-deep recursion at
compilation time, using templates, but that resulted in slightly
slower execution time. It could be interesting to implement iteration
over an explicit recursion stack.

Solving 3-dimensional puzzles

In early 2014, my wife picked up a 4×4×4 puzzle at a flea market.
The cube is filled with one piece of volume 4 and twelve pieces of
volume 5, which yields a total volume of 4×4×4=4+12×5=64. Unlike in
the Pentomino puzzle, there are many more possible pieces of this
size, and there seem to be 20,433 solutions. With minor
modifications, the same solution algorithm is applicable:

For simplicity, the pieces are represented as character
strings.

Each string is converted to the representation of the puzzle
board (or cube).

Each piece will be rotated and shifted to generate all possible
positions on the puzzle board.

Each position will be represented as a puzzle board (64-bit
integer) containing only one piece.

In the 2-dimensional case, perform two nested loops,
flipping the piece or rotating it 90 degrees at each step.

In the 3-dimensional case, perform a loop for each dimension,
rotating the piece 90 degrees at each step.

For the first piece, only such shifted positions are allowed
that are not equivalent to rotating or flipping other positions.
In this way, we rule out symmetric solutions.

To shift a piece in the board, simply shift the bitmap after
ensuring that the piece is not touching the edge or face already.

Start the depth-first search with the 4-unit piece.

If some free space is not divisible by 5, reject the solution,
since it would not be possible to place all 5-unit pieces.

In December 2016, when I refactored the 3-dimensional puzzle solver
in C++14, I noticed that there had been a bug in my symmetry
reducer. When a piece is rotated along each axis, there are 4×4×4 or
64 potential configurations. For the small piece in my puzzle, 12 of
these configurations are distinct. But, many of these 12
configurations are actually symmetric with each other, if we place the
piece to a specific corner of the box and then rotate the entire box
in all 64 ways, without shifting the piece relative to the box. It
turns out that there are only 15 distinct configurations of the small
piece, instead of the 261 that my original buggy symmetry reduction
produced.