After some thought I realized that I could find several
speedups. The first and biggest is the order in which the
toggles are searched. Once you choose the elements on one
side, you can draw conclusions along the diagonals. But in
the old order I had to fill in the entire board before I
could draw any interesting conclusions. Just by reordering
the path through the board, each decision moves closer to
the conclusion it allows, and that speeds things up.
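
To make the ordering idea concrete, here is a minimal sketch. It is an illustration in Python, not the code this post describes, and it rests on an assumption: the puzzle rules are not restated here, so the sketch uses the familiar Lights Out rules (pressing a square flips it and its four orthogonal neighbours, and the goal is an all-zero board). Under those rules the early conclusion comes from the square directly above rather than from a diagonal. The names press and solve_ordered are made up for the example.

```python
def press(board, r, c):
    """Flip (r, c) and its orthogonal neighbours (assumed Lights Out rules)."""
    n = len(board)
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            board[rr][cc] ^= 1


def solve_ordered(board, early_check):
    """Backtracking search in row-major order.

    With early_check=True, the square above (r, c) is tested as soon as
    (r, c) has been decided, because no later press can change it.
    With early_check=False, nothing is tested until the board is full.
    Returns (list of presses or None, number of search nodes visited).
    """
    n = len(board)
    board = [row[:] for row in board]  # work on a copy
    presses = []
    nodes = 0

    def search(k):
        nonlocal nodes
        nodes += 1
        if k == n * n:
            return all(v == 0 for row in board for v in row)
        r, c = divmod(k, n)
        for choice in (0, 1):
            if choice:
                press(board, r, c)
                presses.append((r, c))
            # Square (r-1, c) is final once (r, c) has been decided.
            ok = not early_check or r == 0 or board[r - 1][c] == 0
            if ok and search(k + 1):
                return True
            if choice:  # undo before trying the next option
                press(board, r, c)
                presses.pop()
        return False

    found = search(0)
    return (list(presses) if found else None), nodes
```

Both settings walk the squares in the same order and find the same first solution; the only difference is how soon a bad branch is abandoned, which shows up in the node count.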

The other change is that I separated the decision about
which paths to take from the toggling itself. For most of
the board, examining a single board element makes it
obvious what you have to do. But I was toggling twice
whether or not I needed to. Separating out that logic makes
the logical structure simpler, and I believe it is slightly
faster.
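
A minimal sketch of that separation, again in Python and again assuming Lights Out rules rather than the actual puzzle from this post. The hypothetical helper choices() only decides; the search loop only toggles, and only when the decision says so. In this assumed setup the first row is a free choice and every other square is forced by the single square above it.

```python
def press(board, r, c):
    """Flip (r, c) and its orthogonal neighbours (assumed Lights Out rules)."""
    n = len(board)
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            board[rr][cc] ^= 1


def choices(board, r, c):
    """Decision step only: which options are worth trying at (r, c)?"""
    if r == 0:
        return (0, 1)           # first row: a real choice, try both
    return (board[r - 1][c],)   # elsewhere: forced by the square above


def solve_forced(board):
    """Search that toggles only when the decision step asks for it."""
    n = len(board)
    board = [row[:] for row in board]  # work on a copy
    presses = []

    def search(k):
        if k == n * n:
            # Rows above the last were cleared by the forced moves,
            # so only the last row can still hold a set square.
            return not any(board[n - 1])
        r, c = divmod(k, n)
        for choice in choices(board, r, c):
            if choice:
                press(board, r, c)
                presses.append((r, c))
            if search(k + 1):
                return True
            if choice:  # undo only what was actually done
                press(board, r, c)
                presses.pop()
        return False

    return list(presses) if search(0) else None
```

Keeping the decision in one small function means the toggle-and-undo pair runs only on branches that really press a square, instead of unconditionally on every branch.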

UPDATE
Removed the ret_swap_square() function. Toggles go much
faster if each swap is done directly rather than indirectly
through a function call. (Removing 5 extra function calls
per toggle matters...) Also dropped the unused Carp that
snuck in through habit. (This is throw-away code...)
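
For anyone who wants to measure the effect of the inlining, here is a rough Python sketch of the comparison. It is not the removed ret_swap_square() code; swap_square and the two toggle functions are hypothetical stand-ins, and the assumption that one toggle touches five squares is taken from the "5 extra function calls" remark above.

```python
import timeit

# The five squares touched by one toggle (assumed layout for illustration).
CELLS = ((1, 1), (0, 1), (2, 1), (1, 0), (1, 2))


def swap_square(board, r, c):
    """Flip one square; stands in for a per-square helper function."""
    board[r][c] ^= 1


def toggle_via_calls(board):
    """One toggle done through five helper calls."""
    for r, c in CELLS:
        swap_square(board, r, c)


def toggle_inline(board):
    """The same toggle with each swap written out directly."""
    for r, c in CELLS:
        board[r][c] ^= 1


def compare(repeats=100_000):
    """Return (seconds via calls, seconds inline) for `repeats` toggles."""
    board = [[0] * 3 for _ in range(3)]
    via_calls = timeit.timeit(lambda: toggle_via_calls(board), number=repeats)
    inline = timeit.timeit(lambda: toggle_inline(board), number=repeats)
    return via_calls, inline
```

Calling compare() returns the two timings. The size of the gap depends on the interpreter and machine, so treat the numbers as a rough indication only.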