About 490 million positions are at distance 20

Even though we now know the diameter of the Rubik's Cube group in the
half-turn metric, there is still much to be discovered. The diameters
in the QTM and the STM are unproved (although they are almost certainly
26 and 18, respectively). The exact counts of positions at distances
16, 17, 18, 19, and 20 in the half-turn metric are unknown. This note
reports some progress on an estimate for the count of 20's in the
half-turn metric.

It is fascinating to me how problems of distinctly different difficulty
exist around the 3x3x3 cube in the half-turn metric. Initially, back in
the early days, we could solve individual positions non-optimally.
With appropriate ideas and some moderate computer power (about 1/1000
of a modern cell phone), we got to the point where we could solve
arbitrary positions near-optimally, with Kociemba's algorithm.
Eventually ideas and computer speeds got to the point where we could
solve arbitrary positions optimally with Korf's solver (initially in
about a day, now in much less than a second). With some further ideas
and effort we got to the point where we could solve huge chunks of the
space optimally at a high rate of speed, and even, with a heroic
amount of CPU, calculate the actual diameter of the group. But even
today, finding a distance-20 position is difficult---they are rare,
and proving a candidate position to be at distance 20 still requires
a few minutes of computer time.

I have made some earlier approximations of the count of distance-20
positions in this forum, but the uncertainty in those numbers has
always been very high. About a year ago I decided to work towards a
better estimate. To obtain this estimate, I solved 2500 randomly
chosen cosets of Kociemba's group (the group generated by
{U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took
about five months of CPU time, distributed across a number of
machines, and found optimal solutions for 48,771,072,000,000
positions at a rate of about four million optimal solves per CPU second.
This effort netted a total of 647 distance-20 positions (not all of
which were new), which is an average of 0.2588 distance-20 positions
per coset. Since there are 2,217,093,120 total cosets, simple
multiplication predicts a total of 574 million distance-20 positions.
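
The extrapolation above is plain arithmetic. A minimal Python sketch that reproduces it from the figures quoted in this post (the variable names are illustrative, not taken from any actual program):

```python
# Figures quoted in the post; the names are illustrative only.
COSETS_TOTAL = 2_217_093_120          # cosets of Kociemba's group
POSITIONS_PER_COSET = 19_508_428_800  # positions in each coset
SAMPLED_COSETS = 2500
FOUND_20S = 647

positions_solved = SAMPLED_COSETS * POSITIONS_PER_COSET
mean_per_coset = FOUND_20S / SAMPLED_COSETS
naive_estimate = mean_per_coset * COSETS_TOTAL

print(f"positions solved:   {positions_solved:,}")
print(f"mean 20s per coset: {mean_per_coset:.4f}")
print(f"naive estimate:     {naive_estimate / 1e6:.0f} million")
```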

Unfortunately, the variance of the sample set was very high. More
than 90% of the cosets had no distance-20 positions (2286 of the 2500).
Most of the remainder (142) had only one. And the cosets with the
greatest count of distance-20 positions had 159 and 59 such positions,
respectively. The standard deviation of the sample was 3.5, more than
thirteen times the mean; thus, the estimate derived from this sample was
highly uncertain.
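
One rough way to quantify that uncertainty, assuming the 2500 cosets can be treated as an independent simple random sample (an assumption made here for illustration), is the standard error of the mean: the sample standard deviation divided by the square root of the sample size.

```python
import math

SAMPLE_SIZE = 2500
SAMPLE_MEAN = 0.2588  # distance-20 positions per coset
SAMPLE_SD = 3.5

# Standard error of the mean under simple random sampling (assumed).
std_error = SAMPLE_SD / math.sqrt(SAMPLE_SIZE)
relative_error = std_error / SAMPLE_MEAN

print(f"standard error of the mean: {std_error:.3f}")
print(f"relative standard error:    {relative_error:.0%}")
```

Under that assumption a single standard error is already more than a quarter of the estimate itself.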

Luckily, there is a way to reduce this uncertainty without too much
additional work. There is a fair amount of variability in the count
of length-19 canonical sequences that lead to each coset, and a
simple dynamic programming program can calculate, for each coset all
at once, how many length-19 canonical sequences lead to that coset.
Intuitively, the more length-19 canonical sequences that lead to a
coset, the fewer distance-20 positions that coset is likely to have,
and this is supported by all of the empirical evidence as well. Thus,
distance-20 positions are concentrated in those cosets with few
length-19 canonical sequences, so we should sample those cosets more
heavily.

I created a list of all 138,639,780 symmetry-distinct cosets, sorted
by an increasing count of length-19 canonical sequences, and assigned
each an index from 1..138,639,780 based on that ordering. I broke
this set into four subsets and, for the three subsets with higher than
average incidence of distance-20s, I calculated additional sample points.

It turns out that the cosets with a higher incidence of distance-20s
are, in some sense, faster to calculate. We compute the distance-20
positions by performing a full search out to distance 19; the greater
the number of length-19 canonical sequences, the longer this takes.
Thus, those cosets that yield more distance-20 positions are usually
also faster to run to completion. (This observation breaks
down when you get further down the list, because for the majority of
the cosets it's sufficient to search through depth 18, do a "prepass"
to depth 19, and solve the few remaining positions with an optimal
solver, but for the portion of the space that is rich in 20s, the
observation holds true.)

The first sample set was for the cosets numbered from 1 to 5,000. I
solved every such coset in this sample exactly (I have been doing this
for some time). From these 5,000 cosets I obtained a total of 884,425
distance-20 positions (not all unique, but we will handle that below).
The mean is 176.9 and the standard deviation is 147.5. The high
standard deviation does not matter because we explored the entire
set, so there is no sampling error.

The next sample set was for the cosets numbered 5,001 to 15,000. I
solved 200 of these cosets and found 16,329 distance-20 positions,
with a mean of 81.645 and a standard deviation of 50.0. Extrapolating
these statistics, we expect a total of 816,450 distance-20
positions from this range.

The next sample set was for cosets numbered 15,001 to 1,500,000.
I solved 297 of these cosets, found 2,129 distance-20 positions,
with a mean of 7.168 and a standard deviation of 8.429. Extrapolating
these statistics, we expect a total of 10,645,000 distance-20
positions from this range.

The last range is for cosets numbered 1,500,001 to 138,639,780.
I solved 2479 of these cosets (these were entirely from the
original 2500 cosets solved that were indexed at greater than 1,500,000).
From this subset, I found 334 distance-20 positions, with a mean
of 0.1347 and a standard deviation of 0.7233. Extrapolating, we
expect this range to yield a total of 18,477,082 distance-20
positions. The ratio of standard deviation to mean is still
extremely high, but it is much better than before; in addition, the
range only accounts for about 60% of the total so the impact of
uncertainty is reduced.

The total expected distance-20 positions from all four ranges added
up is thus 30,822,957. Since we only explored symmetrically-unique
cosets, we need to multiply this by the 16 symmetries of the Kociemba
subgroup to obtain a final total of 490,000,000 distance-20 positions.
I believe the true count is likely to be within 10 or 15 percent of
this estimate.
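
The four ranges combine as a stratified estimate: each range contributes (number of cosets in the range) times (mean per solved coset). The sketch below redoes that sum in Python from the figures above. The standard-error part is an added illustration rather than something computed in this post: it assumes each range was sampled at random and ignores any finite-population correction.

```python
import math

# (cosets in range, cosets solved, mean 20s per coset, sample standard deviation)
# taken from the four ranges described above.
STRATA = [
    (5_000, 5_000, 884_425 / 5_000, 147.5),   # solved exhaustively
    (10_000, 200, 81.645, 50.0),
    (1_485_000, 297, 2_129 / 297, 8.429),
    (137_139_780, 2_479, 334 / 2_479, 0.7233),
]
SYMMETRIES = 16  # symmetries of the Kociemba subgroup

total = 0.0
variance = 0.0
for size, solved, mean, sd in STRATA:
    total += size * mean
    if solved < size:  # an exhaustively solved range has no sampling error
        variance += (size * sd) ** 2 / solved

std_error = math.sqrt(variance)
print(f"symmetry-reduced total:  {total:,.0f}")
print(f"all positions:           {total * SYMMETRIES / 1e6:.0f} million")
print(f"relative standard error: {std_error / total:.1%}")
```

Nearly all of the remaining variance comes from the last range, which is why sampling the richer ranges more densely reduces the overall uncertainty so much.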

It is striking that there can be so many distance-20 positions (more
than one for every man, woman, and child in the United States), yet
they are so rare (fewer than one in 80 billion positions overall,
equivalent to finding a single grain of sand in a dump truck full
of sand). This is why I am fascinated by them; they are rare, yet
plentiful, and very hard to find.

I am continuing my search for distance-20 positions. I occasionally
get another idea for speeding up the program, and I am still mining
the richest portion of the cosets, so success is frequent; I am
currently finding about 200,000 new ones a day using multiple
machines. As I get further down the list of cosets, solving each
coset gets harder, and yields fewer new distance-20 positions.

Including all the cosets listed above, and additional cosets solved
in prior experiments over the years, and the early results of Radu
and Kociemba in solving all symmetrical positions, I presently have
a total of 63,222,090 distance-20 positions, or about one in eight.
My computers continue churning through cosets extending this set.
In a few decades it may be child's play to list all distance-20
positions, but in the meantime, I will continue to chip away until
I have them all.

Length of the neighbors of the 20f* positions

Each position in the face turn metric has 18 neighbors. Have you looked at the length of neighbors of the 20f* positions?

The reason I ask is as follows. Because all positions in the face turn metric can be solved in 20 moves or less, the 20f* positions are all local maxima. In the face turn metric, a local maximum can be a weak local maximum or a strong local maximum. The distinction is that for a strong local maximum, all 18 neighbors are closer to Start than the position itself. For a weak local maximum, one or more of the neighbors is the same distance from Start as the position itself.

In my experience, most local maxima in the face turn metric are weak local maxima. If that is true of the 20f* positions (and it may not be), then there will be a certain clustering of the 20f* positions within the Cayley graph of the face turn metric. However, I can't even remotely picture if there is any relationship between clustering in the Cayley graph and clustering in a coset of the Kociemba group. Which is to say, I can't picture if making one move tends to switch to a different coset or to remain in the same coset.

Connected 20s

Wow, I did a quick analysis on the 735K known distance-20 positions I have
(mod M+inv) and was surprised how "unconnected" they were.

Count   Component size
735099  1
652     2
77      3
13      4
4       5
3       6
1       8

Almost all were completely isolated. The biggest connected component (using both moves and premoves) is only of size 8. I really thought there was going to be a big one with 100 positions or something.
What this tells me is almost all 20f*'s are strict local maxima. Even though we only have about 1 in 8 20f*'s in this set, the space we searched is highly connected because of the coset approach.
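
A component-size histogram like the one above can be computed with a union-find structure. The sketch below is generic and is not the actual analysis code: `neighbors` is a hypothetical caller-supplied function that would return the positions reachable from a position by one move or premove, and the toy example uses integers on a line in place of cube positions.

```python
from collections import Counter


def component_size_histogram(positions, neighbors):
    """Return Counter {component size: number of components}.

    Two known positions are joined when one is among the other's neighbors.
    """
    known = set(positions)
    parent = {p: p for p in known}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in known:
        for q in neighbors(p):
            if q in known:
                root_p, root_q = find(p), find(q)
                if root_p != root_q:
                    parent[root_p] = root_q

    sizes = Counter(find(p) for p in known)  # root -> component size
    return Counter(sizes.values())           # size -> how many components


# Toy stand-in for cube positions: integers, adjacent when they differ by 1.
histogram = component_size_histogram(
    [1, 2, 3, 10, 20, 21], lambda x: (x - 1, x + 1)
)
print(sorted(histogram.items()))
```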

the cluster of 8?

I've become interested in modeling the outer surface of cube space, despite how computationally hard that is and my not having a great amount of mathematical training. If we took cube space and reduced it to a solid 3D projection, with the solved cube in the middle and the harder solves outside, what do you think the surface might look like, with these mostly separate peaks of 20f positions on top of a somewhat discontinuous surface of 19f positions, and so on?

I've found a cluster of five 20f positions. Can I ask for a scramble in the cluster of 8? Thank you so much.

Tom, much thanks for checking

Tom, much thanks for checking on this. It's quite interesting how isolated the distance-20 positions are, and especially that for distance-20 positions the inverses of strong local maxima are also nearly always strong local maxima (i.e., your results from using both moves and pre-moves). These are very impressive results.

Howdy, Jerry!

This is an interesting idea. I've pursued it earlier when I had fewer positions, but with 600K+ unique positions (mod M+inv) it's getting difficult. Even with my new super-fast cube solver, that's a lot of 20's to solve.

Since a coset is defined as aS*, where a is a representative position and S is the set of moves {U,F2,R2,D,B2,L2}, we know immediately that solving a coset also finds implicitly all neighbors connected by 10 of the 18 moves. So at least some data exists.

I'll do some analysis of the existing positions, in that I'll see how many clusters I have and how large each is based on the set I have; that should be relatively easy to do. It will be interesting to determine the largest connected component. I'll use both moves and "premoves" (that is, both left and right group multiplication by the 18 generators).

I wonder how you can compute

I wonder how you can quickly compute the exact number of 19-move maneuvers that "lead to a coset". Is this the same as, or something different from, the number of 19-move maneuvers that "solve this coset", that is, bring it into H? Maybe you can give some details here on how you did this.

Calculating the count of canonical sequences that "reach" a coset

The cosets of Kociemba's group partition the group G.
Every canonical sequence takes the solved position into an element
of one of those cosets. We wanted to sort the cosets according to
how many canonical sequences took the identity into that coset,
because we believed the count of distance-20 positions in a coset
was strongly and inversely correlated with the canonical sequence
count.

To compute this value, we used dynamic programming to compute
the function

cs(d, c)

where d is the length of the canonical sequence, and c is the
specific coset. This function cannot be decomposed recursively,
but a related function can:

cs(d, c, f)

This counts canonical sequences ending in a turn of face f, that
end in coset c, and have length d. The recursive formulation is
simply

cs(d+1, c, f) = sum(m, p) cs(d, c . m, p)

where m is a move of face f, and p is a face such that the sequence
of face turns (p, f) is permitted. (For instance, a move of face U
followed by a move of face D is permitted, but not the reverse, and
not two consecutive moves of any given face.)

Then, of course,

cs(d, c) = sum(f) cs(d, c, f)

where f is any face of the cube.
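
A minimal Python sketch of this recurrence follows. It is an illustration under stated assumptions, not the program described in this post: it pushes counts forward from depth d to depth d+1 (which counts the same sequences as the pull form written above), it uses exact Python integers rather than floating point, and `apply_move` and `toy_move` are hypothetical stand-ins for the real action of a face turn on a coset.

```python
from collections import defaultdict

# Faces indexed so that opposite faces share an axis: (U, D), (F, B), (R, L).
FACES = "UDFBRL"
TURNS = (1, 2, 3)  # quarter turn, half turn, inverse quarter turn


def permitted(p, f):
    """Return True if a turn of face f may follow a turn of face p."""
    if p == f:
        return False
    # On a shared axis only one order is kept (U then D, never D then U).
    return p // 2 != f // 2 or p < f


def count_by_coset(depth, num_cosets, apply_move, start=0):
    """Count canonical sequences of length `depth` (>= 1) ending in each coset.

    `apply_move(c, f, t)` is a caller-supplied (hypothetical) function that
    returns the coset reached from coset c by turn t of face f.
    """
    layer = defaultdict(int)  # layer[(c, f)] plays the role of cs(d, c, f)
    for f in range(len(FACES)):
        for t in TURNS:
            layer[(apply_move(start, f, t), f)] += 1
    for _ in range(depth - 1):
        nxt = defaultdict(int)  # only two layers are ever held in memory
        for (c, p), n in layer.items():
            for f in range(len(FACES)):
                if permitted(p, f):
                    for t in TURNS:
                        nxt[(apply_move(c, f, t), f)] += n
        layer = nxt
    totals = [0] * num_cosets  # cs(d, c) = sum over f of cs(d, c, f)
    for (c, _f), n in layer.items():
        totals[c] += n
    return totals


def toy_move(c, f, t):
    """Toy two-state stand-in for the coset action; NOT the real cube."""
    return c ^ 1 if f >= 2 and t != 2 else c


counts = count_by_coset(19, 2, toy_move)
print(counts, sum(counts))
```

Whatever transition function is supplied, the counts summed over all cosets must equal the total number of canonical sequences of that length, which gives a convenient sanity check.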

To compute cs(d, c, f), an enormous table can be used; the total
state space is 19 (lengths 1..19) times 6 (faces) times 2,217,093,120
for a total of 252,748,615,680 entries. This can be reduced by
computing the values in order of d and by reducing the space of
cosets by symmetry (since values between two cosets related by
symmetry will be the same); this cuts the entry requirements down to
2 times 6 times 138,639,780 which is 1,663,677,360. Using
single-precision floating point, this requires only 6.5GB of RAM.
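
The table sizes quoted above can be checked with a few lines of Python. The constants come from this post; the 4-byte entry size is the usual size of a single-precision float, and the exact gigabyte figure depends on whether a GB is taken as 10^9 or 2^30 bytes.

```python
COSETS = 2_217_093_120
SYMMETRY_DISTINCT_COSETS = 138_639_780
FACES = 6
DEPTHS = 19
BYTES_PER_FLOAT32 = 4

full_entries = DEPTHS * FACES * COSETS
reduced_entries = 2 * FACES * SYMMETRY_DISTINCT_COSETS  # two depth layers
reduced_bytes = reduced_entries * BYTES_PER_FLOAT32

print(f"{full_entries:,} entries in the full table")
print(f"{reduced_entries:,} entries after both reductions")
print(f"{reduced_bytes / 1e9:.2f} GB = {reduced_bytes / 2**30:.2f} GiB")
```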

(I originally performed this calculation using doubles on a 4GB
machine, by using Hoey syllables and axis instead of faces, and
by storing the table for each value of d into a disk file, and
rereading those disk files when calculating a new value of d, but
in these days of 16GB machines such heroics are no longer needed.)

Since I'm on the topic, I'll share some of the results. We know
there are 3,292,256,598,848,819,251,200 canonical sequences of
length 19; this means the average coset has about 1,484,942,860,158
canonical sequences that lead to it. The coset with the fewest
canonical sequences to depth 19 is

F3U2R3D3B3L1F1U1R1D3F3

It has only 558,579,323,856 length-19 canonical sequences that
lead to it (this is about 37.6% of the average.) Out of 19,508,428,800
distinct positions in the coset, these sequences (and the shorter
canonical sequences that lead to it) manage to miss 2,767
distance-20 positions. This is not the coset with
the greatest number of distance-20 positions; the first-place
coset is at index 738 on my list, with 4367 distance-20 positions;
that coset has 634,379,723,072 length-19 canonical sequences that
lead to it:

L3U2B1R3F3U3L3D3R2F1U3R3

The coset with the greatest count of length-19 canonical sequences,
with 14,979,482,126,237,464, or about 10,000 times the average,
was the subgroup itself.
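
The ratios quoted in this comment follow directly from the counts given. A short Python check, using only numbers from this post (the names are illustrative):

```python
LENGTH_19_SEQUENCES = 3_292_256_598_848_819_251_200
COSETS = 2_217_093_120
FEWEST = 558_579_323_856        # coset with the fewest length-19 sequences
MOST = 14_979_482_126_237_464   # the subgroup itself

average = LENGTH_19_SEQUENCES / COSETS
print(f"average per coset: {average:,.0f}")
print(f"fewest / average:  {FEWEST / average:.1%}")
print(f"most / average:    {MOST / average:,.0f}")
```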

Almost certainly the only cosets with 1000 or more distance-20
positions are these:

The first value is the index, the second is the symmetry of the
coset, the third is the count of distance-20 positions, and the
fourth value is a representative position. The coset at index
28,278 had 734,351,183,840 length-19 canonical sequences. Note
that symmetrical cosets can have a higher incidence of 20s (although
usually many of the 20s in symmetrical cosets are related to each
other by the symmetry of the coset).

Coset Symmetry

How is the correlation between the coset position in your list and the symmetry of the coset ? Have most of the first 5000 cosets some symmetry and most of the cosets with index > 1,500,000 no symmetry of the 16 cube symmetries which leave {U,D,F2,R2,B2,L2} unchanged?

Symmetry

Of the 5000 "top" cosets, less than 5% exhibit symmetry; 213 have 2-fold symmetry, 31 have 4-fold symmetry, and 3 have 8-fold symmetry; the remaining 4,753 exhibit no symmetry. So while symmetry has some impact, it is not dramatic.

Most distance-20 positions exhibit no symmetry or antisymmetry. The work you and Radu did found all symmetrical distance-20 positions, and you present that information at http://kociemba.org/symmetric2.htm; you find a total of 1,091,994 distinct symmetrical positions, 32,625 mod M+inv. All remaining distance-20 positions must be, of course, not symmetrical (but they are sometimes antisymmetrical or self-inverse).

The antisymmetry classes of the distance-20 positions that exhibit no symmetry are as follows. Mod M+inv, I have found 622,907 positions lacking antisymmetry, 47,282 positions that are antisymmetric but not self-inverse, and 1,281 that are self-inverse. Thus, the total number of known distance-20 positions is

622907 * 96 + 47282 * 48 + 1281 * 48 + 1091994 = 63,222,090

Very few elements of the Kociemba subgroup exhibit any symmetry (as a fraction of the total positions), but yes, they do have a higher incidence in the "20s-richer" portions of my sorted list. I do not believe this has a first-order effect on my estimate, however.