Cycle detection and the stack algorithm

Suppose we are given a function f that maps some domain D into itself. Given an initial element x0 from D, define the infinite sequence x1=f(x0), x2=f(x1), etc.

If the domain D is finite, then eventually some element in the
sequence must repeat itself, and from then on the sequence will repeat
itself over and over. The sequence will "fall" into a cycle.

The cycle detection problem asks us to detect that the sequence has become periodic, and to find the length of the cycle.

Cycle detection arises in a number of situations:

Knuth discusses it in the context of analyzing random number generators. In the 3rd edition of The Art of Computer Programming, vol. 2, exercises 3.1-6,7, he presents and compares several of the known algorithms.

We might want to detect when, say, a cellular automaton configuration has become periodic, and find its period.

Pollard's famous rho methods for factorization and discrete
logarithms are based on cycle detection. (The algorithm presented here,
however, cannot be applied to the rho factorization method.)

I discovered the algorithm presented here in October/November 2002. The algorithm requires that a total ordering be defined on D. For simplicity, we assume D is a finite set of integers.

The basic stack algorithm

Our basic algorithm is as follows: Keep a stack of pairs (xi, i), where, at all times, both the i's and the xi's in the stack form strictly increasing sequences. The stack is initially empty. At each step j, pop from the stack all entries (xi, i) where xi>xj. If an xi=xj is found in the stack, we are done; then the cycle length is equal to j-i. Otherwise, push (xj, j) on top of the stack and continue.
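As a sketch, the basic algorithm might be implemented as follows (assuming f maps a finite set of integers into itself, so the sequence is guaranteed to enter a cycle):

```python
def stack_cycle_length(f, x0):
    """Return the cycle length of the sequence x0, f(x0), f(f(x0)), ...
    using the basic stack algorithm.  Assumes the sequence eventually
    enters a cycle (e.g., f maps a finite set of integers into itself)."""
    stack = []          # pairs (x_i, i); both components strictly increase
    x, j = x0, 0
    while True:
        # Pop all entries whose value exceeds the current value x_j.
        while stack and stack[-1][0] > x:
            stack.pop()
        # Since stack values strictly increase, the only possible match
        # after popping is the new top of the stack.
        if stack and stack[-1][0] == x:
            return j - stack[-1][1]   # cycle length is j - i
        stack.append((x, j))
        x, j = f(x), j + 1
```

For instance, with f(x) = (x + 1) mod 5 and x0 = 0, the sequence is a pure cycle of length 5, and the function returns 5 on the second visit to the cycle minimum 0.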

This algorithm always halts on the smallest value of the sequence's cycle xmin, on the second occurrence of this value: Once xmin is added to the stack the first time it appears, it is never removed. Therefore, the algorithm will halt when it encounters xmin for the second time. On the other hand, any other cycle value is greater than xmin, so it will be removed by xmin before it has a chance to appear again.

Let n be the halting time of the algorithm. It can be shown that the size of the stack is bounded by O(log n) with high probability (i.e., the probability tends to 1 as n tends to infinity). Therefore, the algorithm only requires a logarithmic amount of memory.
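Intuitively, an entry survives on the stack only if its value is smaller than every value that comes after it, and for n random values the expected number of such "suffix minima" is the harmonic number H_n ≈ ln n. A quick illustrative simulation of this (a sanity check, not part of the algorithm):

```python
import random

def stack_size(values):
    """Size of the stack after pushing the given values with the
    stack algorithm's pop discipline (no repeats, so no halting)."""
    stack = []
    for x in values:
        while stack and stack[-1] > x:
            stack.pop()
        stack.append(x)
    return len(stack)

n, trials = 10000, 200
avg = sum(stack_size([random.random() for _ in range(n)])
          for _ in range(trials)) / trials
# avg should be close to H_n = ln(n) + 0.577... ~ 9.8 for n = 10000
```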

The above assumes that the magnitudes of the sequence values are random
and independent. If these magnitudes are known in advance to have some
regularities, we can "randomize" them by using a hash function:
whenever the stack algorithm needs to compare two values, we compare
their respective hashes instead.
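A minimal sketch of such a randomizing key, assuming the domain values are integers (the particular hash, blake2b, is just an example choice):

```python
import hashlib

def key(x):
    # Deterministic pseudorandom ordering key for a domain value x.
    # Equal values always get equal keys, so the algorithm's logic is
    # unaffected; only the ordering of distinct values is scrambled.
    return hashlib.blake2b(str(x).encode(), digest_size=8).digest()
```

The stack algorithm would then store triples (key(x), x, i) and perform all comparisons on the first component.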

The partitioning technique

The basic stack algorithm halts at a uniformly random point in the second loop through the sequence's cycle.

In order to increase its probability of halting closer to the beginning of the second loop, while adding virtually no extra time penalty per step, we introduce the following partitioning technique.

For some integer k, we divide the domain D into k disjoint classes. We could, for example, consider the remainder modulo k of each value xi. We keep a separate stack for each class. At each step i, we first determine which class xi belongs to, and then we insert (xi, i) in the corresponding stack. We call this the multi-stack algorithm.

Why does the algorithm still work? Because whenever a given value xi
appears, it is always sent to the same stack. Furthermore, since at
each step the algorithm operates on a single stack, the run time per
step remains practically unchanged. The algorithm halts as soon as
any one of the stacks detects a repetition.
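A sketch of the multi-stack variant, using the remainder modulo k as the class, as suggested above:

```python
def multistack_cycle_length(f, x0, k):
    """Multi-stack algorithm: one stack per residue class modulo k.
    Like the basic algorithm, halts when some value repeats in its
    own stack, and returns the cycle length."""
    stacks = [[] for _ in range(k)]
    x, j = x0, 0
    while True:
        s = stacks[x % k]          # the stack for x's class
        while s and s[-1][0] > x:
            s.pop()
        if s and s[-1][0] == x:
            return j - s[-1][1]
        s.append((x, j))
        x, j = f(x), j + 1
```

Setting k = 1 recovers the basic stack algorithm exactly.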

Each class j, 0<=j<k, contains its own cycle minimum xmin,j. Each xmin,j
is distributed uniformly and independently at random in the cycle. It
follows that the distance from the beginning of the cycle to the first
of these minima is, on average, a fraction 1/(k+1) of the cycle length.
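The 1/(k+1) figure is the expected position of the minimum of k independent uniform points, which a quick simulation confirms (an illustrative check, not part of the algorithm):

```python
import random

k, trials = 100, 20000
# Position of the first of k uniform random points in [0, 1):
# its expected value is 1/(k+1).
avg = sum(min(random.random() for _ in range(k))
          for _ in range(trials)) / trials
# avg should be close to 1/101 ~ 0.0099
```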

Any algorithm must obviously run at least until the beginning of the second loop through the cycle. So if we take, for example, k=100, then our running time will be just 1% of the cycle length above the absolute minimum achievable.

What happens to the memory requirements? When using k stacks, the size of each stack decreases by O(log k), a small amount. Therefore, since we have k stacks, the total memory used is roughly multiplied by k.

When is the multi-stack algorithm better than the other known algorithms?

Most of the above algorithms (numbers 1, 3, 4, and 5) are designed for the case where f is a random function on D, i.e., f is chosen uniformly at random from among the |D|^|D| functions that map D into itself. When f is a random function, the aperiodic part of the sequence is, on average, equal in length to the sequence's cycle.

However, there are situations where f does not behave like
a random function. For example, in cellular automata (such as the Game
of Life) it is very common for a configuration to run for a long time,
and then fall into a cycle of length 1 or 2. Thus, the sequence's cycle
is typically very short compared to the aperiodic part of the sequence.

Since the stack algorithm is guaranteed to stop within the second loop
through the cycle, it handles short cycles very well. Other algorithms
do not have this guarantee, so they do not perform as well on short
cycles.

Furthermore, for large cycles, the multi-stack algorithm offers a time/memory tradeoff. By increasing the value of k,
we can improve the algorithm's performance at the expense of using more
memory. Not all of the other algorithms mentioned above offer
time/memory tradeoffs (numbers 4 and 5 do).

Finally, some of the above algorithms (numbers 4, 5, and perhaps also 2)
require somewhat complex data structures, such as hash tables, to run
efficiently. In contrast, the multi-stack algorithm is very simple to
implement.

Thus, the multi-stack algorithm is the only one that both handles
short cycles well and offers a time/memory tradeoff.
Furthermore, it is very simple and elegant.

(We note, however, that for the case when f is a random function,
the distinguished point method offers a somewhat better time/memory
tradeoff than the multi-stack algorithm.)