Most of the software is written in C, with additional parts
in 32-bit (Intel) assembly language. Using assembly for the
time critical parts rather than plain C accounts for a speed gain
of up to 500%.
For the class record algorithm a new program was developed in 2011. This is written in C# and
is optimized towards 64-bits. The algorithm described below is that of the newer version.
The differences with the older algorithm are indicated by the indication (32-bits:).

Since searches for convergence, glides and path records are several orders of magnitude faster
than searches for class records two different algorithms are used.

The convergence algorithm

The algorithm that checks for convergence, glides and path records uses a sieve: only those
numbers are checked that can not be shown to either converge or join the path of a lower number
within a few steps. The gain from this method is substantial. Using a sieve of 65536
(216) for instance leaves only 1720 out of every 65536 numbers to be checked.
Currently a sieve of 232 is used which means the fraction of numbers that can be
skipped is over 99.2%. Apart from that the number is checked to see whether it may occur
in the path of a lower number: all numbers that are congruent
2, 4, 5 or 8 mod 9 can easily be shown to occur in such a path
and can therefore be skipped as well. This accounts for 44.4% of all remaining numbers.
To speed up the search even further the code uses an "interlaced" technique, searching
212 numbers of the same congruence class modulo 232. This means the sieve
can be calculated rather than stored and also enables simultaneous calculation of the iteration
path up to the 32nd division for the entire congruence class. This results in a
further time reduction of approximately 50%.

The class record algorithm

In the algorithm that searches for high delays several techniques are combined to speed up
calculations. The main reduction is achieved by using a sieve to eliminate many numbers
beforehand. Different from the one above the sieve in this case checks whether
paths come together rather than for straightforward convergence. If two paths
join then obviously the upper one can never yield a class record and can
be skipped. Currently the sievesize used is 220 (32-bits:218)
elements wide, which in effect makes calculations of around 87% (32-bits:86%)
of all numbers unnecessary.
Example: any number of the form 8k+4 will
reach 6k+4 after 3 steps, and so will any number of the form 8k+5. Therefore no number of
the form 8k+5 can be a class record (5 itself being the only exception).
Numbers of the form 8k+5 therefore do not need to be checked,
and all positions of the form 8k+5 in the sieve contain a zero.

Even numbers and all numbers congruent 2 modulo 3 are
skipped as well, since potential class records of this type can easily be derived
from other, earlier class records, because any number of the form 2n + 1
can be seen to result in a number of the form 3n + 2 after two steps.
Similarly numbers of the form 9n + 4 can be seen to be skipped,
because any class record of the form 8n + 3 will result in such
a number after five steps (32-bits:no 9n+4 checks are performed).
Combining these last ideas with the sieve the calculations of around 93% (32-bits:91%) of all numbers
can be shown to be redundant, effectively speeding up the calculations by a factor 14 (32-bits:11).

A cut-off technique is also applied at several points to either ensure a number is still a candidate
for a high delay or else to terminate further processing of this number. In most cases this implies
the calculation can be cut short after only a fraction of its trajectory.
To see that this technique works assume that two consecutive Delay Records (N1 and N2) have delays
D1 and D2 respectively. Once a number drops below N2 its delay can not be higher than D1 + the current
iteration count. If this sum is below a certain threshold (e.g. the delay of the lowest class record
still missing) processing this number can be abandoned.
Which delay records are used obviously depends on the exact numbers and in fact different intervals
require different cut-off points to obtain optimal performance. During the current interval
(where starting numbers are approximately 257) trajectories are checked
against the delay records having delays of 1820, 1549, 1408, 1307 and 1087.
The performance gain of this technique depends largely on the magnitude
of the numbers concerned, but has been empirically determined to be around a factor 10.