PROGRAMMING ALGORITHMS, HEURISTICS

ITEM 176 (Gosper):

The "banana phenomenon" was encountered when processing a character
string by taking the last 3 letters typed out, searching for a random
occurrence of that sequence in the text, taking the letter following
that occurrence, typing it out, and iterating. This ensures that
every 4-letter string output occurs in the original. The program
typed BANANANANANANANANA.... We note an ambiguity in the phrase, "the
Nth occurrence of". In one sense, there are five 00's in 0000000000;
in another, there are nine. The editing program TECO finds five.
Thus it finds only the first ANA in BANANA, and is thus obligated to
type N next. By Murphy's Law, there is but one NAN, thus forcing A,
and thus a loop. An option to find overlapped instances would be
useful, although it would require backing up N-1 characters before
seeking the next N character string.
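
The two senses of "occurrence" can be made concrete with a short Python
sketch. The function name occurrences and its overlapped flag are invented
here for illustration; this is not TECO's search. The non-overlapped scan
resumes just past the end of each match, while the overlapped scan backs up
N-1 characters, i.e. resumes one character past the start of the match.

```python
def occurrences(text, target, overlapped):
    """Return the start indices of target in text, scanning left to right."""
    if not target:
        raise ValueError("target must be non-empty")
    found = []
    start = 0
    while True:
        i = text.find(target, start)
        if i < 0:
            return found
        found.append(i)
        # Overlapped: back up len(target)-1 characters from the end of the match.
        # Non-overlapped: resume right after the match.
        start = i + 1 if overlapped else i + len(target)


print(len(occurrences("0000000000", "00", overlapped=False)))  # 5
print(len(occurrences("0000000000", "00", overlapped=True)))   # 9
print(occurrences("BANANA", "ANA", overlapped=False))          # [1]
print(occurrences("BANANA", "ANA", overlapped=True))           # [1, 3]
```

With the non-overlapped scan, the only ANA found in BANANA is followed by N,
and the only NAN is followed by A, which is the loop described above.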

ITEM 177 (Gosper): DRAWING CURVES INCREMENTALLY

Certain plotters and displays are constrained to approximate curves by a
sequence of king-moves between points on a lattice.

Many curves and contours are definable by F(X,Y) = 0 with F changing sign on
opposite sides of the curve. The following algorithm will draw most such
curves more accurately than polygonal approximations and more easily than
techniques which search for a "next" X and Y just one move away.

We observe that a good choice of lattice points is just those for which F, when
evaluated on one of them, has opposite sign and smaller magnitude than on one
or more of its four immediate neighbors. This tends to choose the nearer
endpoint of each graph paper line segment which the curve crosses, if near the
curve F is monotone with distance from the curve.

First, divide the curve into arcs within which the curve's tangent lies within
one 45 degree semiquadrant. We can show that for reasonable F, only two
different increments (say north and northwest) are needed to visit the desired
points.

Thus, we will be changing one coordinate (incrementing Y) every step, and we
have only to check whether changing the other (decrementing X) will reduce the
magnitude of F. (If F increases with Y, F(X,Y+1) > -F(X-1,Y+1)
means decrement X.) F can often be manipulated so that the inequality
simplifies and so that F is easily computed incrementally from X and Y.

As an example, the following computes the first semiquadrant of the circle

This can be bummed by maintaining Z = 2Y+1 instead of Y. Symmetry may be used
to compute all eight semiquadrants at once, or the loop may be closed at C2 and
C3 with two PUSHJ's to provide the palindrome of decisions for the first
quadrant. There is an expression for the number of steps per quadrant, but it
has a three-way conditional dependent upon the midpoint geometry. Knowing this
value, however, we can replace C3 and C4 with a simple loop count and an
odd-even test for C4.

The loop must be top-tested (C3 before C1) if the "circle" R = 1, with four
diagonal segments, is possible.
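
A minimal Python sketch of the circle case, assuming F = X^2 + Y^2 - R^2, a
start at (R, 0), and integer R. The function name, the returned point list,
and the top-tested loop condition Y < X-1 are assumptions made for this
illustration, not a transcription of the listing. The decision test F >= X is
the one quoted in the bracketed note on ties below.

```python
def first_semiquadrant(r):
    """Lattice points from (r, 0) toward the 45 degree line, using only
    north and northwest moves. f always equals x*x + y*y - r*r."""
    x, y, f = r, 0, 0
    points = [(x, y)]
    while y < x - 1:              # top-tested, so r = 1 takes no steps
        f += 2 * y + 1            # move north: y -> y + 1
        y += 1
        if f >= x:                # |F| is smaller one column to the west
            f += 1 - 2 * x        # move west as well: x -> x - 1
            x -= 1
        points.append((x, y))
    return points


print(first_semiquadrant(5))      # [(5, 0), (5, 1), (5, 2), (4, 3)]
```

F is never recomputed from scratch; each move updates it with one addition,
which is what makes the method attractive for incremental plotters.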

All this suggests that displays might be designed with an increment mode which
accepts bit strings along with declarations of the form: "0 means north, 1
means northwest". 1100 (or 0011) will not occur with a curve of limited
curvature; thus, it could be used as an escape code, but this would be an
annoying restriction.

[In case of a tie, i.e., F has equal magnitudes with opposite signs on
adjacent points, do not choose both points but rather have some
arbitrary yet consistent preference for, say, the outer one. The
problem can't arise for C2 in the example because the inequality F
>= X is really F > -(F-2X+1) or F > X-.5.]

ITEM 178 (Schroeppel, Salamin):

Suppose Y satisfies a differential equation of the form

P(X) Y(Nth derivative) + ..... + Q(X) = R(X)

where P, ..... Q, and R are polynomials in X

(for example, Bessel's equation, X^2 Y'' + X Y' + (X^2 - N^2) Y = 0)

and A is an algebraic number. Then Y(A) can be evaluated to N places
in time proportional to N(ln N)^3.

Further, e^X and ln X or any elementary function can be evaluated to N places
in time proportional to N(ln N)^2 for X a real number. If F(X) can be
evaluated in such time, so can the inverse of F(X) (by Newton's method), and
the first derivative of F(X). Also, zeta(3) and gamma can be done in time
proportional to N(ln N)^3.

ITEM 179 (Gosper):

A program which searches a character string for a given substring can always be
written by iterating the sequence fetch-compare-transfer (ILDB-CAIE-JRST on the
PDP6/10) once for each character in the sought string. The destinations of the
transfers (address fields of the JRST's) must, however, be computed as
functions of the sought string. Let

In other words, a number > 0 in the top row is a location in the program
where the corresponding letter of the middle row is compared with a character
of the input string. If it differs, the number in the bottom row indicates the
location where comparison is to resume. If it matches, the next character of
the middle row is compared with the next character of the input string.

Let J be a number in the top row and K be the number below J, so that TK is
the address field of the Jth JRST. For each J = 1, 2, ... we compute K(J) as
follows: K(1) = 0. Let P be a counter, initially 0. For each succeeding J,
increment P. If the Pth letter = the Jth, K(J) = K(P). Otherwise, K(J) = P,
and P is reset to 0. (P(J) is the largest number such that the first P
characters match the last P characters in the first J characters of the sought
string.)
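
The table computation can be sketched in Python as follows. The names
resume_table and find are invented for this illustration, and the word
SASSAFRAS is simply a convenient test string. One deliberate departure from
the wording above: where the text resets P to 0, this sketch falls back
through earlier table entries, which is the standard Knuth-Morris-Pratt step
and keeps P equal to the overlap described in the parenthetical remark. The
search loop stands in for the compiled ILDB-CAIE-JRST sequence.

```python
def resume_table(pattern):
    """Return K with K[J] for J = 1..len(pattern); K[0] is unused padding."""
    m = len(pattern)
    k = [0] * (m + 1)                 # K(1) = 0
    p = 0
    for j in range(2, m + 1):
        # Assumed refinement: fall back through earlier entries until letter P
        # matches letter J-1 or P reaches 0 (the text resets P to 0 instead).
        while p > 0 and pattern[j - 2] != pattern[p - 1]:
            p = k[p]
        p += 1
        k[j] = k[p] if pattern[p - 1] == pattern[j - 1] else p
    return k


def find(text, pattern):
    """Index of the first match of pattern in text, or -1."""
    if not pattern:
        return 0
    k = resume_table(pattern)
    j = 1                             # which pattern letter is compared next
    for i, c in enumerate(text):      # each input character is fetched once
        while j > 0 and pattern[j - 1] != c:
            j = k[j]                  # plays the role of the JRST destination
        j += 1
        if j > len(pattern):
            return i - len(pattern) + 1
    return -1


print(resume_table("SASSAFRAS")[1:])  # [0, 1, 0, 2, 1, 3, 1, 1, 0]
print(find("BANANA", "NAN"))          # 2
```

The input index only moves forward, so the character pointer is never saved
or backed up.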

To generalize this method to search for N strings at once, we produce a program
of ILDB-CAIE-JRST's for each of the sought strings, omitting the initial ILDB
from all but the first. We must compute the destination of the Jth JRST in the
Ith program, TKM(I,J), which is the location of the Kth compare in the Mth
program.

It might be reasonable to compile such an instruction sequence whenever a
search is initiated, since alternative schemes usually require saving or
backing up the character pointer.

ITEM 180 (Gosper):

A problem which may arise in machine processing of visual information is the
identification of corners on a noisy boundary of a polygon. Assume you have a
broken line. If it is a closed loop, find the vertex furthest from the
centroid (or any place). Open the loop by making this place both endpoints and
calling it a corner. We define the corner of a broken line segment to be the
point the sum of whose distances from the endpoints is maximal. This will
divide the segment in two, allowing us to proceed recursively, until our corner
isn't much cornerier than the others along the line.

The perpendicular distance which the vector C lies from the line connecting the
vectors A and B is just

     |(C - A) x (B - A)|
     ------------------- ,
           |A - B|

but maximizing this can lose on very pointy V's. The distance sum hack can
lose on very squashed Z's.
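
The distance sum recursion can be sketched in Python as follows. The names
corner_index and find_corners are invented for this illustration, and the
stopping rule is an assumption: the text does not say how to decide that a
corner "isn't much cornerier", so this sketch stops when the distance sum
exceeds the straight-line distance between the endpoints by no more than a
caller-supplied tolerance. The input is an open broken line given as a list
of (x, y) pairs.

```python
import math


def corner_index(points):
    """Index of the interior point whose summed distance to both endpoints
    is largest (the first such point on ties)."""
    a, b = points[0], points[-1]
    return max(range(1, len(points) - 1),
               key=lambda i: math.dist(points[i], a) + math.dist(points[i], b))


def find_corners(points, tolerance):
    """Corners of an open broken line, in order along the line."""
    if len(points) < 3:
        return []
    i = corner_index(points)
    a, b, c = points[0], points[-1], points[i]
    # Assumed stopping rule: how far the detour through c exceeds the chord.
    excess = math.dist(a, c) + math.dist(c, b) - math.dist(a, b)
    if excess <= tolerance:
        return []
    # The corner becomes an endpoint of both halves.
    return (find_corners(points[:i + 1], tolerance) + [c]
            + find_corners(points[i:], tolerance))


ell = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(find_corners(ell, 0.5))     # [(3, 0)]
```

A tolerance suited to the noise level of the boundary would have to be chosen
by experiment; 0.5 here is arbitrary.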