Choreonumeric Entropic Data Reduction

In May 1972, Schalkwijk, Antonio, and Petry published an article
on variable-to-fixed-length source-encoding data reduction,
by my design [Proceedings of the Fifth Hawaii International
Conference on System Sciences, pg. 498]. Ca. 1975, I proposed
an optimization and generalization to my [*UCSD] algorithm.

REVIEW:

The advance of signal-information entropy (-log p_k) in a sequence of signals,
each of probability p_k taken from a repertoire, can be plotted as a tree of
branches of various lengths to nodes of more or less comparable progress
-coinciding most commonly when the p_k have coincident powers- in an array of
multiplicative terms p_1^i1 · p_2^i2 · p_3^i3 ···
(or the logarithm thereof, a more information-obvious representation of the
cumulating entropy) ... If all but one p_k are small, the practical tree is a
one-branch run-length code; in our run-skip-length codes it is trimmed triangular.

In our coding trellis (a tree reconnected at coincident progress) [ibid] this was
represented as counts of paths-out at each node:

Eg. for the case of p, q where q^1 = p^2 (one q-branch advancing the entropy
of two p-branches), drawing several views of the same triangle:
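
To make the path-out counts concrete, a minimal Python sketch of this q^1 = p^2
trellis (names and layout are illustrative, not from the 1972 article): from a
node with n entropy units remaining, a p-branch consumes one unit and a q-branch
two, so the counts obey the Fibonacci recurrence C(n) = C(n-1) + C(n-2).

    # Path-out counts in the p,q trellis where q = p^2: a p-branch skips
    # 1 entropy unit, a q-branch skips 2, so counts are Fibonacci.
    def path_counts(n_units):
        """Count complete paths from each node down to the terminal node."""
        counts = [0] * (n_units + 1)
        counts[0] = 1                       # terminal node: one (empty) path
        for n in range(1, n_units + 1):
            counts[n] = counts[n - 1]       # via the p-branch (skip 1)
            if n >= 2:
                counts[n] += counts[n - 2]  # via the q-branch (skip 2)
        return counts

    print(path_counts(7))  # [1, 1, 2, 3, 5, 8, 13, 21]

These are the counts that reappear below as the 8-5-3-2-1 row.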

The triangular form obviously extended to multidimensional alphabets of p_n,
planar-faceted, hypercube corners; and to sources with hysteresis, pre- and post-memory:
cusped long-tail triangles where the p-probabilities tend to favorably remember their
usage statistics, prompt-adaptive, which get at the run-lengths when those occur (*)
... but specializations, though obvious, were not detailed in the article.

* (A prompt-adaptive algorithm needs the probabilities of the probabilities.)
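
As one hedged illustration of that footnote, a standard prompt-adaptive
estimator (a Laplace-style rule; my choice of example, not necessarily the
device intended) keeps usage counts and re-derives each p from them after every
signal, so the probabilities "remember their usage statistics":

    # A Laplace-rule adaptive model: usage counts feed back into the next
    # p-estimates. An illustrative stand-in, not the source's algorithm.
    from collections import Counter

    class AdaptiveModel:
        def __init__(self, repertoire):
            self.counts = Counter({s: 1 for s in repertoire})  # +1 prior
            self.total = len(repertoire)

        def probability(self, symbol):
            return self.counts[symbol] / self.total

        def observe(self, symbol):
            self.counts[symbol] += 1
            self.total += 1

    model = AdaptiveModel("pq")
    for s in "ppqpp":
        model.observe(s)
    print(model.probability("p"))  # 5/7 after four p's and one q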

THE NEW VIEW:

Linearization of the multidimensional array, and generalization of algorithmic progress:

Obviously in the regular-edge triangle case the shorter row-runs of probability terms,
or of our path-counts, can be compactly stored as end-runs of the longer rows
(eg. 3-2-1 in 8-5-3-2-1): a fold-up.
And similarly at higher dimensions, spreads stored as end-fans ... etc.
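
A sketch of the fold-up, taking the 8-5-3-2-1 of the text at face value: each
shorter row of path-counts is a suffix (end-run) of the longest row, so one
linear array serves every row.

    # Fold-up of the regular-edge triangle: store only the longest row of
    # path-counts; every shorter row is an end-run (suffix) of it.
    def folded_row(depth):
        counts = [1, 2]                  # counts up the triangle's edge
        while len(counts) < depth:
            counts.append(counts[-1] + counts[-2])
        return counts[::-1]              # longest-first

    longest = folded_row(5)
    print(longest)        # [8, 5, 3, 2, 1]
    print(longest[-3:])   # [3, 2, 1] -- the shorter row, stored in place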

But by interpretation this is a linear hopscotch of paths representing run-skip-length
progress on an information-entropy (-log) plot, with distinct nodes regrouped
with near nodes as integral steps (a slight quantization loss). This interpretation
generalizes, no longer requiring strictly regular skips nor any particular shape,
only overall entropy progress: at each step is a vector of the path-out count and
skip-length (plural for higher dimensions; shortest first). And the technique is
applicable to anisotonic memory-chained stochastics.
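
The generalized step record might be sketched like so (field names are
hypothetical, added only for illustration): each entry of the linearized array
carries a path-out count plus one skip-length per symbol, shortest first, and
nothing requires the skips to repeat regularly from entry to entry.

    # One entry per node of the linearized trellis: a path-out count and
    # a skip-length per symbol (plural at higher dimensions; shortest
    # first). Field names are illustrative, not from the source.
    from dataclasses import dataclass

    @dataclass
    class Step:
        count: int    # paths out of this node
        skips: tuple  # entropy skip per symbol, shortest first

    # The q = p^2 trellis, five nodes deep: Fibonacci counts, skips (1, 2).
    table = [Step(count=c, skips=(1, 2)) for c in (8, 5, 3, 2, 1)]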

PROCESS SUMMARY, EXAMPLE:

A coding pass is a linear choreographed hopscotch of steps-taken, as indicated by the
total coding sum; a unique, lossless, compressed message. Each step is a quantum
approximation-estimation, with direction 'arrows' to its immediate subsequents
(frequency-weighted quantization error being its loss of data-compression efficiency).

Choreonumeration simply counts any course across the 'floor'.
The algorithm is essentially fast linear run-step/skip ... for any signal
repertoire, spelling alphabet, or sampling domain. Furthermore, steps through
entropy space need not be regular -though that was our introduction: this algorithm
is not constrained to memoryless codes.
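
A minimal encode/decode pair for the q^1 = p^2 case, as a sketch of the
hopscotch (names and the exactly-landing convention are mine, not the published
procedure): the compressed message is one additive total; a p-step keeps the
running sum, a q-step adds the count of the p-subtree it skips over.

    # Choreonumeration sketch: symbols p (skip 1) and q (skip 2), q = p^2.
    # The message is a single integer, accumulated additively; decoding
    # hopscotches the same count array in reverse. A sketch, not the
    # published procedure; a message must land exactly on the final node.
    def counts_to(n):
        c = [1, 1]                       # c[k] = paths from k units left
        while len(c) <= n:
            c.append(c[-1] + c[-2])
        return c

    def encode(symbols, n):
        c, left, total = counts_to(n), n, 0
        for s in symbols:
            if s == 'q':
                total += c[left - 1]     # skip all paths starting with p
                left -= 2
            else:
                left -= 1
        assert left == 0, "message must consume exactly n entropy units"
        return total

    def decode(total, n):
        c, left, out = counts_to(n), n, []
        while left > 0:
            if left >= 2 and total >= c[left - 1]:
                total -= c[left - 1]
                out.append('q'); left -= 2
            else:
                out.append('p'); left -= 1
        return ''.join(out)

    msg = 'pqppq'                        # 1+2+1+1+2 = 7 entropy units
    assert decode(encode(msg, 7), 7) == msg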

Where needed, path-lengths can be resolved to fractional steps of entropy on parallel
paths until remerged at whole common steps: essentially an integerized, logarithmically
implemented arithmetic coding. Even the final whole step can be fractionally trimmed
by having the next coding begin an alternative hopscotch (in both sender and receiver)
assigned and tuned to effect probability concatenation.

(The resulting data structures are fast -additive rather than multiplicative-
and can be readily implemented by simple machines.)

LONGER ARRAYS, EFFICIENTLY:

As choreonumeration is an approximation to arithmetic coding, its subcounts need
not be exact compacted integers but "hull" measures, each bounding its sum, keeping
path enumerations unique and stepping efficient albeit not exactly all utilized. This
allows a floating-integer format, where the scale is about the number of digits to
process and the fraction approximates the stepped coding precision, keeping the fill
tight. Scale can be assumed along the way with cumulative tweaking; and long run-arrays
can thus be stored very compactly. For the process-summary example above, the
array extended leftward,--
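
A hedged sketch of the hull idea as described: replace each exact count with an
upper bound rounded up to a few significant bits, a floating-integer of fraction
and scale. Every bound still covers the sum of its branch bounds, so path
enumeration stays unique; the slack is exactly the not-exactly-all-utilized
code space mentioned above.

    # "Hull" counts in floating-integer format: each count is rounded up
    # to SIG significant bits (fraction * 2**scale), so B(n) >= B(n-1) +
    # B(n-2) still holds and decoding stays unique, at the cost of a few
    # unused code points. A sketch, with parameters chosen arbitrarily.
    SIG = 8  # significant bits kept in the fraction

    def hull(x):
        """Round x up to SIG significant bits."""
        scale = max(0, x.bit_length() - SIG)
        return (-(-x >> scale)) << scale     # ceiling at that scale

    def hull_counts(n):
        b = [1, 1]
        while len(b) <= n:
            b.append(hull(b[-1] + b[-2]))    # a bound, not the exact count
        return b

    b = hull_counts(120)
    # b[120] is dozens of bits long, but only its SIG-bit fraction and
    # its scale need storing -- the compactness claimed for long arrays.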

It is conceivable also, from this example, that an efficient run-skip-hop array can
contain the entire cycle (1000-long here) and be reused cyclically (thereby more
effectually) ... thus again indicating this process is simply arithmetic coding done
by array-lookup. And at this point it may be worth noting that the compact cyclic
array can be built top-down rather than bottom-up, or both; and in this case,
the array extended rightward might be:-- [under construction momentarily]

FINAL COMMENTS; More to do; More mathematics; More under construction:

1. This algorithm approaches fixed-length-to-variable coding, Huffman or a variant, as
triangles become shorter and the various tails concatenate onto alternative short
triangles (the -group- not the triangles themselves being closed under the
concatenation process).

2. One more insight I considered at the time -believing we were probably implementing
arithmetic coding, despite the given necessity of examining an algorithm [*1972]- was to
track the interval with cumulations of log2 p and log2 q to many places: bounding,
and sending in the clear those bits not straddling. A way to prevent straddling appeared
to involve flip-flopping the code domain so that the smaller q always decoded, and the
larger p was always delayed, until the end-flush. A priori the algorithm looked like
it may have needed to stack intermediate cumulations. [under construction]
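
For comparison, the now-standard interval rendering of that straddle problem
(textbook renormalization, not the flip-flop device described above): bits are
sent in the clear only while both ends of the interval agree on them; a
straddling interval delays its bit. The underflow case, where the interval
narrowly straddles 1/2 indefinitely, needs a further delaying device akin to
the end-flush mentioned, omitted here.

    # Straddle (carry) handling in interval terms: emit leading bits only
    # while the interval's two ends agree on them. Standard textbook
    # renormalization, shown for comparison; not the 1972 device.
    def emit_settled_bits(low, high, bits_out):
        """low, high: interval ends as 32-bit fixed-point fractions."""
        TOP = 1 << 31
        while (low ^ high) & TOP == 0:       # leading bit agreed
            bits_out.append(low >> 31)       # safe to send in the clear
            low = (low << 1) & 0xFFFFFFFF    # renormalize
            high = ((high << 1) | 1) & 0xFFFFFFFF
        return low, high                     # still straddling 1/2: delay

    out = []
    low, high = emit_settled_bits(0x60000000, 0x70000000, out)
    print(out)  # [0, 1, 1]: three settled bits; the rest still straddles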

* [Our original undergraduate work came about because arithmetic coding by
partitioning the unit probability interval was computationally expensive in
floating-point multiplications, and Dr. Schalkwijk's designed savings -using
combinatoric integer factoring of n!/(k!(n-k)!), effecting a rectangular section of
Pascal's Triangle as an approximation to a probability-coefficients triangle- was
the subject of our group study;- but noticing that his procedure was equivalent to
index-cumulation in a trellis, I cut his rectangle into a triangle and applied
Pascal's technique from '1's on the jagged hypotenuse:- a small, efficient memory
eliminating multiplications altogether (selective cumulation itself constitutes a
model-adapted multiplication). I left multidimensional adaptation as obvious; and
later simplified the process by folding up a regular-edge triangle, ca. 1975;- and
thence simplified further by choreonumerating altogether. Recently I added mention
of floating-integer formatting.]
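
For reference, the combinatoric core just described, in its standard
construction: Schalkwijk-style enumerative ranking of an n-bit word of k ones
by index-cumulation of binomial coefficients through Pascal's Triangle: no
multiplications, only selective addition (variable names are mine).

    # Enumerative ranking of an n-bit word with k ones: cumulate binomial
    # coefficients along the word's path through Pascal's Triangle.
    from math import comb

    def rank(bits):
        """Lexicographic index of `bits` among equal-weight words."""
        index, ones = 0, bits.count(1)
        for i, b in enumerate(bits):
            if b == 1:
                # skip every word with 0 here and `ones` ones remaining
                index += comb(len(bits) - i - 1, ones)
                ones -= 1
        return index

    def unrank(index, n, k):
        bits = []
        for i in range(n):
            c = comb(n - i - 1, k)
            if index >= c:
                bits.append(1); index -= c; k -= 1
            else:
                bits.append(0)
        return bits

    word = [0, 1, 1, 0, 1]
    assert unrank(rank(word), 5, 3) == word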