In collaboration with
Stefanie Lindstädt, the
method was applied to a pattern ensemble consisting
of 80 characters from the font-courier DEC-dataset.
Each character was represented by an image made of
binary pixels.
Since there are only 80 characters but $2^{150}$ possible patterns
that can be represented by 150 binary pixels, the training set
contains an enormous amount of redundant information.

During training, the images were
presented at random, according to the relative frequencies
of the corresponding characters in English text.
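Frequency-weighted presentation of this kind can be sketched as follows. The letter frequencies below are illustrative values, not the paper's actual table, and the five-letter alphabet is a toy stand-in for the 80-character ensemble:

```python
import random

# Hypothetical letter frequencies (illustrative values only; the paper's
# experiment used the frequencies of all 80 characters in English text).
freqs = {"e": 0.127, "t": 0.091, "a": 0.082, "o": 0.075, "i": 0.070}

# Renormalize so the toy distribution sums to 1.
total = sum(freqs.values())
chars = list(freqs)
probs = [freqs[c] / total for c in chars]

def present_pattern(rng=random):
    """Pick the next training character in proportion to its English frequency."""
    return rng.choices(chars, weights=probs, k=1)[0]

# 10000 pattern presentations, as in the experiment described above.
sample = [present_pattern() for _ in range(10000)]
```

Frequent characters such as ``e'' thus dominate the training stream, so the code discovered by the system is shaped by the statistics of English text, not just by the set of character shapes.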
The unsupervised system had
150 input units, 16 code units, and 1 ``bias'' unit.
Each predictor had
15 input units, 1 ``bias'' unit, and 1 output unit.
The learning rate of the predictors was 10 times as high
as the learning rate of the code units.
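A forward pass through the architecture just described can be sketched as follows. The weights are random placeholders (the source gives no values), and sigmoid code units with linear predictors are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_CODE = 150, 16  # 150 input units, 16 code units (each module also has a bias unit)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Unsupervised code module: 150 inputs + 1 bias -> 16 code units.
W_code = rng.normal(scale=0.1, size=(N_CODE, N_IN + 1))

# One predictor per code unit: the other 15 code units + 1 bias -> 1 output.
W_pred = rng.normal(scale=0.1, size=(N_CODE, N_CODE - 1 + 1))

def forward(x):
    """Compute the 16 code activations and each predictor's guess of its code unit."""
    x_b = np.append(x, 1.0)          # append the bias input
    y = sigmoid(W_code @ x_b)        # code unit activations
    preds = np.empty(N_CODE)
    for i in range(N_CODE):
        others = np.delete(y, i)     # predictor i sees only the other 15 code units
        preds[i] = W_pred[i] @ np.append(others, 1.0)
    return y, preds

x = rng.integers(0, 2, size=N_IN).astype(float)  # one binary 150-pixel pattern
y, preds = forward(x)

# Predictability-minimization objective: the predictors are trained to minimize
# this squared error, while the code units are trained to maximize it
# (with the predictors' learning rate 10 times that of the code units).
objective = np.sum((preds - y) ** 2)
```

The adversarial structure is the point: each predictor tries to infer a code unit from the remaining ones, and the code units succeed only by becoming mutually unpredictable, i.e., statistically independent.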
Within 10000 pattern presentations,
the system often learned to generate a
loss-free code of the ensemble such that the
code was much less redundant than the original data.
The redundancy (see the definition in section 1.2)
corresponding to the original DEC dataset
is .
The redundancy corresponding to a 16-bit code discovered
by the system is . See
[14], [13], and
[24]
for details.
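The exact redundancy measure of section 1.2 is not reproduced here; a standard choice, assumed for the following sketch, is the sum of the per-unit entropies minus the joint entropy of the code, which vanishes exactly when the code units are statistically independent (a factorial code):

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy in bits of an empirical distribution given by counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(codes):
    """Sum of per-unit entropies minus the joint entropy of the code words.
    Zero iff the code units are statistically independent.
    Assumed measure -- the paper's exact definition is in its section 1.2."""
    n = len(codes)
    width = len(codes[0])
    joint = entropy(Counter(codes), n)
    marginal = sum(entropy(Counter(c[i] for c in codes), n) for i in range(width))
    return marginal - joint

# Toy example: two perfectly correlated bits carry one bit of redundancy;
# two independent uniform bits carry none.
redundant = [(0, 0), (1, 1)] * 50
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```

On this toy data, the correlated pair yields a redundancy of 1 bit while the independent pair yields 0, mirroring the comparison above between the raw pixel data and the 16-bit code.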

This result corresponds to a dramatic reduction of redundant information,
although the achieved value is not optimal.
In many realistic cases, however,
approximations of nonredundant codes should be satisfactory.
We intend to apply the method
to the problem of unsupervised segmentation of real-world images.
See [30] for an application to
simple stereo vision.

One might speculate whether
the brain uses a similar principle, based on ``code
neurons'' trying to
escape the predictions of ``predictor neurons''.