It seems that people tend to memorize and focus on
atypical or
unexpected events and
that they often try to explain new atypical events
in terms of previous atypical events.
In the light of the principle of history compression, this makes a lot of sense.

Once events
become expected, they tend to become `subconscious'.
There is an obvious analogy to the chunking algorithm: the chunker withdraws its attention from events that become expected; they become `subconscious' (automatized) and give rise to even higher-level `abstractions' in the chunker's `consciousness'.
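
This filtering mechanism can be stated procedurally. The following minimal sketch (in Python) illustrates it with a deliberately simple, non-neural predictor; all names here, such as TablePredictor and compress_history, are illustrative and do not appear in the original papers. A lower level learns to predict the next input symbol, and only symbols that violate the current prediction, the `unexpected' events, are passed on to the higher level.

from collections import defaultdict

class TablePredictor:
    # Minimal adaptive next-symbol predictor: for each context symbol,
    # predict the symbol that has followed that context most often so far.
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, context):
        followers = self.counts[context]
        return max(followers, key=followers.get) if followers else None

    def update(self, context, symbol):
        self.counts[context][symbol] += 1

def compress_history(sequence):
    # Pass only unexpected symbols (prediction failures) to the next level;
    # correctly predicted symbols become `subconscious' and are dropped.
    predictor = TablePredictor()
    unexpected = []
    context = None
    for t, symbol in enumerate(sequence):
        if predictor.predict(context) != symbol:
            unexpected.append((t, symbol))  # a relevant point in time
        predictor.update(context, symbol)
        context = symbol
    return unexpected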

The chunking systems described in [Schmidhuber, 1991a],
[Schmidhuber, 1991c] and the current paper try to
detect temporal regularities and learn to use them for
identifying relevant points in time.
A general criticism of more conventional algorithms can be formulated as follows: these algorithms do not try to focus selectively on relevant inputs; by attending to every input, they waste time and computational resources.
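
To make this contrast concrete, here is a hypothetical run of the compress_history sketch from above on a repetitive stream containing a single deviation:

stream = "abcabcabcabxabcabc"
print(compress_history(stream))
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (11, 'x'), (12, 'a')]

Early on, every symbol is unexpected; once the regularity `abc' has been learned, only the deviating `x' (and its immediate successor, whose context `x' is new) reaches the higher level. Of eighteen input symbols, the higher level has to process only six.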

Speech is a good example of a domain involving multi-level temporal
structure. Ongoing research is exploring the application of chunking systems to speech recognition.

The principle of history compression is not limited to neural networks. Any adaptive sequence-processing device could make use of it.
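Indeed, the simple table-based predictor in the sketch above is one such non-neural example.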