With the modified automaton,
it turns out that conventional
recurrent networks
fail to learn the correct classifications
within training sequences
(various topologies were tested).
Two related reasons are:

The time lags are too long: error signals become less and less
significant as they are propagated ``back into time'' (see, e.g., [7]).

The presumed search space is huge: in principle, the recurrent net
treats all possible symbol combinations as equally plausible candidates
for the cause of the final classification.
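The first reason can be illustrated with a minimal sketch. It assumes a hypothetical single-unit recurrent net, h_t = tanh(w * h_{t-1}), with a unit error injected at the last time step; it is a generic illustration of the effect discussed in [7], not a reproduction of the experiments above. By the chain rule, each step back multiplies the error by w * (1 - h_t^2), whose magnitude is at most |w|. For |w| < 1 the error signal therefore shrinks at least geometrically with the length of the time lag.

```python
import math


def backpropagated_error(w: float, steps: int, h0: float = 0.5) -> list[float]:
    """Magnitude of an error signal propagated back through `steps` time steps
    of a hypothetical single-unit recurrent net h_t = tanh(w * h_{t-1})."""
    # Forward pass: store every activation for use in the backward pass.
    hs = [h0]
    for _ in range(steps):
        hs.append(math.tanh(w * hs[-1]))

    # Backward pass (chain rule): dE/dh_{t-1} = dE/dh_t * w * (1 - h_t**2).
    delta = 1.0  # unit error injected at the final time step
    magnitudes = [abs(delta)]
    for t in range(steps, 0, -1):
        delta *= w * (1.0 - hs[t] ** 2)
        magnitudes.append(abs(delta))
    return magnitudes  # magnitudes[k] = error size k steps "back into time"


if __name__ == "__main__":
    mags = backpropagated_error(w=0.9, steps=50)
    for k in (0, 10, 25, 50):
        print(k, mags[k])
```

With w = 0.9 the error reaching 50 steps back is bounded by 0.9^50, roughly 0.005 of its original size. The weight and initial activation are arbitrary choices for illustration.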

But the ``real'' search space ought to be small, because most
possible symbol combinations can never occur.
Modifying the automaton did not change the entropy of the sequences
it generates.
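The gap between the presumed and the ``real'' search space can be sketched with a toy example. The automaton below is hypothetical and is not the one used in the text: it emits a class symbol (x or y), then `lag` filler symbols, then an end symbol, so the classification depends only on the first symbol. A naive learner must in principle consider every string of the same length over the alphabet, yet only two strings can occur, and the entropy of the emitted sequences (assuming both classes are equally likely) stays at one bit however long the time lag becomes.

```python
import math
from collections import Counter

# Hypothetical toy alphabet; NOT the automaton from the text.
ALPHABET = ("x", "y", "a", "b")


def emit(cls: str, lag: int) -> str:
    """Hypothetical automaton: class symbol, `lag` fillers 'a', then 'b'."""
    return cls + "a" * lag + "b"


def reachable(lag: int) -> list[str]:
    """All sequences the toy automaton can actually produce for a given lag."""
    return [emit(cls, lag) for cls in ("x", "y")]


def nominal_space(lag: int) -> int:
    """Number of same-length symbol combinations a naive learner must consider."""
    return len(ALPHABET) ** (lag + 2)


def entropy_bits(samples: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    for lag in (2, 20):
        seqs = reachable(lag)
        print(lag, nominal_space(lag), len(seqs), entropy_bits(seqs))
```

Increasing the lag from 2 to 20 inflates the nominal space from 4^4 to 4^22 strings, while the number of possible sequences and their entropy stay the same.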
How can an adaptive system find this out?
The next section gives an answer.