The neural networks described here are unsupervised and
adapt via self-organisation. They seek to minimise disturbance to their inputs.
Action-selection is viewed here as an adaptive process that allows an agent
to settle into a stable state. Examples of this include an agent adapting to
maintain homeostasis or an agent arbitrating between exploration and exploitation.

In a minimal disturbance system, every input into the
system drives the learning process. If there is no signal then the system is
seen as being in a stable state. Unlike credit assignment learning, a minimal
disturbance system does not seek rewards or maximal return. Instead, any
disturbance-free state is satisfactory.

The neural networks are biologically plausible feed-forward networks made up of adaptive leaky
integrate-and-fire neurons. Each network has three distinct layers: input, middle
and output. The networks are evolved and evaluated on the task of maximising two
resources, labelled here as 'energy' and 'water'. Each network is provided with a
set of actions. An action either increases or decreases the energy or water level
in a virtual body by one or two resource points, or leaves it unchanged.
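
As a rough illustration of the action set described above, the following Python sketch enumerates one possible set of actions and applies them to a virtual body. The names Body, ACTIONS and apply_action are hypothetical, and the assumption that every combination of resource and change is available (ten actions in total) is ours; the text does not specify the exact set.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Body:
    """Hypothetical virtual body holding the two resource levels."""
    energy: int = 0
    water: int = 0


RESOURCES = ("energy", "water")
# Decrease by two or one, neutral, increase by one or two resource points.
DELTAS = (-2, -1, 0, 1, 2)

# Assumption: one action per (resource, change) pair, giving ten actions.
ACTIONS = list(product(RESOURCES, DELTAS))


def apply_action(body, action):
    """Apply a (resource, delta) action to the body and return the body."""
    resource, delta = action
    setattr(body, resource, getattr(body, resource) + delta)
    return body
```

With one output neuron per action, a network using this set would have ten output neurons.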

There is one output neuron per action. The action performed
by the network must have a direct and immediate
effect on the target environment. If this action has a desirable effect
then the corresponding input signals are reduced in the next turn.
In this way the network acts as a minimal disturbance system
as it settles upon actions that reduce its total input activation.
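
The settling behaviour can be sketched in a few lines of Python. This is not the evolved spiking network itself: it is a deliberately simplified stand-in in which the agent keeps an action while its input falls and otherwise moves on to the next action in turn. The function settle, its parameters and the round-robin switching rule are all assumptions made for illustration.

```python
def settle(effects, disturbance=10.0, max_steps=50):
    """Return a history of (action, disturbance) pairs.

    effects[i] is the amount by which action i reduces the input signal
    each turn (hypothetical values; negative means the input grows).
    """
    action = 0
    history = []
    for _ in range(max_steps):
        previous = disturbance
        # The chosen action has an immediate effect on the input signal.
        disturbance = max(0.0, disturbance - effects[action])
        history.append((action, disturbance))
        if disturbance == 0.0:
            break  # no signal left: the system is in a stable state
        if disturbance >= previous:
            # Input was not reduced, so try the next action (round-robin).
            action = (action + 1) % len(effects)
    return history
```

For example, settle([0.0, -1.0, 2.0]) abandons the first two actions after one turn each and then stays with the third until the input signal reaches zero.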

For example, if the neural network were used in a robot with solar panels, actions
that moved it into strong light would increase the charge sent to its batteries.
An external module could sense this and reduce the appropriate input signal to
the neural network by an amount corresponding to how much it needed the batteries
charged. Once the batteries are charged, the input signals are no longer reduced and
the robot is pushed out of its stable state.
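
One way such an external module might compute the input signal is sketched below in Python. The function input_signal, the linear measure of need and the normalised charge_gain are all assumptions for illustration; the text only states that the reduction corresponds to how much the batteries need charging.

```python
def input_signal(charge_level, capacity, charge_gain, base_signal=1.0):
    """Return the input signal sent to the network for the battery sensor.

    charge_gain is the charge currently arriving from the solar panels,
    normalised to the range 0..1 (an assumption made for this sketch).
    """
    # Need is 1.0 for an empty battery and 0.0 for a full one.
    need = 1.0 - charge_level / capacity
    # The reduction scales with both the incoming charge and the need.
    reduction = need * charge_gain
    return max(0.0, base_signal - reduction)
```

Under these assumptions a fully charged battery gives a need of zero, so the signal is no longer reduced even in strong light.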

Neuromodulators

In Biophysics of Computation, Koch
describes neuromodulators as being the brain's
closest equivalent to a global variable. In a computer program,
a global variable can be read or changed from any part of the code.
In the brain a neuromodulator can affect any neuron that
has receptors for it within a certain distance of the release site.
Receptors for a variety of neuromodulators and
neurotransmitters can be found on most neurons. In effect this allows each
neuron to be addressed with some degree of specificity using a combination
of neuromodulators.

Used here, a modulator is a global signal that can
influence the behaviour of a neuron if that neuron has receptors for it.
The signal decays over time at a speed set by its re-uptake rate, and
increases whenever a neuron that has secretors for it fires.
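
A minimal Python sketch of such a global signal follows. The class name Modulator, the proportional (exponential) form of the decay and the fixed amount added per secretion are assumptions; the text states only that the level decays according to a re-uptake rate and rises when secreting neurons fire.

```python
class Modulator:
    """Hypothetical global modulator signal shared by all neurons."""

    def __init__(self, reuptake_rate, level=0.0):
        # Fraction of the current level removed on every time step.
        self.reuptake_rate = reuptake_rate
        self.level = level

    def secrete(self, amount):
        """Called when a neuron with a secretor for this modulator fires."""
        self.level += amount

    def step(self):
        """Advance one time step, applying re-uptake to the level."""
        self.level *= 1.0 - self.reuptake_rate
```

Any neuron with a matching receptor would read the same level value, which is what makes the signal global.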

Neurons that are to be modulated are given a random number
of receptors. These can be modulated by neurons in other
layers that have secretors for those modulators. The receptors modulate
either the neuron's sensitivity to input or probability of
firing. The effect of this modulation is determined by the level
of the associated modulator and whether the receptor is inhibitory
or excitatory. Neurons can also have secretors. These increase the level of
an associated modulator.
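
The effect of receptors on a neuron can be sketched in Python as follows. The Receptor fields, the multiplicative scaling rule and the strength parameter are assumptions chosen for illustration; the text specifies only that the effect depends on the modulator level and on whether the receptor is inhibitory or excitatory.

```python
from dataclasses import dataclass


@dataclass
class Receptor:
    modulator: str         # name of the modulator this receptor responds to
    target: str            # "sensitivity" or "firing_probability"
    excitatory: bool       # False means the receptor is inhibitory
    strength: float = 1.0  # hypothetical scaling factor


def modulate(base, target, receptors, levels):
    """Return the value of one neuron property after modulation.

    levels maps modulator names to their current global levels.
    """
    value = base
    for receptor in receptors:
        if receptor.target != target:
            continue
        level = levels.get(receptor.modulator, 0.0)
        sign = 1.0 if receptor.excitatory else -1.0
        # Assumed rule: scale the property in proportion to the level.
        value *= 1.0 + sign * receptor.strength * level
    value = max(0.0, value)
    if target == "firing_probability":
        value = min(1.0, value)  # keep probabilities within 0..1
    return value
```

With a modulator level of 0.5, an excitatory receptor on sensitivity raises a base value of 2.0 to 3.0, while an inhibitory receptor lowers it to 1.0.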

A minimal disturbance system can be biased towards either exploration
or exploitation depending upon which of its layers are modulated.
A network biased towards exploration, as in A), is more likely to try other
actions even if they have not produced the most desirable effect
in the past. This means that if the effect of another action changes
for the better then the network is more likely to start using that action.
A network biased towards exploitation, as in B), is more likely to
settle upon actions that have a desirable effect and result in reduced
input strength. This means that if the effect of another action changes
for the better then the network is less likely to start using the other action.

Current work involves using minimal disturbance systems to arbitrate between
other neural networks. Pictured below is a parent minimal disturbance system
feeding inputs into two child minimal disturbance systems. Each child system
has actions dedicated to increasing a different resource. The parent adapts
the strengths of its outputs depending on which resource has been signalled as
needing replenishment. This need is signalled via modulators from an external source.