All real computers are dynamical systems that carry out computation through their change of state with time. A simple digital computer can be thought of as having a large number of binary storage registers. The state of the computer at a particular time is a long binary word. A computation is begun by setting the computer in an initial state determined by standard initialization + program + data. At each tick of the computer clock the state changes into another state, following a rule that is built in by the design of the machine. After a while the computer stops changing its state, and the answer to the computation is read out from a subset of the storage registers. A fragment of a two dimensional representation of the motion in state space is shown in Figure 1.

Figure 1: Two-dimensional representation of motion in state space. Transitions to states outside this fragment are not indicated

The digital states are represented by small blue dots, the state transitions by the arrow lines, and final states (which have no transitions leading out of them) by large red dots.

Neural network computation or analog computation can similarly be described as state space flows but without the discretization of state variables and time. For general neural networks, the only way to understand what future state will evolve after a long time is by following the dynamics in detail.

Hopfield networks have a Lyapunov function \(E\) (or ‘energy function’) behind their dynamics which leads to an understanding of possible final states. The Lyapunov function decreases in a monotone fashion under the dynamics, and is bounded below. Computation in this system can now be thought of as being done by starting from an initial state from which a final state evolves by moving downhill on \(E\ .\) Because of the existence of an elementary Lyapunov function for the dynamics, the only possible asymptotic result is a state on an attractor. Hopfield networks have been shown to be capable of universal computation in the Turing sense.

Binary neurons

The original Hopfield net [1982] used model neurons with two values of activity, that can be taken as 0 and 1. The strength of the synaptic connection from neuron \(k\) to neuron \(j\) is described by \(T_{jk}\ .\) The state vector \(V\) of the network at a particular time \(t\) has components \(V_j\) describing the activity of neuron \(j\) at time \(t\ .\) The dynamics of the system are defined as follows:

Activity of neuron \(j\) is \(V_j\)

Strength of synaptic connection from neuron \(k\) to neuron \(j\) is \(T_{jk}\)

If there are no self-connections (i.e. \(T_{kk} = 0\) for all \(k\)) and the connections are symmetric (\(T_{jk}=T_{kj}\) for all \(j, k\)) then this system has the Lyapunov function
\[E = - \frac{1}{2}\Sigma_{jk} T_{jk}V_jV_k - \Sigma_j I_jV_j\ .\]

This \(E\) is also a Lyapunov function if the neurons are repetitively updated one at a time in any order.

Positive values of \(T\) represent excitatory connections, and negative values inhibitory connections. Minor transformations of variables relate this system to a magnetic Ising system, with \(T_{jk}\) equivalent to the exchange \(J_{jk}\) between Ising spins \(j\) and \(k\ .\) A related set of automata networks, which use synchronous update of all neurons at once, has been extensively studied.

The dynamical update described above can be thought of as a Monte Carlo procedure or a Glauber dynamics
for the change of state of a thermodynamic system at zero temperature, where the energy of the system is described by \(E\ .\) The Boltzmann Machine uses a Hopfield net with stochastic update to simulate a non-zero temperature.

Graded neuron response and continuous variables

Let the stochastic binary neurons be replaced by neurons whose instantaneous activity increases with input, so that \(V_j = g(x_j)\) where \(g\) is any monotone increasing function, bounded below and bounded above. In this representation, details of action potential timing are suppressed, and action potentials are thought of as being generated in a stochastic fashion with an instantaneous rate \(g(x)\ .\) The customary bounds on \(g(x)\) are thus often taken as zero and the maximum firing rate. Let synaptic currents lag behind the firing rate with a simple exponential kernel \(\exp( -t/\tau)\ ,\) corresponding in real neurons to a synaptic current that rises rapidly then decays exponentially. Within this description, the evolution of the state of the network is given by the ordinary differential equations
\[dx_j/dt = - x_j/\tau + \Sigma_{jk} T_{jk}V_k + I_j\]

When \(T\) is symmetric, this dynamics [1984] has the Lyapunov function
\[ E = - \frac{1}{2} \Sigma_{jk} T_{jk}V_jV_k - \Sigma_j I_jV_j + \frac{1}{\tau} \Sigma_j \int^{V_j} g^{-1}(Z)dZ\]
where \(g^{-1}\) is the inverse of the gain function \(g\ .\) There is a significant limiting case of this function when \(T\) has no diagonal elements and the input-output relation becomes a step, going from 0 to a maximum firing rate (for convenience, scaled to 1). The third term in this Lyapunov function is then zero or infinite, and this \(E\) is then identical to the function for binary stochastic neurons except that now the third term restricts the domain to \(0 \leq V_j \leq 1\) on which the third term is zero, instead of the earlier \(V_j = 0\) or \(1\ .\) With no diagonal elements in \(T\ ,\) the minima of \(E\) are all located at corners of the hypercube \(0 \leq V_j \leq 1\ .\) In this limit, the stable states of the continuous variable system are exactly the same as the stable states of the binary system.

Illustration: 2 neuron flip-flop

A 2-neuron case with reciprocal inhibition illustrates the behavior. The input-output relation used for the neurons is shown in the first panel of Figure 2. Excitation was provided by setting \(I_1 = I_2 > 0\ .\)

Figure 2: Phase portrait of 2-neuron Hopfield Network.

The second panel shows the trajectories of the system in the \((V_1, V_2)\) phase plane from a variety of starting states. Each trajectory starts at the end of a black line, and the activity moves along that line to ultimately terminate in one of the two point attractors located at the two red symbols "*". The third panel presents contour lines and a color map of \(E\ .\) The low \(E\) contours are drawn. The color map shows the higher values of \(E\ ,\) progressively larger from black to blue to yellow to red. The red dots of the central panel are located at the bottoms of the two minima seen in the third panel.

The phenomenon of associative memory matches the idea of dynamics controlled by a Lyapunov function. Consider a set of states \(M^p\) that are desired as memories. The location of each memory in state space describes the attributes of the memory. The idea of associative memory is that when a memory clue is presented, the actual memory that is most like the clue will be recapitulated. Suppose synaptic connections are constructed so that each memory vector \(M^p\) is a local minimum of \(E\ .\) Starting with partial information about some memory \(s\) means starting relatively nearer to \(M^s\) than to other memories. This starting state is then likely to be within the ‘valley’ of the terrain \(E\) that has \(M^s\) as its lowest point. If so, the dynamics must result in the final state \(M^s\ ,\) the correct memory reconstruction. Such a reconstruction is illustrated

Figure 3: Memory recall from a small clue.

for a memory having 50 properties (e.g. name) and within each property 20 values (e.g., John, Mary). This memory contains 250 ‘known friends’. A particular ‘known friend’ can be described by a set of 1’s locating the values for each property, and the information about one particular friend is shown in the grid at the left. Memory retrieval begins from a clue, an initial state describing partial information, as typified in the second grid where values are indicated for 7 categories. The next two panels show intermediate states during the retrieval process, and the final stable state is shown in the last panel. Careful inspection of the figure will verify that the final state is the memory from which the clue was taken, except that the clue itself was slightly erroneous (an incorrect value for property 2) and the memory dynamics properly corrects that error.

When the number of memories to be contained in \(T\) is not too large, and the memories not correlated, a new memory \(M^{\rm new}\) can be incorporated in the system through a synapse change rule

This is a Hebbian rule in that the synapse change is proportional to the product of the activity of the pre- and post-synaptic neurons. This rule is appropriate to the case of neurons with binary activity represented as \(V=\pm1\ ,\) and memories chosen at random, and requires modifications for more general circumstances.

An excitatory-inhibitory generalized Hopfield network can implement an associative memory that is rather more biological in its details. It no longer requires memories to be uncorrelated; synaptic connections obey Dale’s Law; neuron activity is limited by inhibitory feedback rather than by the maximum firing rate of a neuron; and memories are learned through changes in excitatory synapses only. In the 250 memory example above, a learning rule of the Hebbian type yielded completely accurate recall of 232 of the memories starting from clues containing 20% of the information in a memory. The number of memories that can be stored without disaster scales with the number of neurons. Neuronal noise will decrease the accuracy of recall, but the effect of the noise decreases with the size of the system, since the height of the 'barriers' in E separating different memories scales with the size. For large systems, the operation thus becomes collective, and the overall behavior is fail-soft to damage.

Lyapunov functions can be constructed for a variety of other networks that are related to the above networks by mathematical transformation or simple extensions. For example, if \(T_{jk}\) is a symmetric matrix, and \(\lambda_j\) and \(\mu_k\) are vectors with all positive components, a network connected through a matrix \(S_{jk} = T_{ij} \lambda_j \mu_k\) also has a Lyapunov function.

A broader class of related networks can be generated through using additional ‘fast’ neurons whose inputs and outputs are related in a way that produces an equivalent direct pathway that is symmetric. For example, if an additional neuron receives input \(\Sigma_k A_kV_v\) and has an instantaneous output \(f(\Sigma_k A_kV_v)\ ,\) where \(f\) is the derivative of a monotone function \(F\ ,\) and this in turn has connections that result in a current in cell \(j\) that is \(A_j f(\Sigma_k A_kV_v)\ ,\) the Lyapunov function simply requires the additional term \(F(\Sigma_k A_kV_v)\ .\) Simulations show that it is not necessary that such additional neurons be infinitely fast, but only that they are fast enough that the overall system does not oscillate.

When this additional neural circuitry is inhibitory, it can implement constraints on the overall behavior of the system. \(A_kV_k = B\) describes a plane in activity (\(V\)) space. When \(f\) is a function with a rapid onset, this inhibition can constrain the activity of the network to be on one side of the plane.

Related mathematics has been used to understand simple Hopfield networks with synchronous update, to special time-dependent problems, and to some models of synchronization of action potentials.

Optimizations are common computational problems, and come in many forms. Finding the best straight line through a set of (x, y) data points, linear programming, and finding the shortest Traveling Salesman route to visit a set of cities, are typical examples. Other problems that are not normally thought of as optimizations can often be restated in the language of optimization. For example, a Sudoku puzzle can be described in terms of ‘place the largest number of integers 1-9 into the blank spaces in the puzzle that you can, without violating any of the rules’. The correct solution is the unique pattern which has no blanks left at all, and thus has maximized the number of entries the puzzle-solver has put in.

Many optimization problems can be readily represented on Hopfield nets, by transforming the problem into variables such that the desired optimization corresponds to the minimization of the respective Lyapunov function [Hopfield and Tank 1985] In this representation, the dynamics of change in network state with time takes the system to a local energy minimum. If this local minimum is also the global minimum, the solution of the desired optimization task has been carried out by the convergence of the network state. Indeed, the energy function can be thought of as a programming language for transforming optimization problems into a solution method applying network dynamics. The resulting network could be either built in analog hardware or implemented in software on a digital machine.

Linear programming, the worker assignment problem, and decomposing signals into a basis set can all be solved exactly by Hopfield networks because the Lyapunov function for these problems can be constructed with a single (and thus global) minimum. When more computationally difficult problems are programmed using this approach, the Lyapunov function often has multiple local minima, and the dynamics of the network may converge to a local minimum rather than to the global minimum. Finding a good half-tone image from a gray-scale photograph and the n-queens chess problem can be programmed in this way. How effective such a network can be in finding a good solution is strongly dependent on the problem class.