
A classic "Hopfield network" is a type of artificial neural network in which the units are bi-stable and fully interconnected by symmetrically weighted connections. In 1982, Hopfield showed that such networks are characterized by an "energy function", under which stored memories correspond to local energy minima [1].

In a 1983 paper [2], Hopfield et al. further showed that "spurious memories" (local energy minima created during training, in addition to the intended target patterns) can be suppressed by an "unlearning procedure", during which the network is repeatedly allowed to relax from random states, and the resulting states are then "unlearned" by anti-Hebbian weight adjustments. The procedure affects spurious memories more than the desirable "learned memories", thus improving recall performance. However, the paper offers no explanation for why this should be so.
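Continuing the NumPy sketch above, one pass of the procedure as I read it from [2] would look roughly like this (the step size `eps`, the stopping rule, and the function names are my own illustrative choices, not taken from the paper):

```python
def relax(W, s, max_sweeps=100):
    """Asynchronous updates until a fixed point (a local energy minimum) is reached."""
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(s)):
            new = 1 if W[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # no unit wants to flip: we are at a minimum
            break
    return s

def unlearn_pass(W, eps=0.01):
    """Relax from a random state, then make an anti-Hebbian adjustment for the visited minimum."""
    N = W.shape[0]
    s = relax(W, np.random.choice([-1, 1], size=N))
    W = W - (eps / N) * np.outer(s, s)  # raises the energy of state s, making it less stable
    np.fill_diagonal(W, 0.0)
    return W
```

Repeating `unlearn_pass` many times is what [2] reports as suppressing spurious minima more than the learned ones.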

A 2004 paper by Robins and McCallum [3] demonstrates that spurious memories can be distinguished from learned ones because their "energy profiles" are different. Specifically, the ratio of lowest to highest energy contributions from individual units is significantly smaller in states corresponding to spurious memories than in states corresponding to learned memories. Again, the effect is not accounted for (except for a tentative partial explanation).
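If I read the energy-profile idea correctly, the per-unit contributions and their ratio would be computed along these lines (the exact definition and normalisation used in [3] may well differ; this is only my reconstruction, again continuing the sketch above):

```python
def unit_energies(W, s):
    """Per-unit contributions e_i = -1/2 * s_i * sum_j W_ij s_j; they sum to the total energy E(s)."""
    return -0.5 * s * (W @ s)

def energy_ratio(W, s):
    """Ratio of the lowest (most negative) to the highest per-unit contribution at a stable state."""
    e = unit_energies(W, s)
    return e.min() / e.max()
```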

My questions are:

Is there a relationship between these two findings, i.e. does the lower "energy ratio" of spurious states explain their greater susceptibility to unlearning?

Have any explanations for either or both of these phenomena been put forward since the publication of the papers?

Are there other ways to suppress or detect spurious memories in the Hopfield family of neural networks?

Thanks for asking this question! Minor comment: question 3 is a pretty big question on its own, and not as closely related to questions 1 and 2. It might be worthwhile to ask it as a separate question, but it is up to you. – Artem Kaznatcheev♦ May 18 '12 at 18:14

1 Answer

I think that your intuition about the lower "energy ratio" of spurious states explaining their greater susceptibility to unlearning might be correct.

In a Hopfield network, spurious states are activity patterns that have not been explicitly embedded in the synaptic matrix but are nonetheless stable. They are, in other words, "unwanted" attractor states that, by virtue of a finite overlap with the "wanted" attractor states, arise as local minima of the energy function. The unlearning rule in Hopfield et al. (1983) consists in modifying the synaptic matrix so as to raise the energy of the stable states into which the network dynamics settles, be they spurious or embedded states. Because the spurious states have higher energy than the embedded states (they sit in shallower minima), repeated unlearning destabilizes them sooner, and they are therefore more strongly affected by the procedure.

Now, why do spurious states have higher energy than the embedded attractor states? This is actually not true in general, but it holds in the regime where the Hopfield network does not exceed its loading capacity, i.e. when the number of learned patterns per unit, $p/N$, is below the critical capacity $\alpha_c \approx 0.138$. In this regime, it is possible to estimate the overlap of the spurious states with the learned patterns and show that it is generally lower than $1$ (the overlap of a learned pattern with itself). Because of the Hebbian construction of the synaptic matrix in the Hopfield model, these overlaps appear directly in the energy function: the energy of a state is, up to a constant, proportional to minus the sum of its squared overlaps with the learned patterns. It follows that the spurious patterns have higher energy than the learned ones.
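Spelled out: writing $\xi^\mu$ for the stored patterns and $m_\mu(s) = \frac{1}{N}\sum_i \xi_i^\mu s_i$ for the overlaps, the Hebbian couplings $J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu$ give

$$E(s) \;=\; -\frac{1}{2}\sum_{i \neq j} J_{ij}\, s_i s_j \;=\; -\frac{N}{2}\sum_{\mu=1}^{p} m_\mu(s)^2 \;+\; \frac{p}{2}.$$

A learned pattern $\xi^\nu$ has $m_\nu = 1$ (and only $O(1/\sqrt{N})$ overlap with the other patterns), so $E \approx -N/2$. A symmetric mixture of three patterns, for example, has overlap $\approx 1/2$ with each of them, giving $E \approx -\frac{3N}{8}$, a strictly higher energy.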

In general, this kind of naive consideration has to be supported by more rigorous arguments based on probability theory. These show, for instance, that even in the regime below $\alpha_c$ the retrieved patterns are actually spurious states as soon as the number of embedded patterns $p$ exceeds $\frac{N}{2\ln N}$. Such spurious states, however, have such a high overlap with the learned patterns ($\approx 0.97$) that they essentially coincide with them.

This result, and generalizations thereof to non-zero temperature (i.e. noise in the dynamics) and beyond the critical capacity, have been worked out in the following very technical paper: