
Abstract:

This invention is in the field of machine learning and neural associative
memory. In particular the invention discloses a neural associative memory
structure for storing and maintaining associations between memory address
patterns and memory content patterns using a neural network, as well as
methods for storing and retrieving such associations. Bayesian learning
is applied to achieve non-linear learning.

Claims:

1. A neural associative memory designed for maintaining associations between memory address patterns and memory content patterns, the memory comprising:
a plurality of stored memory address patterns and associated memory content patterns,
a Bayesian probability framework,
a neural network comprising a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons,
means for accepting an input query pattern,
means for accepting a noise distribution describing how the input query pattern deviates from a memory address pattern,
means for applying the Bayesian probability framework for determining a most likely output pattern to the input query pattern based on the input query pattern, the stored memory address patterns and associated memory content patterns, and the noise distribution,
means for transforming Bayesian probabilities from the Bayesian probability framework into the neural network,
a means for optimizing the neural network with respect to the target architecture chosen for implementation, and
output means for returning the most likely output pattern to the input query pattern equal or similar to the memory content pattern associated with the memory address pattern most similar to the input query pattern.

3. A method for storing memory address patterns and associated memory content patterns in a neural associative memory, said method comprising:
providing a neural associative memory, said memory comprising a plurality of stored memory address patterns and associated memory content patterns, a Bayesian probability framework, a neural network comprising a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons, means for accepting an input query pattern, means for accepting a noise distribution describing how the input query pattern deviates from a memory address pattern, means for applying the Bayesian probability framework for determining a most likely output pattern to the input query pattern based on the input query pattern, the stored memory address patterns and associated memory content patterns, and the noise distribution, means for transforming Bayesian probabilities from the Bayesian probability framework into the neural network, a means for optimizing the neural network with respect to the target architecture chosen for implementation, and output means for returning the most likely output pattern to the input query pattern equal or similar to the memory content pattern associated with the memory address pattern most similar to the input query pattern;
said method further comprising:
accepting pattern pairs of memory address patterns and associated memory content patterns,
storing the association between memory address patterns and associated memory content patterns within a neural network of the memory structure, wherein the neural network consists of a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons, and wherein data is stored in values associated with the synapses and neurons, by
computing a vector of pre-synaptic unit usages by computing a unit usage for each pre-synaptic neuron and storing at least a pre-synaptic unit usage value with each pre-synaptic neuron,
computing a vector of post-synaptic unit usages by computing the unit usage for each post-synaptic neuron and storing at least a post-synaptic unit usage value with each post-synaptic neuron, and
computing a matrix of synapse usages by computing for each synapse connecting the pre-synaptic neuron to the post-synaptic neuron the synapse usage and storing at least a synapse usage value with each synapse.

4. The method of claim 3, wherein
a matrix of synaptic weights is computed by computing for each synapse the weight using estimates of query noise, unit usages and synapse usages,
a vector of neuron thresholds is computed by computing a threshold for each postsynaptic neuron, and
the synaptic weights and neuron thresholds of the neural network are adjusted based on the computations.

5. The method of claim 3, wherein
two matrices representing finite and infinite synaptic weights are computed, where the finite weights neglect infinite components, whereas the infinite weights count the number of contributions towards plus and minus infinity,
two vectors representing finite and infinite neuron thresholds are computed, and
the finite and infinite synaptic weights and finite and infinite neuron thresholds of the neural network are adjusted based on the computations.

6. A method for retrieving a memory content pattern from a neural associative memory, said method comprising:
providing a neural associative memory, said memory comprising a plurality of stored memory address patterns and associated memory content patterns, a Bayesian probability framework, a neural network comprising a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons, means for accepting an input query pattern, means for accepting a noise distribution describing how the input query pattern deviates from a memory address pattern, means for applying the Bayesian probability framework for determining a most likely output pattern to the input query pattern based on the input query pattern, the stored memory address patterns and associated memory content patterns, and the noise distribution, means for transforming Bayesian probabilities from the Bayesian probability framework into the neural network, a means for optimizing the neural network with respect to the target architecture chosen for implementation, and output means for returning the most likely output pattern to the input query pattern equal or similar to the memory content pattern associated with the memory address pattern most similar to the input query pattern;
storing memory address patterns and associated memory content patterns in the neural associative memory, said storing comprising:
accepting pattern pairs of memory address patterns and associated memory content patterns,
storing the association between memory address patterns and associated memory content patterns within a neural network of the memory structure, wherein the neural network consists of a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons, and wherein data is stored in values associated with the synapses and neurons, by
computing a vector of pre-synaptic unit usages by computing a unit usage for each pre-synaptic neuron and storing at least a pre-synaptic unit usage value with each pre-synaptic neuron,
computing a vector of post-synaptic unit usages by computing the unit usage for each post-synaptic neuron and storing at least a post-synaptic unit usage value with each post-synaptic neuron, and
computing a matrix of synapse usages by computing for each synapse connecting the pre-synaptic neuron to the post-synaptic neuron the synapse usage and storing at least a synapse usage value with each synapse;
said method further comprising:
accepting an input query pattern,
accepting a noise distribution describing how the input query pattern deviates from a memory address pattern,
applying a Bayesian probability framework for determining a most likely output pattern to the input query pattern based on the input query pattern, the stored memory address patterns and associated memory content patterns, and the noise distribution,
transforming Bayesian probabilities from the Bayesian probability framework into a neural network consisting of a set of synapses and sets of pre-synaptic and post-synaptic neurons, the synapses connecting pre-synaptic and post-synaptic neurons,
optimizing the neural network with respect to the target architecture chosen for implementation, and
returning the most likely output pattern to the input query pattern equal or similar to the memory content pattern associated with the memory address pattern most similar to the input query pattern.

7. The method of claim 6, wherein
a first vector of first dendritic potentials is computed from unit usages and synapse usages by computing a dendritic potential for each post-synaptic neuron,
a post-synaptic neuron is activated, and
the output pattern is returned based on the activation of the post-synaptic neurons.

8. The method of claim 7, wherein the post-synaptic neuron is activated if the dendritic potential for the neuron is equal to or larger than zero.

9. The method of claim 7, wherein the post-synaptic neuron is activated if the dendritic potential is equal to or larger than a threshold.

10. The method of claim 6, wherein
an additional second vector of second dendritic potentials is computed by computing a second dendritic potential for each post-synaptic neuron,
each neuron is assigned a first and a second threshold, and
the post-synaptic neuron is activated if the second dendritic potential is equal to or larger than the second threshold and its first dendritic potential is equal to the first neuron threshold.

11. The method of claim 6, wherein the vectors of dendritic potentials are computed on-the-fly from unit usages and synapse usages.

12. The method of claim 6, wherein the input query pattern is a noise-tainted version of one of the memory address patterns.

13. A robot comprising a computing unit and the memory of claim 1.

14. A robot provided with computing means for executing the method of
claim 6.

15. A land, air or sea vehicle comprising the memory of claim 1.

16. A land, air or sea vehicle provided with computing means for executing
the method of claim 6.

Description:

[0001]This invention relates to the field of machine learning and neural
associative memory. In particular the invention discloses a neural
associative memory for storing and maintaining associations between
memory address patterns and memory content patterns using a neural
network, as well as methods for storing and retrieving such associations.
Bayesian learning is applied to achieve non-linear learning.

[0002]The inventive neural associative memory is designed to store
associations between memory address patterns and memory content patterns
in the neural network, i.e. in a network of neurons and synaptic
connections, for example in a set of synaptic weights between the neurons
and also in other properties and values of the neural network. Neural
networks are applicable in all areas where pattern (and/or sequence)
recognition is needed to identify specific patterns or situations, or to
process information derived from observations made by machinery such as
robots, autonomous vehicles or systems designed to assist a human
operator--especially where the complexity of the data or a task renders
an implementation of functions by hand impractical.

[0003]A neural network can generally be used to infer functions from
observations, as neural networks can work with no or only little a priori
knowledge of the problem to be solved and also provide fault-tolerant
behavior. Problems that may be addressed may relate to system
identification and control (vehicle control, process control),
game-playing and decision making, machine vision and pattern recognition
(facial recognition, object recognition, gesture recognition, speech
recognition, (handwritten) character and text recognition), medical
diagnosis, financial applications (automated trading systems), data
mining (or knowledge discovery) and visualization.

[0004]Using the advantages of neural networks the neural associative
memory structure accepts an input signal or input query pattern as a
memory address pattern, which may be tainted with noise, and derives an
output signal or output pattern that is identical or similar to the
memory content pattern associated with the memory address pattern
obtained from the input signal or input query pattern. The input signal
or input query pattern may be accepted by one or more sensors, e.g. for
visual or acoustic input. In the following, only the terms input query
pattern and output pattern are used. The output pattern may be output
through a software or hardware interface or may be transferred to
another processing unit.

[0005]In addition, the invention combines the advantages of neural
networks with Bayesian learning principles including estimates of query
component error probabilities applied to provide a non-linear learning
method. Computations and transformations required by this application as
well those necessary for maintaining, adjusting and training the neural
network may be performed by a processing means such as one or more
processors (CPUs), signal processing units or other calculation,
processing or computational hardware and software, which might also be
adopted for parallel processing. All the processing and computations may
be performed on standard of the shelf hardware or specially designed
hardware components or specific hardware which may be adapted for
parallel processing.

[0006]Some portions of the detailed description that follows are presented
in terms of algorithms and symbolic representations of operations on data
bits within a computer. These algorithmic descriptions and
representations are the means used by those skilled in the data
processing arts to most effectively convey the substance of their work to
others skilled in the art. An algorithm is here, and generally, conceived
to be a self-consistent sequence of steps (instructions) leading to a
desired result. The steps are those requiring physical manipulations of
physical quantities. Usually, though not necessarily, these quantities
take the form of electrical, magnetic or optical signals capable of being
stored, transferred, combined, compared and otherwise manipulated. It is
convenient at times, principally for reasons of common usage, to refer to
these signals as bits, values, elements, symbols, characters, terms,
numbers, or the like. Furthermore, it is also convenient at times, to
refer to certain arrangements of steps requiring physical manipulations
of physical quantities as modules or code devices, without loss of
generality.

[0007]However, all of these and similar terms are to be associated with
the appropriate physical quantities and are merely convenient labels
applied to these quantities. Unless specifically stated otherwise as
apparent from the following discussion, it is appreciated that throughout
the description, discussions utilizing terms such as "processing" or
"computing" or "calculating" or "determining" or "displaying" or
"determining" or the like, refer to the action and processes of a
computer system, or similar electronic computing device, that manipulates
and transforms data represented as physical (electronic) quantities
within the computer system memories or registers or other such
information storage, transmission or display devices.

[0008]Certain aspects of the present invention include process steps and
instructions described herein in the form of an algorithm. It should be
noted that the process steps and instructions of the present invention
could be embodied in software, firmware or hardware, and when embodied in
software, could be downloaded to reside on and be operated from different
platforms used by a variety of operating systems.

[0009]The present invention also relates to an apparatus for performing
the operations herein. This apparatus may be specially constructed for
the required purposes, or it may comprise a general-purpose computer
selectively activated or reconfigured by a computer program stored in the
computer. Such a computer program may be stored in a computer readable
storage medium, such as, but not limited to, any type of disk
including floppy disks, optical disks, CD-ROMs, magnetic-optical disks,
read-only memories (ROMs), random access memories (RAMs), EPROMs,
EEPROMs, magnetic or optical cards, application specific integrated
circuits (ASICs), or any type of media suitable for storing electronic
instructions, and each coupled to a computer system bus. Furthermore, the
computers referred to in the specification may include a single processor
or may be architectures employing multiple processor designs for
increased computing capability.

[0010]The algorithms and displays presented herein are not inherently
related to any particular computer or other apparatus. Various
general-purpose systems may also be used with programs in accordance with
the teachings herein, or it may prove convenient to construct more
specialized apparatus to perform the required method steps. The required
structure for a variety of these systems will appear from the description
below. In addition, the present invention is not described with reference
to any particular programming language. It will be appreciated that a
variety of programming languages may be used to implement the teachings
of the present invention as described herein, and any references below to
specific languages are provided for disclosure of enablement and best
mode of the present invention.

[0011]In addition, the language used in the specification has been
principally selected for readability and instructional purposes, and may
not have been selected to delineate or circumscribe the inventive subject
matter. Accordingly, the disclosure of the present invention is intended
to be illustrative, but not limiting, of the scope of the invention,
which is set forth in the claims.

[0013]A neural network thereby consists of a set of neurons and a set of
synapses. The synapses connect neurons and store information in
parameters called weights, which are used in transformations performed by
the neural network and learning processes.

[0014]Typically, an input signal or input pattern is accepted from a
sensor, which is processed using the neural networks implemented by
hardware units and software components. An output signal or output
pattern is obtained, which may serve as input to other systems for
further processing, e.g. for visualization purposes. The input signal may
be supplied by one or more sensors, e.g. for visual or acoustic sensing,
but also by a software or hardware interface. The output pattern may as
well be output through a software and/or hardware interface or may be
transferred to another processing unit or actuator, which may be used to
influence the actions or behavior of a robot or vehicle.

[0015]Computations and transformations required by the invention and the
application of neural networks as well as those necessary for
maintaining, adjusting and training the neural network, may be performed
by a processing means such as one or more processors (CPUs), signal
processing units or other calculation, processing or computational
hardware and/or software, which might also be adapted for parallel
processing. Processing and computations may be performed on standard off
the shelf (OTS) hardware or specially designed hardware components. A CPU
of a processor may perform the calculations and may include a main memory
(RAM, ROM), a control unit, and an arithmetic logic unit (ALU). It may
also address a specialized graphic processor, which may provide dedicated
memory and processing capabilities for handling the computations needed.

[0016]A neural network is configured such that the application of an input
pattern or a set of input patterns produces (either `direct` or via a
relaxation process) a set of (desired) output patterns. Various methods
to set strengths/weights of synaptic connections between neurons of the
neural network exist. One way, which is not an object of the invention,
is to set the weights explicitly, using a priori knowledge. Another way
is to `train` the neural network by feeding it teaching patterns and
letting it change its weights (learning) according to some learning
rule/algorithm.

[0017]In particular, the method described by the invention may be provided
as a software program product on a (e.g., portable) physical storage
medium which may be used to transfer the program product to a processing
system or a computing device in order to instruct the system or device to
perform a method according to this invention. Furthermore, the method may
be directly implemented on a computing device or may be provided in
combination with the computing device.

[0018]Further, the invention can be applied in various domains, one of
them being robotics, but it can as well be applied in systems for ground,
water and/or air bound vehicles, including systems designed to assist a human
operator. The method and/or system disclosed herein in general may be
used whenever a technical (e.g., an electronic) system is required to
autonomously learn characteristics and/or properties of objects (e.g.,
size, distance, relative/absolute position also to other objects, spatial
alignment, relative movement, speed and/or direction and other related
object features or feature patterns) which are presented to the system.

BACKGROUND OF THE INVENTION

[0019]In the classical von Neumann computing architecture, computation and
data storage is performed by separate modules, the central processing
unit and the random access memory, respectively (cf., e.g., A. W. Burks,
H. H. Goldstine, and J. von Neumann. Preliminary discussion of the
logical design of an electronic computing instrument. Report 1946, U.S.
Army Ordnance Department, 1946). A memory address sent to the random
access memory gives access to the data content of one particular storage
location. Associative memory structures are computing architectures in
which computation and data storage is not separated (cf. T. Kohonen.
Associative memory: a system theoretic approach. Springer, Berlin, 1977).
For example, an associative memory can store a set of associations
between pairs of (binary) patterns {(u.sup.μ→v.sup.μ):
μ=1, . . . , M} (see FIGS. 1a and 1b).

[0020]FIG. 1a illustrates the memory tasks. In the storage phase, M
associations of memory address patterns u.sup.μ and content patterns
v.sup.μ are stored in the associative memory device (AM) as shown
exemplarily in FIG. 1a (learning of associations between M pattern pairs,
u.sup.μ→v.sup.μ). In the retrieval phase, the AM is
addressed with an input query pattern typically resembling one of the
previously stored memory address patterns u.sup.μ1. The AM
returns the retrieval result {circumflex over (v)} that should be similar
to the associated memory content pattern v.sup.μ1, as shown
exemplarily in FIG. 1b (retrieving phase).

[0021]As in random access memory, a query pattern u.sup.μ entered in the
associative memory can serve as an address for accessing the associated
pattern v.sup.μ. However, the tasks performed by the two
types of memory differ fundamentally. Random access is only defined for
query patterns that are valid addresses, that is, for the set of u
patterns used during storage. The random access task consists of
returning the data record at the addressed location (look-up).

[0022]In contrast, associative memory structures accept arbitrary input
query patterns and the computation of any particular output involves
all stored data records rather than a single one. Specifically, the
associative memory task consists of comparing an input query pattern
with all stored addresses and returning an output pattern equal (or
similar) to the pattern v.sup.μ associated with the memory address
pattern u.sup.μ most similar to the input query pattern. Thus, the
associative memory task includes the random access task but is not
restricted to it. It also includes computations such as pattern
completion, denoising or data retrieval using incomplete cues.
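
For illustration only, the associative memory task described above can be written as a brute-force nearest-neighbour search over all stored pairs. The following Python sketch (with made-up patterns and a simple Hamming-distance criterion) shows the input/output behaviour that the neural, parallel implementation discussed below is designed to reproduce; it is not part of the claimed subject matter.

```python
import numpy as np

def associative_lookup(query, address_patterns, content_patterns):
    """Return the content pattern whose stored address is most similar
    (smallest Hamming distance) to the input query pattern."""
    distances = [int(np.sum(query != u)) for u in address_patterns]
    best = int(np.argmin(distances))
    return content_patterns[best]

# Example with two stored pairs; the query is a noisy version of the first address.
U = [np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])]
V = [np.array([1, 0, 1]), np.array([0, 1, 0])]
print(associative_lookup(np.array([1, 0, 0, 0]), U, V))   # -> [1 0 1]
```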

[0023]Neural associative memory structures are parallel implementations of
associative memory in a network of neurons in which associations are
stored in a set of synaptic weights W between neurons typically employing
fast Hebbian-type learning methods (cf., e.g., J. Hertz, A. Krogh, and R.
G. Palmer. Introduction to the theory of neural computation.
Addison-Wesley, Redwood City, 1991). Associative networks are closely
related to Hebbian cell assemblies (cf., e.g., G. Palm. Neural
Assemblies. An Alternative Approach to Artificial Intelligence. Springer,
Berlin, 1982) and play an important role in neuroscience as models of
neural computation for various brain structures, in particular neocortex,
hippocampus, and cerebellum.

STATE OF THE ART

[0024]Most two-layer neural associative memory models can be assigned to
one of the following two classes. The simplest model of neural
associative memory is the so-called Steinbuch or Willshaw model with
binary synapses and clipped Hebbian learning (cf., e.g., A. Knoblauch, G.
Palm, and F. T. Sommer. Memory capacities for synaptic and structural
plasticity. Neural Computation, 2009; K. Steinbuch. Die Lernmatrix.
Kybernetik, 1:36-45, 1961; D. J. Willshaw, O. P. Buneman, and H. C.
Longuet-Higgins. Non-holographic associative memory. Nature, 222:960-962,
1969). Here a single coincidence of presynaptic and postsynaptic activity
is sufficient to increase the synaptic weight from 0 to 1, while further
coincidences do not cause further changes.

[0026]The maximal storage capacity C is almost identical for the two
models: The Willshaw model can achieve up to 0.69 bits per synapse, while
the linear learning model can achieve a slightly higher capacity of 0.72
bits per synapse (bps) (although the synapses may have gradual weights
and thus need much more physical memory to be represented). Closer
investigations reveal that the Willshaw model can achieve non-zero
capacity only for very sparse patterns where the number of active units
per pattern vector scales logarithmic with the vector size. In contrast,
the linear model is believed to achieve the maximum C=0.72 bps for almost
arbitrary sparseness. Only for linearly sparse or non-sparse patterns does
performance drop to the capacity of the Hopfield model (C=0.14 bps, cf.,
e.g., J. J. Hopfield. Neural networks and physical systems with emergent
collective computational abilities. Proceedings of the National Academy
of Science, USA, 79:2554-2558, 1982). In any case, the linear learning
model achieves maximal storage capacity only for the optimal covariance
learning rule (see, e.g., G. Palm and F. Sommer. Associative data
storage and retrieval in neural nets. In E. Domany, J. L. van Hemmen, and
K. Schulten, editors, Models of Neural Networks III, pages 79-118.
Springer-Verlag, New York, 1996) which becomes equal to the Hebb rule for
very sparse patterns, and equal to the Hopfield rule for non-sparse
patterns. Moreover, the capacity that can actually be achieved in finite
networks is well below that of the Willshaw model (e.g., C=0.2 bps vs.
C=0.5 bps for n=10.sup.5 neurons; see, e.g., A. Knoblauch. Neural
associative networks with incremental learning rules. HRI-ED Report
08-03, Honda Research Institute Europe GmbH, D-63073 Offenbach/Main,
Germany, May 2008). The performances of different models of neural
associative memory are summarized by the Table shown in FIG. 2.

[0027]The table pictured in FIG. 2 shows a comparison of different
associative memory (AM) models with respect to the following performance
measures. The pattern capacity M measures the maximal number of stored
memories. The network capacity C measures the maximal information a
synapse can store in a structurally static network. The information
capacity CI measures the maximal information stored per computer
bit in a digitally compressed representation. Finally, the synaptic
capacity CS measures the maximal information a synapse can store in
a structurally plastic network assuming that irrelevant synapses can be
pruned. The Linear AM achieves maximal M and C in the asymptotic limit of
very large networks but only low CI and CS. The (excitatory)
Willshaw AM always has low performance unless the memories are extremely
sparse. The Inhibitory WAM has low M and C but achieves maximal CI
and CS even for moderate sparseness. The novel Bayes AM achieves
maximal M and C even for finite networks but only low CI and
CS. The novel Zip AM achieves maximal or near maximal performance
for all measures.

[0028]A well known problem of these two-layer approaches is that the high
theoretical capacities can be reached only under some artificial
assumptions. For example, most theories assume randomly generated memory
patterns, where each pattern component, e.g., ui.sup.μ, is
generated independently of other components. In such a setting the memory
address patterns are uniformly distributed in the pattern space. Another
assumption often employed by these models is that the pattern activities
k.sup.μ and l.sup.μ have a low variance, for example constant
k.sup.μ=k and l.sup.μ=l. However, for real-world technical
applications (and very likely also for the brain), these assumptions are
invalid: Memories commonly cluster in a complex way in the memory space,
and pattern activities are often broadly distributed. Such realistic
conditions can strongly decrease storage capacity C and increase output
noise ε in these previous memory systems. Although, due to its
two-layer structure, the current invention cannot solve these problems in
principle either, numerical experiments have revealed that the current
invention is much more robust against "correlated" patterns and
broadly distributed pattern activities.

[0029]Actually, the two-layer memory models can be used as building blocks
to implement larger systems with a more complex hierarchical structure.
For example, some brain theories consider the brain as a complex network
of interconnected associative memories (cf., e.g., G. Palm. Neural
Assemblies. An Alternative Approach to Artificial Intelligence. Springer,
Berlin, 1982). For technical applications at least three-layer networks
are of interest because of well known limitations of two-layer networks
(which cannot compute XOR functions, for example). One possible strategy
is to map each memory address pattern u.sup.μ into a high-dimensional
space w and then associate the corresponding patterns w.sup.μ with the
memory content patterns v.sup.μ. By this procedure different memory content
patterns v.sup.μ1 and v.sup.μ2 can be associated with
similar memory address patterns u.sup.μ1≈u.sup.μ2 and,
thus, the problems of storing "correlated" memories and storing memories
with broadly distributed pattern activities (as described above) become
tractable. For example, previously a three layer system has been
described (cf., e.g., P. Kanerva. Sparse Distributed Memory. MIT Press,
Cambridge, Mass., 1988) where, in the first stage, the address memories
are de-correlated by a random projection.

[0030]Similarly, in EP 07 110 870, a four-layer memory system is
described, where the intermediary patterns w.sup.μ are systematically
chosen in order to minimize output noise. The current invention could be
used, for example, in such multi-layer systems as building blocks,
improving memory performance by replacing the previously employed
Willshaw, Hopfield, or random networks.

[0031]The document "Bayesian Retrieval in Associative Memories with
Storage Errors" by F. T. Sommer and P. Dayan (IEEE Transactions On Neural
Networks, Vol. 9, No. 4, July 1998) describes how iterative retrieval
strategies emerge naturally from considerations of probabilistic
inference under conditions of noisy and partial input and a corrupted
weight matrix. Starting from a conditional probability distribution over
possible patterns for retrieval, the described method contains the
information available to an observer of the network. Since the distribution is over
exponentially many patterns, it is used to develop approximate, but
tractable, iterative retrieval methods. The first method performs maximum
likelihood inference to find the single most likely pattern, using the
(negative log of the) conditional probability as a Lyapunov function for retrieval. The
second method makes a mean field assumption to optimize a tractable
estimate of the full conditional probability distribution. In the absence
of storage errors, both models are very similar to the Willshaw model,
where standard retrieval is iterated using a particular form of linear
threshold strategy.

[0032]However, Sommer and Dayan only optimize retrieval by Bayesian
methods, but not learning. In fact, as can be seen from eq. 1 of the
document by Sommer and Dayan, the matrix of synaptic weights is binary
and learning is identical to the well-known Willshaw model (Willshaw et
al., 1969). Thus, they implement (or rather approximate) optimal Bayesian
retrieval given the binary weight matrix of the Willshaw model.

[0033]In contrast, the inventive network implements optimal Bayesian
learning and retrieval given the counter variables (M1, M1', M11) defined
below. In particular, the resulting synaptic weights of our network are
real-valued and differ from the binary weights computed by the model of
Sommer and Dayan. Therefore, the network presented herein will achieve a
much higher performance (i.e., lower output noise, higher capacity).

[0034]Second, Sommer and Dayan employ iterated retrieval in a recurrent
(auto-associative) network according to equations 2, 23, and 34 of the
document in order to approximate optimal Bayesian retrieval. In contrast,
the network presented herein implements optimal Bayesian retrieval in a
single read-out step focusing on a feed-forward (hetero-associative)
scenario, although it can be applied also to auto-association.

[0036]This object is achieved by means of the features of the independent
claims. The dependent claims develop further the central idea of the
invention.

[0037]The invention therefore provides a neural associative memory structure
for maintaining associations between memory address patterns and memory
content patterns, the memory structure comprising a Bayesian probability
framework, a neural network consisting of a set of synapses and sets of
pre-synaptic and post-synaptic neurons, the synapses connecting
pre-synaptic and post-synaptic neurons, an accepting means for accepting
an input query pattern, a processing means for applying the Bayesian
probability framework for determining a most likely output pattern to the
input query pattern based on the input query pattern, stored memory
address patterns and associated memory content patterns, and a noise
distribution describing how the input query pattern deviates from a
memory address pattern, for transforming Bayesian probabilities from the
Bayesian probability framework into the neural network, and for
optimizing the neural network with respect to the target architecture
chosen for implementation, and an output means for returning the most
likely output pattern to the input query pattern equal or similar to the
memory content pattern associated with the memory address pattern most
similar to the input query pattern.

[0038]The accepting means for accepting an input query pattern may be a
sensor.

[0039]The processing means may provide one or more processors, signal
processing units and/or other calculation, processing and/or
computational hardware and software and may be adopted for parallel
processing.

[0040]The output means may be a hardware or software interface.

[0041]In another aspect of the invention, a method for storing memory
address patterns and associated memory content patterns in the neural
associative memory structure is provided, comprising the steps of storing
the association between memory address patterns and associated memory
content patterns within a neural network of the memory structure, wherein
the neural network consists of a set of synapses and sets of pre-synaptic
and post-synaptic neurons, the synapses connecting pre-synaptic and
post-synaptic neurons and wherein data is stored in values associated
with the synapses and neurons, by computing a vector of pre-synaptic unit
usages by computing a unit usage for each pre-synaptic neuron and storing
at least a pre-synaptic unit usage value with each pre-synaptic neuron,
computing a vector of post-synaptic unit usages by computing the unit
usage for each post-synaptic neuron and storing at least a post-synaptic
unit usage value with each post-synaptic neuron, and computing a matrix of
synapse usages by computing for each synapse connecting the pre-synaptic
neuron to the post-synaptic neuron the synapse usage and storing at least
a synapse usage value with each synapse.
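
A minimal sketch of the storage step just described, assuming binary patterns represented as NumPy arrays: the pre-synaptic unit usages M'1(i), the post-synaptic unit usages M1(j) and the synapse usages M11(ij) are simply counters accumulated over all stored pattern pairs. Variable names and data layout are chosen for illustration only.

```python
import numpy as np

def store_pattern_pairs(pairs, m, n):
    """Accumulate unit and synapse usages over M pattern pairs (u, v), where u is a
    binary address vector (size m) and v a binary content vector (size n)."""
    M1_pre = np.zeros(m, dtype=int)    # M'1(i): number of patterns with u_i = 1
    M1_post = np.zeros(n, dtype=int)   # M1(j):  number of patterns with v_j = 1
    M11 = np.zeros((m, n), dtype=int)  # M11(ij): number of patterns with u_i = 1 and v_j = 1
    M = 0
    for u, v in pairs:
        M1_pre += u
        M1_post += v
        M11 += np.outer(u, v)          # co-activation counter for every synapse
        M += 1
    return M, M1_pre, M1_post, M11
```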

[0042]In another aspect of the invention, a matrix of synaptic weights can
be computed by computing for each synapse the weight using estimates of
query noise, unit usages and synapse usages, a vector of neuron
thresholds can be computed by computing a threshold for each postsynaptic
neuron, and the synaptic weights and neuron thresholds of the neural
network may be adjusted based on the computations.

[0043]In yet another aspect of the invention, two matrices representing
finite and infinite synaptic weights can be computed, where the finite
weights neglect infinite components, whereas infinite weights count the
number of contributions towards plus and minus infinity, two vectors
representing finite and infinite neuron thresholds may be computed, and
the finite and infinite synaptic weights and finite and infinite neuron
thresholds of the neural network may be adjusted based on the
computations.

[0044]The computations and/or adjustments can be performed by a processing
means, which provides one or more processors, signal processing units
and/or other calculation, processing and/or computational hardware and
software.

[0045]According to a further aspect of the invention, a method for
retrieving a memory content pattern from the neural associative memory is
provided, comprising the steps of accepting an input query pattern by an
accepting means, applying a Bayesian probability framework for
determining a most likely output pattern to the input query pattern based
on the input query pattern, stored memory address patterns and associated
memory content patterns, and a noise distribution describing how the
input query pattern deviates from a memory address pattern, transforming
Bayesian probabilities from the Bayesian probability framework into a
neural network consisting of a set of synapses and sets of pre-synaptic
and post-synaptic neurons, the synapses connecting pre-synaptic and
post-synaptic neurons, and optimizing the neural network with respect to
the target architecture chosen for implementation, by a processing means
and returning the most likely output pattern to the input query pattern
equal or similar to the memory content pattern associated with the memory
address pattern most similar to the input query pattern through an output
means.

[0046]A first vector of first dendritic potentials may be computed from
unit usages and synapse usages by computing a dendritic potential for
each post-synaptic neuron, a post-synaptic neuron can be activated, and
the output pattern may be returned based on the activation of the
post-synaptic neurons.

[0047]Also, the post-synaptic neuron may be activated if the dendritic
potential for the neuron is equal to or larger than zero.

[0048]Moreover, the post-synaptic neuron is activated if the dendritic
potential is equal to or larger than a threshold.

[0049]Furthermore, an additional second vector of second dendritic
potentials may be computed by computing a second dendritic potential for
each post-synaptic neuron, each neuron may be assigned a first and a
second threshold, and the post-synaptic neuron is activated if the second
dendritic potential is equal to or larger than the second threshold and
its first dendritic potential is equal to the first neuron threshold.

[0050]The processing means is also used for the computations and/or
adjustments and the processing means may provide one or more processors,
signal processing units and/or other calculation, processing and/or
computational hardware and software.

[0051]In even a further aspect of the invention, the vectors of dendritic
potentials are computed on-the-fly from unit usages and synapse usages.

[0052]In another aspect of the invention, the input query pattern is a
noise-tainted version of one of the memory address patterns.

BRIEF DESCRIPTION OF THE DRAWINGS

[0053]FIG. 1a illustrates a storage phase of memory task.

[0054]FIG. 1b illustrates a retrieval phase of the memory task.

[0055]FIG. 2 pictures a table with a comparison of different associative
memory (AM) models.

[0057]FIG. 4 illustrates a neuron and synapse model according to the
current invention.

[0058]FIG. 5 illustrates a four-layer system for information retrieval
according to one embodiment of the invention.

[0059]FIG. 6 shows a block diagram of a system for visual object
recognition.

DETAILED DESCRIPTION OF THE INVENTION

[0060]Neural associative memory networks as considered by this invention
are single layer neural networks or perceptrons with fast "one-shot"
learning corresponding to the storage of M discrete associations between
pairs of binary pattern vectors {(u.sup.μ→v.sup.μ):μ=1, .
. . , M}. Here u.sup.μ is the μ-th memory address pattern being a
binary vector of size m. Similarly, v.sup.μ is the μ-th memory
content pattern being a binary vector of size n. The pattern activities

$$k^\mu := \sum_{i=1}^{m} u_i^\mu \qquad\text{and}\qquad l^\mu := \sum_{j=1}^{n} v_j^\mu$$

are defined as the number of one-entries in the μ-th memory address and
memory content pattern, respectively. Finally, k:=E.sub.μ(k.sup.μ)
and l:=E.sub.μ(l.sup.μ) denote the average pattern activities.
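
As a small worked example of these definitions (patterns chosen arbitrarily for illustration):

```python
import numpy as np

u_mu = np.array([1, 0, 1, 1, 0])   # example memory address pattern, m = 5
v_mu = np.array([0, 1, 0, 1])      # example memory content pattern, n = 4
k_mu = int(u_mu.sum())             # pattern activity k^mu = 3
l_mu = int(v_mu.sum())             # pattern activity l^mu = 2
```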

[0061]The "one-shot" constraint restricts the set of possible learning
methods. For example, gradient descent methods (as error backpropagation)
are not viable because they require repeated training of the whole
pattern set. Instead it is straightforward to use simple Hebbian-like
learning rules:

[0062]If, during presentation of a single pattern pair, both the
presynaptic and postsynaptic neurons are active, then the synaptic weight
must be increased.

between retrieval result {circumflex over (v)} and original memory content
pattern v.sup.μ normalized by the mean content pattern activity l,

$$\varepsilon := \frac{d_H(\hat{v}, v^\mu)}{l}.$$

[0065]The goal is to maximize C and minimize ε. In contrast to
previous solutions, the system described by this invention, under some
assumptions, maximizes C and minimizes ε. Many previous memory
systems worked well only under artificial conditions, for example,
presuming randomly generated "uncorrelated" memory address patterns
u.sup.μ with independently generated pattern components, or assuming
narrowly distributed pattern activities k.sup.μ (for example constant
k.sup.μ=k). Here numerical simulations have revealed that the current
invention is much more robust against "correlated" patterns and broadly
distributed pattern activities. Further experiments have also shown that
the current invention works much better than the previous approaches for
"pattern part retrieval", i.e., when the set of active units in the input
query patterns are a subset of the active units in the original memory
address patterns u.sup.μ, briefly .OR right.u.sup.μ. Pattern part
retrieval is particularly important for spiking implementations where the
most reliable units fire before the less reliable units. Here, at least
in an early phase of retrieval, the pattern part assumption .OR
right.u.sup.μ is fulfilled with high probability, and the current
invention promises significantly improved performance.

[0066]FIG. 3, for example, shows the NAM considered by the present
invention, which is a two-layer neural network consisting of an address
population u (size m) and a content population v (size n). An address
neuron ui can make a synaptic contact with weight wij onto content neuron
vj. When addressing with an input query pattern ũ, a content neuron
vj gets active if the dendritic potential

$$x_j := \sum_{i=1}^{m} w_{ij}\,\tilde{u}_i$$

exceeds the neuron's firing threshold θj. Memory associations
are stored in the synaptic weights and firing thresholds of the network.
FIG. 3 also shows an example of a hetero-associative memory. For
identical u and v the network becomes an auto-associative memory with
recurrent synaptic connections.
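
The retrieval rule of FIG. 3 can be sketched as follows, assuming that the synaptic weight matrix W and the firing thresholds θ have already been derived from the stored usages (how they are derived is the subject of the Bayesian learning equations referenced below); the sketch only shows the dendritic summation and the threshold comparison and is not a reproduction of those equations.

```python
import numpy as np

def retrieve(query, W, theta):
    """One-step retrieval: x_j = sum_i W[i, j] * query[i];
    content neuron j becomes active if x_j >= theta[j]."""
    x = query @ W                    # dendritic potentials, one per content neuron
    return (x >= theta).astype(int)  # reconstructed content pattern v-hat
```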

Neuron and Synapse Model

[0067]FIG. 4 illustrates the neuron and synapse model according to the
current invention. Each neuron j has a number of state variables: By
M1(j) the "unit usage" counting the number of active memory
components during the memory storage phase (see below) is denoted.
Similarly, M0(j) counts the occurrences of inactive memory
components. Then, more similar to previous models, each neuron has a
continuously valued dendritic potential x(j) and a continuously valued
spike threshold θ(j) which are determined dynamically during
retrieval depending on the previously stored memories and the current
input query pattern ũ. In some implementations the neuron also has two
additional integer variables x.sup.∞(j) and θ.sup.∞(j)
counting "infinite components" of dendritic potentials and spike
thresholds, respectively. Furthermore, each neuron j has two variables
e01(j) and e10(j) estimating the "error probabilities". Here
e01(j) estimates the probability that neuron j is active when it
should be inactive. Similarly, e10(j) estimates the probability that
neuron j is inactive when it should be active.

[0068]Each synapse ij connecting neuron i to neuron j has the following
state variables: By M11(ij) the "synapse usage" counting the number
of co-activations of presynaptic neuron i and postsynaptic neuron j
during the memory storage phase (see below) is denoted. Similarly,
M10 counts the storage events where the presynaptic neuron is active
and the postsynaptic neuron is inactive. Similarly, M01 counts the
storage events where the presynaptic neuron is inactive and the
postsynaptic neuron is active. Similarly, M00 counts the storage
events where both presynaptic and postsynaptic neurons are inactive.
Then, more similar to the previous models, each synapse has a
continuously valued synaptic weight w(ij). In some implementations each
synapse additionally has a binary valued weight wij.sup.∞
counting "infinite components" of the synaptic weight.

[0069]The left panel of FIG. 4 shows that information about memory
associations u.sup.μ→v.sup.μ is stored in neurons and
synaptic connections. Each presynaptic neuron i stores its unit usages
M'1(i) and M'0(i). Each postsynaptic neuron j stores its unit
usages M1(j) and M0(j). Each synapse connecting neuron i to
neuron j stores its synapse usages M11(ij), M10(ij),
M01(ij), and M00(ij).

[0070]The right panel of FIG. 4 shows that for retrieval of information
the unit and synapse usages can be transformed to synaptic weights
wij and firing thresholds θj assuming some query error
estimates e01 (i) and e10(i). Synaptic inputs following the
activation of an input query pattern are summed in the dendritic
potential xj and the corresponding output neuron becomes active
{circumflex over (v)}j=1, if the dendritic potential exceeds the
firing threshold. An adequate handling of infinite weights and thresholds
requires additional variables wij.sup.∞, xj.sup.∞
and θj.sup.∞ discussed below.

where i=1, . . . , m and j=1, . . . , n. Note that it is actually
sufficient to memorize M, M1, M'1, and M11. This means the variables
M0, M'0, M10, M01, and M00 need not necessarily be implemented. Instead,
each neuron requires access to M. Therefore, an implementation on a
digital computer requires only about (mn+m+n+1) log2 M memory bits.
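
For illustration, the memory estimate given above can be checked with made-up network sizes (here log2 M is rounded up to whole bits):

```python
import math

m, n, M = 1000, 1000, 100000                       # hypothetical sizes
bits = (m * n + m + n + 1) * math.ceil(math.log2(M))
print(round(bits / 8 / 1e6, 2), "megabytes")       # about 2.13 MB of usage counters
```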

Retrieval

[0073]Given an input query pattern ũ, the memory task is to find the "most
similar" memory address pattern u.sup.μ and return a reconstruction
{circumflex over (v)} of the associated memory content pattern
v.sup.μ. For this let us assume that the input query pattern ũ is a
noisy version of u.sup.μ with estimated independent component error
probabilities

$$e_{01}(i) := \Pr[\tilde{u}_i = 1 \mid u_i^\mu = 0]$$

$$e_{10}(i) := \Pr[\tilde{u}_i = 0 \mid u_i^\mu = 1]$$

[0074]Now the content neurons j have to decide independently of each other
whether to be activated or to remain silent. Given the input query
pattern , the optimal maximum-likelihood decision

between original and reconstructed content and, thus, output noise
ε. If the input query pattern components are conditionally
independent given the activity of content neuron j, e.g., assuming
independently generated memory address pattern components, then for
a ∈ {0, 1} there is

where a content neuron fires, {circumflex over (v)}j=1, if the dendritic
potential xj exceeds the firing threshold, xj≧θj. Note that
the indices of M0(j), M1(j), e01(i), e10(i), M00(ij), M01(ij), M10(ij),
and M11(ij) are skipped for the sake of readability.

Practical Aspects for an Implementation

[0076]The previous sections describe an efficient implementation of the
optimal neural associative network model based on Bayesian probabilistic
principles constituting a Bayesian probability framework. There are a
number of important aspects for a practical implementation (see, e.g., A.
Knoblauch. Neural associative networks with optimal Bayesian learning.
HRI-ED Report 09-02, Honda Research Institute Europe GmbH, D-63073
Offenbach/Main, Germany, May 2009 for more details):

[0077]Note that the neural network formulation of equation 3 is much cheaper
(in terms of required computation steps) than equation 1, in particular
for sparse queries having only a small number of active components with
ũi=1. However, the synaptic weights of equation 2 may not yet satisfy
Dale's law that a neuron is either excitatory or inhibitory. To have only
positive synaptic weights (which may be easier to implement and
which is more consistent with biology) a sufficiently large constant
c:=-min.sub.ij wij may be added to each weight. Then all synapses
have non-negative weights w'ij:=wij+c and the dendritic
potentials remain unchanged if the last sum in equation 3 is replaced by
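
The replacement expression for equation 3 is not reproduced in this text. The following sketch therefore only illustrates the standard shift argument behind it, under the assumption that it works as usual for linear dendritic sums: adding the same constant c to every weight raises each dendritic potential by c·z, where z is the number of active query components, so subtracting c·z again (or, equivalently, raising every threshold by c·z) leaves the firing decisions unchanged while all weights become non-negative.

```python
import numpy as np

def retrieve_with_nonnegative_weights(query, W, theta):
    """Shift all weights by c = -min(W) and compensate in the dendritic sum,
    so the firing decisions equal those of the original (possibly negative) weights."""
    c = -W.min()                      # c := -min_ij w_ij
    W_pos = W + c                     # non-negative weights w'_ij = w_ij + c
    z = int(query.sum())              # number of active query components
    x = query @ W_pos - c * z         # identical to the original dendritic potentials
    return (x >= theta).astype(int)
```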

[0080]Thus, for e01→0 it is not necessary to recompute the
synaptic weights whenever the expected error probabilities change.

[0081]The synaptic weights wij can become plus or minus infinity
depending on the synapse usages: For example, in equation 5 the synaptic
weight will become plus infinity if M10=0 or M01=0. Similarly,
the synaptic weight will become minus infinity for M11=0 or
M00=0.
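
Equation 5 itself is not reproduced here. Purely to illustrate why the synapse usages can drive a weight to plus or minus infinity, consider a log-odds-style weight of the form w=log((M11·M00)/(M10·M01)); this is an illustrative stand-in, not the patent's formula (which additionally involves the error estimates e01 and e10), but it diverges in exactly the cases listed above.

```python
import math

def illustrative_weight(M11, M10, M01, M00):
    """Log-odds-style weight used only to illustrate the infinity cases."""
    if M10 == 0 or M01 == 0:
        return math.inf               # plus infinity, as described in [0081]
    if M11 == 0 or M00 == 0:
        return -math.inf              # minus infinity
    return math.log((M11 * M00) / (M10 * M01))
```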

[0082]The same is true for the firing thresholds θj (see
equation 4). However, a closer analysis (going back to equation 1)
reveals that naive implementations of infinite synaptic weights and
infinite firing thresholds are not adequate and lead to suboptimal
performance. Instead it is necessary to let the positive and negative
infinite components cancel each other. To account for this a neuron model
has been developed where each synaptic weight and each neuron threshold
is represented by two numbers representing finite and infinite
contributions (see FIG. 4). With this model the synaptic weights and
firing thresholds of the optimal associative memory compute as follows:
[0083]Compute two matrices representing finite and infinite synaptic
weights wij and wij.sup.∞: For

with the gating functions F(x)=x for x>0 and F(x)=1 for x≦0, and
G(x)=0 for x>0 and G(x)=1 for x≦0. Thus, wij represents
the finite weight neglecting infinite components, whereas
wij.sup.∞ counts the number of contributions towards plus and
minus infinity.

[0085]Compute two vectors representing finite and infinite neuron
thresholds θ(j) and θ.sup.∞(j):

[0086]For

[0092]Note that there are thus three ways to implement the optimal
associative memory, leading to different storage and computation
requirements.

[0093]The first way is to store only the unit and synapse usages as
described above. This requires storing only m+n+mn integers, each of size
log2 M bits. However, this method requires more computation time because
it is necessary, for each input query pattern, to recompute the synaptic
weights and firing thresholds or, alternatively, to evaluate equation 1.
This method may be advantageous if the error estimates e01 and e10 are
quickly changing such that synaptic weights would have to be recomputed
anyway.

[0094]The second way is to store the synaptic weights and firing
thresholds as described above. A naive implementation will require
storing n+mn floating point values. Correspondingly, a retrieval takes
only zn+n steps, where z:=|ũ| is the number of one-entries in the query
vector.

[0095]The third way is to account for infinite weights and thresholds as
described above. Then storage requires n+mn floating point values and
additionally mn integers of size log2 5≦3 bits and n integers of size
log2(2m) bits.
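
A minimal sketch of the two-component bookkeeping of the third way: every weight and threshold carries a finite part and an integer count of infinite contributions, the two parts are summed separately during retrieval, and opposite infinities cancel in the counts. The activation rule below compares the infinite parts first and uses the finite parts only to break a tie; this ordering is an assumption about the intended semantics (compare claim 10), not a reproduction of the patent's equations.

```python
import numpy as np

def retrieve_two_component(query, W_fin, W_inf, theta_fin, theta_inf):
    """Retrieval with separate finite weights and infinity counters."""
    x_fin = query @ W_fin   # finite dendritic potentials
    x_inf = query @ W_inf   # net count of +/- infinity contributions per neuron
    fire = np.where(x_inf != theta_inf, x_inf > theta_inf, x_fin >= theta_fin)
    return fire.astype(int)
```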

[0096]Also note that, instead of applying fixed optimal thresholds,
alternatively an l-winner-takes-all activation can be used if the number
of active units l in a memory pattern is known (e.g., if l.sup.μ is
constant).
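
A sketch of the l-winner-takes-all alternative mentioned above: instead of comparing each dendritic potential with a fixed threshold, exactly the l content neurons with the largest potentials are activated (assuming the content pattern activity l is known).

```python
import numpy as np

def l_winner_takes_all(x, l):
    """Activate the l content neurons with the highest dendritic potentials x."""
    v_hat = np.zeros_like(x, dtype=int)
    v_hat[np.argsort(x)[-l:]] = 1     # indices of the l largest potentials
    return v_hat
```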

[0097]So far the invention describes a hetero-associative memory which
corresponds to a feedforward network between two distinct neuron
populations u and v (see FIG. 3). If u and v are identical, the network
becomes a recurrently connected auto-associative memory performing
pattern completion. The invention applies also to the auto-associative
case. Note that here the optimal Bayesian synaptic weights are generally
asymmetric, wij≠wji. This is in contrast to both Hopfield
and Willshaw-type networks. This is also in contrast to theoretical
stability conditions based on statistical mechanics. Symmetric weights
are obtained only in the asymptotic limit when Bayesian learning becomes
equivalent to the linear covariance rule (see above) or if one assumes
zero add-noise, e01=0.

Core of the Invention

[0098]The core idea of this invention is to improve learning in neural
associative memory structures by applying Bayesian learning principles
including estimates of input query component error probabilities. This
leads to a novel non-linear learning method given by equations 2 and 5.
In practice, an implementation of the corresponding optimal memory system
requires the storage of unit usages (e.g., M1(j)) and synapse usages
(e.g., M11(ij)) as well as two-dimensional representations of
synaptic weights (w(ij) and w.sup.∞(ij)), firing thresholds
(θ(j) and θ.sup.∞(j)), and dendritic potentials (x(j)
and x.sup.∞(j)). The two-dimensional variables are required to adequately
represent finite and infinite contributions as described above.

[0100]FIG. 5 illustrates a four-layer system for information retrieval
according to one embodiment of the invention. The system is basically
identical to a system based on inhibitory associative memory (IAM)
proposed in the previous patent application except that the IAM of the
previous invention was replaced by the Bayesian associative memory (BAM)
of the current invention. Here memory address patterns u.sup.μ are
mapped to (carefully chosen) index representations w.sup.μ1 via
a BAM which maps via an error-correcting compressed look-up-table (cLUT)
to the memory content patterns v.sup.μ.

[0101]FIG. 6 shows a block diagram of a system for visual object
recognition using a Bayesian associative memory (BAM) according to one
embodiment of the invention. During learning, images I.sup.μ are
preprocessed in a visual feature hierarchy. The resulting continuous
valued feature vector u is binarized resulting in a binary address vector
u.sup.μ, which is associated with the content or class label
v.sup.μ employing the four-layer-system described in FIG. 5. During
recognition, a test image Ĩ.sup.μ is processed in a similar way. The
system (with BAM replaced by an IAM) is described in detail in the
previous patent application.

[0102]Further possible applications include efficient implementations of
LVQ (Learning Vector Quantization), in particular, if the pattern vectors
are high-dimensional and moderately sparse and if a very large number of
pattern vectors must be stored.

[0103]Similarly, potential applications include efficient implementations
of clustering algorithms or self-organizing maps if the number of cluster
prototypes is large and the prototype vectors are high-dimensional and
moderately sparse.

[0104]Another potential application is document retrieval: Here the
database may consist of a large set of text documents, for example taken
from the internet. Each text document consists of (possibly many) words
and can be indexed by selecting a relatively small set of key words. The
result is a sparse binary feature vector for each text document. Given an
input query pattern consisting of a set of key words the task is to find
the most relevant documents. This retrieval can be accelerated by the
methods proposed here.

[0105]A complementary idea is to represent the words in a text document by
applying an N-gram code. For example the 1-grams (or monograms) of
"memory" are simply the letters "m", "e", "m", "o", "r", "y". Similarly,
the 2-grams (or digrams) are "me", "em", "mo", "or", "ry", and the
3-grams are "mem", "emo", "mor", "ory". By that a sparse and fault
tolerant code already is obtained very naturally at the word level. For
example, for an alphabet of 26 letters, the 2-gram code represents the
word "memory" by a binary vector of dimension 262=676 where only 5
components are active. This method can be used, for example, to implement
a fault-tolerant code for the keywords described in the previous item.
Additionally, the N-gram method can be used to code keyword order and key
word sequences in a manner suitable for the associative memory models
discussed in this application.
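
For illustration, the 2-gram coding of a word into a sparse binary vector of dimension 26.sup.2=676 can be sketched as follows (letters outside a-z are simply ignored in this sketch):

```python
import numpy as np

def digram_code(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map a word to a sparse binary vector of dimension 26*26 via its 2-grams."""
    vec = np.zeros(len(alphabet) ** 2, dtype=int)
    for a, b in zip(word, word[1:]):
        if a in alphabet and b in alphabet:
            vec[alphabet.index(a) * len(alphabet) + alphabet.index(b)] = 1
    return vec

print(int(digram_code("memory").sum()))   # 5 active components: "me", "em", "mo", "or", "ry"
```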

[0106]In summary, the neural associative networks and algorithms proposed
in this application can be used for any application involving the best
match or nearest-neighbor problem if the underlying pattern vectors are
high-dimensional and (moderately) sparse.

[0107]It should be understood that the foregoing relates only to
embodiments of the present invention and that numerous changes and
modifications made therein may be made without departing from the scope
of the invention as set forth in the following claims.