2017

Abstract: Machine learning (ML) representation formats have been dominated by: a) localism, wherein individual items are represented by single units, e.g., Bayes Nets, HMMs; and b) fully distributed representations (FDR), wherein items are represented by unique activation patterns over all the units, e.g., Deep Learning (DL) and its progenitors. DL has had great success vis-a-vis classification accuracy and learning complex mappings (e.g., AlphaGo). But, without massive machine parallelism (MP), e.g., GPUs, TPUs, and thus high power, DL learning is intractably slow. The brain is also massively parallel, but uses only 20 watts and moreover, the forms of MP used in DL, model / data parallelism and shared parameters, are patently non-biological, suggesting DL’s core principles do not emulate biological intelligence. We claim that a basic disconnect between DL/ML and biology and the key to biological intelligence is that instead of FDR or localism, the brain uses sparse distributed representations (SDR), i.e., “cell assemblies”, wherein items are represented by small sets of binary units, which may overlap, and where the pattern of overlaps embeds the similarity/statistical structure (generative model) of the domain. We’ve previously described an SDR-based, extremely efficient, one-shot learning algorithm in which the primary operation is permament storage of experienced events based on single trials (episodic memory), but in which the generative model (semantic memory, classification) emerges automatically, and as a computationally free, in terms of time and power, side effect of the episodic storage process. Here, we discuss fundamental differences between the mainstream localist/FDR-based and our SDR-based approaches.

Abstract: The brain is believed to implement probabilistic reasoning and to represent information via population, or distributed, coding. Most previous population-based probabilistic (PPC) theories share several basic properties: 1) continuous-valued neurons; 2) fully(densely)-distributed codes, i.e., all(most) units participate in every code; 3) graded synapses; 4) rate coding; 5) units have innate unimodal tuning functions (TFs); 6) intrinsically noisy units; and 7) noise/correlation is considered harmful. We present a radically different theory that assumes: 1) binary units; 2) only a small subset of units, i.e., a sparse distributed code (SDC) (cell assembly, ensemble), comprises any individual code; 3) binary synapses; 4) signaling formally requires only single (first) spikes; 5) units initially have completely flat TFs (all weights zero); 6) units are not inherently noisy; but rather 7) noise is a resource generated/used to cause similar inputs to map to similar codes, controlling a tradeoff between storage capacity and embedding the input space statistics in the pattern of intersections over stored codes, indirectly yielding correlation patterns. The theory, Sparsey, was introduced 20 years ago as a canonical cortical circuit/algorithm model, but not elaborated as an alternative to PPC theories. Here, we show that the active SDC simultaneously represents both the most similar/likely input and the coarsely-ranked distribution over all stored inputs (hypotheses). Crucially, Sparsey's code selection algorithm (CSA), used for both learning and inference, achieves this with a single pass over the weights for each successive item of a sequence, thus performing spatiotemporal pattern learning/inference with a number of steps that remains constant as the number of stored items increases. We also discuss our approach as a radically new implementation of graphical probability modeling.

Abstract: Quantum superposition states that any physical system simultaneously exists in all of its possible states, the number of which is exponential in the number of entities composing the system. The strength of presence of each possible state in the superposition—i.e., the probability with which it would be observed if measured—is represented by its probability amplitude coefficient. The assumption that these coefficients must be represented physically disjointly from each other, i.e., localistically, is nearly universal in the quantum theory/computing literature. Alternatively, these coefficients can be represented using sparse distributed representations (SDR), wherein each coefficient is represented by a small subset of an overall population of representational units and the subsets can overlap. Specifically, I consider an SDR model in which the overall population consists of Q clusters, each having K binary units, so that each coefficient is represented by a set of Q units, one per cluster. Thus, K^Q coefficients can be represented with KQ units. We can then consider the particular world state, X, whose coefficient’s representation, R(X), is the set of Q units active at time t to have the maximal probability and the probabilities of all other states, Y, to correspond to the size of the intersection of R(Y) and R(X). Thus, R(X) simultaneously serves both as the representation of the particular state, X, and as a probability distribution over all states. Thus, set intersection may be used to classically implement quantum superposition. If algorithms exist for which the time it takes to store (learn) new representations and to find the closest-matching stored representation (probabilistic inference) remains constant as additional representations are stored, this would meet the criterion of quantum computing. Such algorithms, based on SDR, have already been described. They achieve this "quantum speed-up" with no new esoteric technology, and in fact, on a single-processor, classical (Von Neumann) computer.

Abstract: We present a radically new model of chunking, the process by which a monolithic representation emerges for a sequence of items, called overcoding-and-pruning (OP). Its core insight is this: if a sizable population of neurons is assigned to represent an ensuing sequence immediately, at sequence start, it can then be repeatedly pruned as functions of each successive item. This solves the problem of assigning unique chunk representations to sequences that start in the same way, e.g., "CAT" and "CAR", without requiring temporary buffering of the items' representations. OP rests on two well-supported assumptions: 1) information is represented in cortex by sparse distributed representations; and 2) neurons at progressively higher cortical stages have progressively longer activation duration-or, persistence. We believe that this type of mechanism has been missed so far due to the historical bias of thinking in terms of localist representations, which cannot support it since pruning cannot be applied to a single representational unit.

A Cortex-inspired Associative Memory with O(1) Learning, Recall, and Recognition of Sequences. (manuscript in prep)

2007

A Functional Role for the Minicolumn in Cortical Population Coding. Invited Talk at Cortical Modularity and Autism, University of Louisville, Louisville, KY, Oct 12-14, 2007. A revised/corrected version of the talk (PPT) ( pdf Animations do not show in pdf version)

2006

Hierarchical Sparse Distributed Representations of Sequence Recall and Recognition. Presentation given at The Redwood Center for Theoretical Neuroscience (University of California, Berkeley) on Feb 22, 2006: powerpoint (~6 meg). (video of talk) (Note: the ppt presentation uses a lot of animations so you probably need a very up-to-date version of ppt to view it correctly.)