Background

An SDR (sparse distributed representation) is a binary vector in which only a small portion of the bits are ‘on’. There is growing evidence that SDRs are a feature of biological computation, used for storing and conveying information. In a biological context, an SDR corresponds to a small number of active cells in a population of cells. SDRs have been adopted by HTM (in CLA), and HTM's focus on them has set it apart from the bulk of mainstream machine learning research.

SDRs have very promising characteristics and are still relatively underutilised in the field of AI and machine learning. It’s an exciting area to watch as AGI leaps forward.

The concept of SDRs is not new, however. Kanerva first proposed a sparse distributed memory as a physiologically plausible model for human memory in his influential 1988 book Sparse Distributed Memory [Kanerva1988]. The properties of sparse distributed memory were nicely summarised by Denning in 1989 [Denning1989].

As early as 1997, Hinton and Ghahramani described a generative model, implementable with a neural network, that ‘discovered’ sparse distributed representations for image analysis [Hinton1997].

Then in 1998 SDRs were used in the context of robot control for navigation by both Rajesh et al. and Rao et al. [Rajesh1998, Rao1998], and then in 2004 for reinforcement learning by Ratitch [Ratitch2004].

Recently some great new resources have become available. There is a new video from Numenta of Subutai Ahmad presenting on the topic. A nice companion to that is an older introductory video of Jeff Hawkins presenting. The recent draft paper by Ferrier on a universal cortical algorithm (discussed in an earlier blog post) gives an excellent summary of their characteristics.

What’s all the fuss about? - SDR Characteristics

Given that so much great (and recent) material exists describing SDRs, I won’t go into very much detail. This post would not be complete though, without at least a cursory look at the ‘highlights’ of SDRs.

Semantic

each bit corresponds to something meaningful

Efficient/Versatile

storage: there is a huge number of potential encodings for a given vector size, as capacity increases exponentially with number of potentially active bits

compositionality: because of sparsity, SDRs are generally linearly separable and can therefore be combined to represent a set of several states

comparisons: it is very efficient to measure similarity between vectors (you just need to count the overlap of ‘on’ bits) and to test whether a state is part of a set (due to compositionality)

Robust

subsampled or noisy vectors are still semantically similar and can be compared effectively
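The characteristics above can be illustrated with a minimal Python sketch. It is not taken from any HTM implementation: the function names, the choice to store an SDR as a set of active bit indices, and the sizes N = 2048 and W = 40 are all assumptions made for illustration.

```python
import random
from math import comb


def random_sdr(n, w, rng):
    """Return a random SDR as a frozenset of w active bit indices out of n."""
    return frozenset(rng.sample(range(n), w))


def overlap(a, b):
    """Similarity measure: the number of 'on' bits two SDRs share."""
    return len(a & b)


def union(sdrs):
    """Combine several SDRs into one vector representing the whole set (bitwise OR)."""
    combined = set()
    for sdr in sdrs:
        combined |= sdr
    return frozenset(combined)


def contains(union_sdr, sdr, threshold):
    """Approximate membership test: are enough of sdr's bits 'on' in the union?"""
    return overlap(union_sdr, sdr) >= threshold


def subsample(sdr, k, rng):
    """Keep only k of the active bits, simulating a subsampled or noisy SDR."""
    return frozenset(rng.sample(sorted(sdr), k))


N, W = 2048, 40              # assumed vector size and number of active bits
rng = random.Random(0)       # fixed seed so the example is repeatable

capacity = comb(N, W)        # storage: number of distinct encodings with W of N bits on
a, b, c = (random_sdr(N, W, rng) for _ in range(3))
ab = union([a, b])           # compositionality: one vector standing for the set {a, b}
noisy_a = subsample(a, W // 2, rng)  # robustness: half of a's bits are dropped

print(f"distinct encodings: about 10^{len(str(capacity)) - 1}")
print("a in {a, b}:", contains(ab, a, W))
print("c in {a, b}:", contains(ab, c, W))
print("overlap(noisy_a, a):", overlap(noisy_a, a))
```

Representing an SDR by its active indices keeps storage proportional to the number of ‘on’ bits, and makes overlap a plain set intersection. The membership test is approximate: as more SDRs are added to a union, the chance of a false match grows.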

Conclusion

We believe that these elements make SDRs an excellent choice as the main data structure of any AGI implementation, and they are used in our current approach. The highlights and links given above should be a good start for anyone who wants to learn more.

Monday, 22 December 2014

This post asks some questions about the agency of hierarchical action selection. We assume various pieces of HTM / MPF canon, such as a cortical hierarchy.

Agency

The concept of agency has various meanings in psychology, neuroscience, artificial intelligence and philosophy. The common element is having control over a system, with varying qualifiers regarding the entities who may be aware of execution or availability of control. Although "agency" has several definitions, let's use this one I made up:

An agent has agency over a state S, if its actions affect the probability that S occurs.
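As a rough illustration of this definition, the sketch below checks whether the probability of S depends on which action is taken. The function name, the tolerance parameter and the example probabilities are all made up for this example.

```python
def has_agency(p_s_given_action, tol=1e-9):
    """Return True if the choice of action changes the probability of state S.

    p_s_given_action maps each available action to P(S | action).
    """
    probs = list(p_s_given_action.values())
    return max(probs) - min(probs) > tol


# Hypothetical numbers: flipping a switch strongly affects whether the light is on.
light_on = {"flip switch": 0.95, "do nothing": 0.05}

# Hypothetical numbers: nothing the agent does changes the chance of rain.
rain = {"rain dance": 0.3, "do nothing": 0.3}

print(has_agency(light_on))
print(has_agency(rain))
```

Under this reading, agency is a matter of degree: the spread between the largest and smallest conditional probability could serve as a crude measure of how much agency the agent has over S.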

Hierarchical Selection thought experiment

Now let's consider a hierarchical representation of action-states (actions and states encoded together). Candidate actions can therefore be synonymous with predictions of future states. Let's assume that action-states can be selected as objectives anywhere in the hierarchy. More complex actions are represented as combinations or sequences of simpler action-states defined in lower levels of the hierarchy.

Let's say an "abstract" action-state at a high level in the hierarchy is selected. How is the action-state executed? In other words, how is the abstract state made to occur?

To exploit the structure of the hierarchy, let's assume each vertex of the hierarchy re-interprets selected actions. This translates a compound action into its constituent parts.

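How much control does higher-level selection exert over lower-level execution? For simplicity let's assume there are two alternatives: "strong control", where each level faithfully carries out the objectives selected by the level above, and "weak control", where it may not.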

We exclude the possibility that higher levels directly control or subsume all lower levels due to the difficulty and complexity of performing such a task without the benefit of hierarchical problem decomposition.

If high levels do not exert strong control over lower levels, the probability of faithfully executing an abstract plan should be small due to compound uncertainty at each level. For example, let's say the probability of each hierarchy level correctly interpreting a selected action is x. The height of the hierarchy h determines the number of interpretations between selection of the abstract action and execution of relevant concrete actions. The probability of an abstract action a being correctly executed is:

P(a) = x^h

So for example, if h=10 and x=0.9, P(a) ≈ 0.35.

We can see that in a hierarchy with a very large number of levels, the probability of executing any top-level strategy will be very small unless each level interprets higher-level objectives faithfully. However, "weak control" may suffice in a very shallow hierarchy.
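The arithmetic is easy to check with a few lines of Python. The function names are made up for this sketch, and it assumes, as the text does, that every level interprets correctly with the same independent probability x.

```python
def p_executed(x, h):
    """Probability that an abstract action survives h independent interpretations,
    each of which is correct with probability x."""
    return x ** h


def required_fidelity(target, h):
    """Per-level probability x needed so that p_executed(x, h) equals target."""
    return target ** (1.0 / h)


for h in (2, 10, 50):
    print(f"h={h:2d}  x=0.9  P(a)={p_executed(0.9, h):.3f}")

# How faithful must each of 10 levels be for a 90% chance of correct execution?
print(f"required x for P(a)=0.9 with h=10: {required_fidelity(0.9, 10):.4f}")
```

Inverting the formula shows how demanding deep hierarchies are: to execute an abstract action correctly 90% of the time through 10 levels, each level must interpret correctly about 99% of the time.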

Are abstract actions easy to execute?

Introspectively I observe that highly abstract plans are frequently and faithfully executed without difficulty (e.g. it is easy to drive a car to the shops for groceries, something I consider a fairly abstract plan). Given the apparent ease with which I select and execute tasks with rewards delayed by hours, days or months, it seems I have good agency over abstract tasks.

According to the thought experiment above, my cortical hierarchy must either be very shallow or higher levels must exert "strong control" over lower levels.

Let's assume the hierarchy is not shallow (it might be, but then that's a useful conclusion in its own right).

Local Optimisation

Local processes may have greater biological validity because they imply less difficulty/specificity routing relevant signals to the right places. Hopefully the amount of wiring is reduced also.

What would a local implementation of a strong control architecture look like? Each vertex of the hierarchy would receive some objective action-state[s] as input. (When no input is received, no output is produced.) Each vertex would produce some objective action-states as output, in terms of action-states in the level below. The hierarchical encoding of the world would be undone incrementally by each level.

At the lowest level the output action-states would be actual motor control signals.

A cascade of incremental re-interpretation would flow from the level of original selection down to levels that process raw data (either as input or output). In each case, local interpretation should only be concerned with maximizing the conditional probability of the selected action-state given the current action-state and instructions passed to the level immediately below.
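The cascade can be sketched as a toy recursion in Python. This is a deliberate simplification and not a model of cortex: each level is reduced to a fixed lookup table, so the sketch ignores the current action-state and does not maximize any conditional probability. The function name, the tables and the action names are all hypothetical.

```python
def execute(objective, level, tables):
    """Re-interpret an objective action-state level by level.

    tables[level] maps an action-state at that level to a sequence of
    action-states in the level below. Level 0 action-states are treated
    as motor control signals and returned unchanged.
    """
    if level == 0:
        return [objective]
    signals = []
    # An objective the level does not recognise produces no output.
    for part in tables[level].get(objective, []):
        signals.extend(execute(part, level - 1, tables))
    return signals


# Hypothetical two-level decomposition of an abstract plan.
TABLES = {
    2: {"get groceries": ["drive to shop", "walk to aisle"]},
    1: {
        "drive to shop": ["turn wheel", "press pedal"],
        "walk to aisle": ["step left", "step right"],
    },
}

print(execute("get groceries", 2, TABLES))
```

Each call only consults the table for its own level, which is the sense in which the process is local: no level needs to know how the levels below will carry out the objectives it hands down.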

Clearly, the agency of each hierarchy vertex over its output action-states is crucial. The agency of hierarchy levels greater than 0 is dependent on faithful interpretation by lower levels. Other considerations (such as reward associated with output action-states) must be ignored, else the agency of higher hierarchy levels is lost.

Cortex Layer 6

Connections form between layer 6 of cortex in "higher" regions and layer 6 in "lower" regions, with information travelling from higher (more abstract) regions towards more concrete regions (i.e. a feedback direction). Layer 6 neurons also receive input from other cortex layers in the same region. However, note that the referenced work disputes the validity of assigning directionality, such as "feed-back", to cortical layers.

Pressing on regardless, given some assumptions about cortical hierarchy, we can speculatively wonder whether the layer 6 neurons embody a local optimization process that incrementally translates selected actions into simpler parts, using information from other cortex layers for context. The purpose of cortex layer 6 remains mysterious.

Another difficulty for this theory is that cortex layer 5 seems to be more complex than simply the output from layer 6. Activity in layer 5 seems to be the result of interaction between cortex and Thalamus. Potentially this interaction could be usefully overriding layer 6 instructions to produce novel action combinations.

There is some evidence that dopaminergic neurones in the Striatum are involved in agency learning, but this doesn't necessarily refute this post, because this process may modulate cortical activity via the Thalamus. Cortex layer 6 may still require some form of optimization to ensure that higher hierarchy levels have agency over future action-states.