Month: November 2015

I would like to explain the algorithm behind NeoRL (GPU) and NeoRL-CPU (available here and here). In this post, I will only cover the predictive hierarchy, since the reinforcement learning part is still a work in progress.

Overview

So let’s start with the basic idea behind the algorithm.

NeoRL operates on the theory that the neocortex is a bidirectional predictive hierarchy, similar to HTM (Hierarchical Temporal Memory). However, it differs from HTM in several important respects:

Includes temporal pooling

Full hierarchy support of multiple regions

Does not use columns for predictions, these are reserved for reinforcement learning

Uses spiking neurons with “explaining-away” characteristics

Continuous inputs and predictions

The connectivity scheme of NeoRL is sort of like that of a convolutional neural network, but the weights are not shared. Also, as mentioned above, it is a bidirectional hierarchy, unlike a convolutional neural network. The basic idea is that features are extracted upwards and predictions come downwards. The predictions can then influence the feature extraction to improve the features, which in turn results in better predictions.
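To make "like a convolutional network, but without shared weights" concrete, here is a minimal sketch of a locally connected layer: each hidden unit on a 2D grid sees only a square patch of the input around its projected position, and every unit owns its own weights. The grid sizes, the `radius` parameter and all function names are hypothetical, chosen for illustration; this is not NeoRL's actual code.

```python
import random

def receptive_field(hx, hy, hidden_w, hidden_h, input_w, input_h, radius):
    """Input coordinates (x, y) visible to hidden unit (hx, hy).

    The unit is mapped onto the input grid and sees a square patch of
    side 2*radius+1 around that point, clipped at the borders.
    """
    cx = int((hx + 0.5) * input_w / hidden_w)
    cy = int((hy + 0.5) * input_h / hidden_h)
    return [(x, y)
            for y in range(cy - radius, cy + radius + 1)
            for x in range(cx - radius, cx + radius + 1)
            if 0 <= x < input_w and 0 <= y < input_h]

def make_local_layer(hidden_w, hidden_h, input_w, input_h, radius, seed=0):
    """Each hidden unit gets its OWN weight per visible input (no weight sharing)."""
    rng = random.Random(seed)
    layer = {}
    for hy in range(hidden_h):
        for hx in range(hidden_w):
            field = receptive_field(hx, hy, hidden_w, hidden_h, input_w, input_h, radius)
            layer[(hx, hy)] = {xy: rng.uniform(-0.1, 0.1) for xy in field}
    return layer

def forward(layer, inputs):
    """inputs[y][x] is a scalar; returns each hidden unit's weighted sum over its field."""
    return {unit: sum(w * inputs[y][x] for (x, y), w in weights.items())
            for unit, weights in layer.items()}
```

A convolution would reuse one kernel for every position; here `make_local_layer(4, 4, 8, 8, radius=1)` creates 16 independent weight sets, with border units seeing fewer inputs than interior ones.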

NeoRL uses this hierarchy to predict the next timestep. The input is a 2D field (although theoretically it can be any dimension) of scalars, and the predictions are another 2D field of the same dimensions as the input. This makes it sort of like a predictive autoencoder.

Why predict only one timestep? Well, for one, it’s the theoretical minimum for building a world model. Why is this? Well, if your model understands how to predict the next timestep, then it can predict the timestep after that based on the timestep it just predicted, and then predict from that, and so on.
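The argument above is just iterated one-step prediction. A minimal sketch, assuming a hypothetical `predict_next` function that maps one 2D frame to the predicted next frame (the toy "model" below is a fixed shift, not a learned predictor):

```python
from typing import Callable, List

Frame = List[List[float]]

def rollout(predict_next: Callable[[Frame], Frame], start: Frame, steps: int) -> List[Frame]:
    """Predict `steps` frames ahead by feeding each prediction back in as the next input."""
    frames = []
    current = start
    for _ in range(steps):
        current = predict_next(current)
        frames.append(current)
    return frames

def shift_right(frame: Frame) -> Frame:
    """Toy one-step 'model': moves every row one cell to the right, wrapping around."""
    return [row[-1:] + row[:-1] for row in frame]
```

For example, `rollout(shift_right, [[1, 0, 0]], 3)` yields `[[[0, 1, 0]], [[0, 0, 1]], [[1, 0, 0]]]`: a model that only knows one step still produces an arbitrarily long trajectory.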

Now, let’s go into how the spatio-temporal features are extracted (upwards flow).

Spatio-Temporal Feature Extraction

Spatio-temporal feature extraction is based on sparse coding, or SDRs (sparse distributed representations). With SDRs/sparse coding, one attempts to find a sparse (few active) set of bases that can be used to reconstruct the input. This is not unlike a sparse autoencoder; however, NeoRL does it a bit differently.

Below is an image of the FISTA algorithm being used to extract sparse codes.

The algorithm used in NeoRL is similar to ISTA, but uses spiking neurons to produce time-averaged codes that are always in the [0, 1] range. I found that ISTA does not bound the codes strictly enough, so I opted for something between ISTA and another algorithm called SAILnet (here).

The result works as follows:

1. Excite neurons from reconstruction error

2. Inhibit neurons from each other (lateral connectivity)

3. Reconstruct the average firing rates of the neurons to get a reconstruction error (input – reconstruction)

4. Repeat steps 1-3

Like ISTA, neurons are activated by the reconstruction error, but like SAILnet, the codes are formed by having a spiking neuron inhibit its neighbors. These steps are repeated for a number of iterations until a stable sparse code has formed.
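The loop above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not NeoRL's code: leaky integrate-and-fire neurons with a fixed `threshold`, a reset to zero after each spike, and lateral inhibition driven by the spikes of the previous pass. The function name and the `threshold` and `leak` parameters are hypothetical.

```python
def spiking_sparse_code(x, W, L, iterations=50, threshold=0.5, leak=0.1):
    """Infer a time-averaged sparse code for input vector x.

    W[i] is the feed-forward basis vector of neuron i (same length as x).
    L[i][j] >= 0 is how strongly a spike of neuron j inhibits neuron i.
    Returns one firing rate per neuron, always in [0, 1].
    """
    m, n = len(W), len(x)
    potential = [0.0] * m
    spikes = [0.0] * m
    counts = [0.0] * m
    rates = [0.0] * m
    error = list(x)  # nothing has been reconstructed yet
    for t in range(1, iterations + 1):
        new_spikes = [0.0] * m
        for i in range(m):
            # Step 1: excitation from the reconstruction error.
            excitation = sum(W[i][k] * error[k] for k in range(n))
            # Step 2: lateral inhibition from neurons that spiked on the previous pass.
            inhibition = sum(L[i][j] * spikes[j] for j in range(m))
            potential[i] = (1.0 - leak) * potential[i] + excitation - inhibition
            if potential[i] > threshold:
                new_spikes[i] = 1.0
                potential[i] = 0.0  # reset after spiking
        spikes = new_spikes
        counts = [c + s for c, s in zip(counts, spikes)]
        # At most one spike per pass, so every rate stays within [0, 1].
        rates = [c / t for c in counts]
        # Step 3: reconstruct from the average rates and recompute the error.
        recon = [sum(rates[i] * W[i][k] for i in range(m)) for k in range(n)]
        error = [x[k] - recon[k] for k in range(n)]
    return rates
```

Because the rates are spike counts divided by elapsed passes, the [0, 1] bound holds by construction, which is the property the text says ISTA lacks.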

Once the code has been formed, the feed-forward and lateral weights are updated through Hebbian and anti-Hebbian learning rules respectively, based on the average spiking activities and the reconstruction thereof.
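One plausible form of these two rules, sketched under assumptions: the feed-forward update is taken to be Hebbian on the reconstruction error (an active neuron's basis moves toward whatever the code failed to explain), and the lateral update is taken to be SAILnet-style anti-Hebbian (inhibition between two neurons grows when they fire together more often than `target ** 2`, and is kept non-negative). The exact formulas, the function name and the `alpha`, `beta` and `target` parameters are hypothetical.

```python
def update_weights(x, rates, W, L, alpha=0.01, beta=0.01, target=0.05):
    """Update W and L in place from the input x and the average firing rates."""
    m, n = len(W), len(x)
    recon = [sum(rates[i] * W[i][k] for i in range(m)) for k in range(n)]
    error = [x[k] - recon[k] for k in range(n)]
    for i in range(m):
        for k in range(n):
            # Hebbian on the error: presynaptic error times postsynaptic rate.
            W[i][k] += alpha * rates[i] * error[k]
    for i in range(m):
        for j in range(m):
            if i != j:
                # Anti-Hebbian: co-active pairs inhibit each other more strongly.
                L[i][j] = max(0.0, L[i][j] + beta * (rates[i] * rates[j] - target * target))
```

Only neurons that actually fired move their bases, and the lateral term pushes toward decorrelated, sparse activity.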

In order to extend this idea to the time domain, one can add an extra set of recurrent connections. These take the previous average spiking activities and feed them back into the current cycle. So the layer will try to form sparse codes not only of the input, but also of itself. This leads to a history compression algorithm.
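The recurrence amounts to widening the coder's input with its own previous code. A minimal sketch, assuming a hypothetical `encode` function (standing in for the spiking coder) that maps a vector to a code of length `code_size`; the toy `delay_line` encoder only exists to show that information from earlier steps survives in the code:

```python
def encode_sequence(frames, encode, code_size):
    """Run a coder over a sequence, feeding its previous code back in.

    At each step the coder sees the current input concatenated with its own
    previous code, so it has to compress both the input and the history.
    """
    previous = [0.0] * code_size
    codes = []
    for x in frames:
        augmented = list(x) + previous
        code = encode(augmented)
        codes.append(code)
        previous = code
    return codes

def delay_line(v):
    """Toy encoder (not a sparse coder): keeps the newest input and the previous first code entry."""
    return [v[0], v[1]]
```

With `encode_sequence([[1.0], [0.0], [0.0]], delay_line, 2)` the single input spike from step one is still visible in the code at step two.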

However, there is one more trick we can do: in order to make the representation as efficient as possible, we can compress only the history/spatial features that lead to low prediction errors. This is accomplished through eligibility traces.

Eligibility traces are often used in reinforcement learning to propagate reward signals back in time to address the distal reward problem. In our case, we are using them as a replacement for backpropagation through time (BPTT), which is typically used to train LSTMs. Instead of saving a history buffer and updating on it up to a fixed-length horizon, we can easily propagate prediction errors to past codes with an infinite horizon (well, limited by floating-point accuracy of course).

The idea is that instead of updating the weights of the sparse coder directly, we use the weight change we would have applied to increment the eligibility trace. This trace then decays exponentially, giving newer samples more importance. Then, when the prediction error is below average, we apply those traces to the weights (since the past updates were good for us).
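The trace mechanics can be sketched as a small class. This is a minimal sketch assuming a flat list of weights and a single `decay` factor; the class and method names are hypothetical:

```python
class EligibilityTrace:
    """Accumulates proposed weight changes and commits them only when told to."""

    def __init__(self, size, decay=0.95):
        self.decay = decay
        self.trace = [0.0] * size

    def accumulate(self, delta):
        # Decay older contributions, then add the change we would have applied now.
        self.trace = [self.decay * t + d for t, d in zip(self.trace, delta)]

    def apply(self, weights, rate=1.0):
        # Commit the traced changes (e.g. when prediction error was below average).
        return [w + rate * t for w, t in zip(weights, self.trace)]
```

After `accumulate([1.0, -1.0])` followed by `accumulate([0.0, 0.0])` with `decay=0.5`, the trace is `[0.5, -0.5]`: the older change has been halved, and the second entry shows the trace going negative.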

Here is an example of how such a trace variable can look (plotted over time). It’s not shown in the image, but the trace can also be negative.

So that’s how a single layer performs feature extraction.

Prediction

The prediction in NeoRL is very simple. It’s essentially a multilayer perceptron with thresholded units that points in the opposite direction of the feature extraction hierarchy. It can be thought of as overlaying the feature extraction hierarchy – each prediction neuron tries to learn a mapping from local (lateral and feedback) inputs to the next state of the feature extractor neuron it is associated with.

Each layer receives local (lateral) features along with the predictions of the next higher layer (if there is one) as input. It is then trained with a standard time-delayed perceptron learning rule to produce the next SDR at each layer.
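A minimal sketch of one such prediction layer, assuming binary thresholded outputs, no bias term, and a perceptron rule applied one step late (the prediction made from the previous input is compared with the SDR that actually arrived). Stacking one of these per level gives the downward "multilayer" structure. The class name and the `lr` and `threshold` parameters are hypothetical.

```python
class PredictorLayer:
    """Thresholded units that predict the next SDR of one layer."""

    def __init__(self, num_inputs, num_outputs, lr=0.1, threshold=0.5):
        self.w = [[0.0] * num_inputs for _ in range(num_outputs)]
        self.lr = lr
        self.threshold = threshold
        self.last_input = None
        self.last_output = None

    def predict(self, local_features, feedback):
        # Input = this layer's own (lateral) features plus the higher layer's predictions.
        inputs = list(local_features) + list(feedback)
        out = [1.0 if sum(wi * xi for wi, xi in zip(row, inputs)) > self.threshold else 0.0
               for row in self.w]
        self.last_input, self.last_output = inputs, out
        return out

    def learn(self, actual_next_sdr):
        # Time-delayed perceptron rule: last step's prediction vs. the SDR that arrived.
        if self.last_input is None:
            return
        for row, target, out in zip(self.w, actual_next_sdr, self.last_output):
            err = target - out
            for k, xk in enumerate(self.last_input):
                row[k] += self.lr * err * xk
```

On a toy sequence that alternates between `[1, 0]` and `[0, 1]` (top layer, so `feedback` is empty), calling `predict` then `learn` with the next SDR for a few dozen steps is enough for the layer to predict the alternation.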

The prediction errors are kept around to feed back to the feature extractors as explained earlier. We keep a decaying average of the error, and if the current error is less than the average, we “reward” the feature extractor here, otherwise we do nothing.
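The reward signal can be sketched as an exponentially decaying average of the error plus a comparison. A minimal sketch, assuming squared error as the error measure and a single `decay` factor; the names are hypothetical:

```python
class ErrorGate:
    """Keeps a decaying average of prediction error and reports whether to 'reward'."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.average = None

    def update(self, error):
        """Return True if error is below the running average, then fold it into the average."""
        if self.average is None:
            self.average = error  # first sample only initializes the average
            return False
        rewarded = error < self.average
        self.average = self.decay * self.average + (1.0 - self.decay) * error
        return rewarded

def squared_error(prediction, actual):
    return sum((p - a) ** 2 for p, a in zip(prediction, actual))
```

When `update` returns True, the feature extractor's eligibility traces would be applied; when it returns False, nothing happens, matching the "otherwise we do nothing" rule.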

Conclusion

NeoRL is still in early stages of development, as is NeoRL-CPU. I hope I was able to give a decent explanation of what is going on in the algorithm. I will post explanations of the reinforcement learning extension as soon as it is up and running as well. Feel free to try out the library yourself!

I now have an LSTM/ConvNet-competitive version of my HTM-inspired algorithm. It now lives as part of a new library I am working on called Neo/RL (neocortex + reinforcement learning). The reinforcement learning is not yet complete, but the predictive hierarchy is up and running smoothly, and it is able to match or even outperform LSTMs/ConvNets, all while being fully online without any sort of rehearsal or batching. Don’t believe me? Evaluate it for yourself! https://github.com/222464/NeoRL

I am working on additional benchmarks. Here is one where I reproduced this paper, which used LSTMs to predict moving digits (link to original paper: http://arxiv.org/pdf/1502.04681.pdf).

In the video, I alternate between predicting the next input from the current input and predicting the next input from the model's own predictions. The video is real-time.

I am also working on a text prediction benchmark, and a time series dataset benchmark (the latter of which already works, but I need the LSTM comparison working properly as well).

To truly convince people, though, I probably still need more benchmarks; three is not enough! So, if you are interested in helping out on that front, let me know!

As always, I am trying to figure out how the neocortex works in order to exploit its properties for reinforcement learning.

I believe I have finally found something substantial, at least in how complete it is. My latest model has several bizarre features. Whether or not the whole thing actually works remains to be seen as I am still in the process of coding it. This model is built of components that I have already shown work, so the question is whether the combination of these components leads to desired properties.

I came up with the fundamental idea behind this theory as a submission to Numenta’s HTM challenge. It is as follows: every cortical column is itself a tiny reinforcement learning agent that learns to control information flow in order to maximize a reward signal.

There are three important modules to this new system:

A bottom-up sparse coding hierarchy

A top-down prediction hierarchy

The gating SDRRL units (reinforcement learners)

So, I decided to use my previous SDRRL algorithm for this task, but really any reinforcement learning agent should work.

Sparse codes are extracted in a bottom-up fashion. However, unlike typical hierarchical sparse coding, the inputs from one layer to the next are modulated by the SDRRL units – this way, the column can learn to drive attention to certain inputs. Each SDRRL unit itself receives sparse codes in a local radius as input, and along with this attention gate, it has a prediction learning gate and a sparse code learning gate. This makes 3 gates in total, although the exact number may change as I develop this theory further.
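A minimal sketch of the attention gate, assuming each column's reinforcement learner outputs three continuous gate values in [0, 1] and that "modulating" the upward input means scaling each column's sparse code by its attention gate. The `ColumnGates` fields and the `gated_input` function are hypothetical names, not part of NeoRL or SDRRL.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ColumnGates:
    """The three per-column gate values, each assumed to lie in [0, 1]."""
    attention: float
    prediction_learning: float
    code_learning: float

def gated_input(codes: List[List[float]], gates: List[ColumnGates]) -> List[List[float]]:
    """Scale each column's sparse code by that column's attention gate before passing it up."""
    return [[g.attention * c for c in code] for code, g in zip(codes, gates)]
```

A column whose attention gate is 0 effectively hides its code from the next layer, while a gate of 1 passes it through unchanged.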

The top-down predictive hierarchy learns to predict the sparse codes of the next timestep, but its learning rate is modulated by SDRRL. This way, SDRRL can choose to only predict things that lead to higher rewards in the future – considering that some of the predicted inputs may actually be actions as well, this allows the system to select actions.
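The modulation can be sketched as a perceptron-style update whose learning rate is multiplied by a gate value. A minimal sketch, assuming the gate is a continuous value in [0, 1] produced by the column's reinforcement learner; the function name and parameters are hypothetical:

```python
def modulated_update(weights, inputs, prediction, target, base_rate, gate):
    """Return updated prediction weights with the learning rate scaled by `gate`.

    gate = 0 freezes learning for this column; gate = 1 learns at base_rate.
    """
    rate = base_rate * gate
    return [[w + rate * (t - p) * x for w, x in zip(row, inputs)]
            for row, p, t in zip(weights, prediction, target)]
```

With `gate=0.0` the weights come back unchanged, so a column can decline to learn predictions it does not consider rewarding.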

The system as a whole gates information flow in a “reinforcement-learning-modulated” fashion, so instead of the purely unsupervised learning typically associated with hierarchical sparse coding/prediction, it “bends” the process towards important information and rewarding prediction-actions.

Below is a diagram of a single column of this model:

Well, on to coding the thing! I am developing a CPU version first, and then a multithreaded/GPU version in OpenCL.