Author: CireNeikual

It’s been a while! I have been working on AI related stuff of course, but what exactly I have been spending the bulk of my time on will be revealed in the near future.

For now though, I would like to show a simple little demo I made for showing the generative characteristics of SDRs (Sparse Distributed Representations).

In terms of generative models, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) seem to currently be among the most popular models.

I have come up with another way of doing a generative model relatively easily. It’s based on K-sparse autoencoders (here). SDRs are implicitly generative, as they force a certain (binary) distribution on the hidden units. K-sparse autoencoders can be used to learn SDRs with some modifications: The hidden units should be binary (top K are set to 1, the rest to 0), and training proceeds by minimizing the reconstruction error with tied weights.
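
As a rough illustration of the modifications described above, here is a minimal pure-Python sketch of a K-sparse autoencoder with binary hidden units and tied weights. It is not the code used for the demo: the function names, the learning rate, and the choice to update only through the decoder side (treating the binary code as fixed) are all assumptions made for the example.

```python
import random

def top_k_binary(activations, k):
    """Return a binary code: 1.0 at the k largest activations, 0.0 elsewhere."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    code = [0.0] * len(activations)
    for i in order[:k]:
        code[i] = 1.0
    return code

def encode(weights, x, k):
    # weights[j][i] links hidden unit j and input i; the same matrix is used
    # for encoding and decoding (tied weights).
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return top_k_binary(activations, k)

def decode(weights, code):
    n_inputs = len(weights[0])
    return [sum(code[j] * weights[j][i] for j in range(len(weights)))
            for i in range(n_inputs)]

def train_step(weights, x, k, lr=0.05):
    """One reconstruction-error step (assumed rule: only active units change)."""
    code = encode(weights, x, k)
    error = [xi - ri for xi, ri in zip(x, decode(weights, code))]
    for j, active in enumerate(code):
        if active:
            for i in range(len(x)):
                weights[j][i] += lr * error[i]
    return sum(e * e for e in error)  # squared reconstruction error before the update

rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(8)]
x = [1.0, 0.0, 0.5, 0.0]
first = train_step(weights, x, k=2)
for _ in range(50):
    last = train_step(weights, x, k=2)
print(first, last)  # the reconstruction error shrinks on this toy input
```

Because the hidden code is always exactly K ones, any binary pattern with K active units can be decoded, which is what makes the representation usable as a generator.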

With these modifications, one can learn some nice features from data. It is also possible to control certain features by forcing a correlation between a feature and an input pattern. I did this by having two networks: the K-sparse autoencoder network (with binary states + reconstruction), and a random projection network that maps “control” features to random biases in the hidden units of the autoencoder.

The resulting system learns to incorporate the controlled features into the hidden features such that the reconstruction from a biased set of features produces the image with the desired properties.
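
The control mechanism can be sketched as follows. This is only an illustration under stated assumptions: the names `projection` and `biased_code`, the small sizes, and the weight range of 0 to 5 are made up for the example, and the real demo may combine the bias with the hidden activations differently.

```python
import random

def top_k_binary(activations, k):
    """Return a binary code: 1.0 at the k largest activations, 0.0 elsewhere."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    code = [0.0] * len(activations)
    for i in order[:k]:
        code[i] = 1.0
    return code

def biased_code(activations, control, projection, k):
    """Add a fixed random projection of the control vector before the top-K step."""
    biased = [a + sum(p * c for p, c in zip(row, control))
              for a, row in zip(activations, projection)]
    return top_k_binary(biased, k)

rng = random.Random(1)
n_hidden, n_controls, k = 16, 10, 4

# Fixed (never trained) projection with fairly large weights, so that the
# bias can win against the encoder's own activations in the top-K step.
projection = [[rng.uniform(0.0, 5.0) for _ in range(n_controls)]
              for _ in range(n_hidden)]

# Stand-in for the autoencoder's pre-threshold hidden activations.
activations = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

control = [0.0] * n_controls
control[3] = 1.0  # 1-hot control vector asking for digit 3
code = biased_code(activations, control, projection, k)
print(code)
```

Decoding `code` through the trained autoencoder would then give the generated image; since the same bias is present during training, the reconstruction weights learn to associate it with the controlled property.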

So let’s try the standard MNIST test. The control features are a 1-hot vector of length 10 (one for each digit), while the hidden size is 256 units. The random projections were initialized to relatively high weights to overcome the K-sparsity’s thresholding. After training for a few seconds, we can see the results:

I applied a thresholding shader to make it look a bit prettier, although it does cut off some transitions a little.

I have recently made a small Python port of my GPU NeoRL library. It doesn’t have the same features, so the most important differences are listed here:

It is fully connected (not sparsely connected)

It uses a new method for organizing temporal data (predictive coding)

It is slower

It is much easier to understand!

I call this port MiniNeoRL, since it serves mostly to help me prototype new algorithms and to explain the algorithms to others. Now I am not exactly a Python expert, but I think the code is simple enough that with some explanation it should be easy to understand.

Along with MiniNeoRL I have made this slideshow that serves as a brief overview of what NeoRL is and how it works:

In my ongoing quest of improving NeoRL to produce a generic intelligent reinforcement learning agent, I created a demo I would like to share. It’s a simple demo, but still interesting in my opinion.

The agent receives 1D vision data (since the game is 2D), and must drive a car around a thin track. This is essentially a “thread the needle” task, where the AI requires relatively precise control in order to obtain reward.

As a human, I was not able to make it as far as the AI did. The AI almost completed the entire track, while I was only able to make it about halfway. It looks easy, but it’s not!

A while back I showed how I was able to memorize music and play it back using HTSL. Now with NeoRL I can not only remember but also generate more music based on sample data.

As is usually done with these predictive-generative scenarios, I added some noise to the input as it runs off of its own predictions. This causes it to diverge from the original data somewhat, resulting in semi-original audio.

Here is some audio data from a song called “Glorious Morning” by Waterflame:

Now a problem with this is that it is just being trained off of one song right now, so the result is basically just a reorganized form of the original plus noise. I am going to try to train it on multiple songs, extract end-of-sequence SDRs, and use these to generate songs with a particular desired style based on the input data styles. Longer training times should help clear up the noise a bit too (hopefully).

Full source code is available in the NeoRL repository. It is the Audio_Generate.cpp example. Link to repository here.

A while ago, I showed an MNIST prediction demo. Many rightfully thought that it may just be learning an identity mapping. But, with some slight modifications I can show that the algorithm does indeed predict properly and does so fully online.

I changed the SDRs to binary so that there is no decay/explosion when continuously feeding its own predictions as input. So I can now run NeoRL’s predictive hierarchy (without RL) on itself indefinitely. It simplifies the digits to noisy blobs (since the digits are chosen randomly, it can’t predict uniform randomness), but the movement trajectories are preserved.

Another interesting thing is how fast it learns this – I trained it for about 1 minute to get the video below. It also ran in real-time while training (I didn’t write a “speed mode” in the demo yet).

The binary SDRs do have some downsides though. While the indefinite predictions thing is interesting, the binary SDRs sacrifice some representational power by removing the ability to have scalar SDRs.

So here’s a video. The first half shows it just predicting the next frame based on the input on the left. Then in the second half, the input on the left is ignored (it is not fed into the agent at all); instead, the agent’s own predictions are used as input. As a result, it plays a sort of video of its own knowledge of the input.

I continue to work on NeoRL, and have a new way of applying reinforcement learning to the underlying predictive hierarchy. So far it works better than previous algorithms, despite not being debugged and optimized yet.

The new reinforcement learning algorithm is based on the deterministic policy gradient version (action gradient) of my SDRRL algorithm (SDRRL 2.0). Recall that a single SDRRL agent has an architecture like this (see here for the original post: link):

It was able to solve a large variety of simple tasks very quickly while using next to no processing power due to its sparsity. But, it had problems: It didn’t scale well, since it didn’t have a hierarchy. I now came up with an efficient way of doing hierarchy with this system.

Consider now a layer of SDRRL units, with sparse, local connectivity. It uses multiple Q nodes for different portions of the layer (they are also convolutional). The architecture looks like this:

There can be as many action layers as desired. In my model, I use one action layer for the output actions and one for attention.

The input comes from the layer below or the input layer to the system. In my implementation it is 2D so it can work easily on images and run well on the GPU. The hidden layer performs prediction-assisted sparse coding, so as to form a predictive hierarchy. Once the sparse codes are found, we activate sub-networks with the action layers as input through the “on” bits of the sparse codes. This is basically a convolutional form of the SDRRL 2.0 algorithm. Actions are then created by starting from the predicted action and then moving along the deterministic policy gradient.

As always, features are extracted upwards, and actions flow downwards. Now, actions are integrated into the lower layers as another set of sparse codes in the SDRRL hidden layer. So the full state of the hidden layer in SDRRL contains the feed-forward features and the feed-back action codes.

As explained earlier, I use two layers of actions: one for the action to be taken (output), and another for attention. Attention works by blocking off regions of the input so as to ignore them. Which regions should be blocked is learned through the deterministic policy gradients.

I just finished coding this thing, and got excited when I saw it working without any tuning at all, and while likely still having many bugs. So I decided to make a video of it moving to the right (not shown, but it still works when I tell it to reverse directions):

I would like to give an explanation of the algorithm behind NeoRL (GPU) and NeoRL-CPU (Available here and here). In this post, I will only go over the predictive hierarchy, since the reinforcement learning is still work-in-progress.

Overview

So let’s start with the basic idea behind the algorithm.

NeoRL operates on the theory that the neocortex is a bidirectional predictive hierarchy, similar to HTM (Hierarchical Temporal Memory). However, it differs from HTM in several important aspects:

Includes temporal pooling

Full hierarchy support of multiple regions

Does not use columns for predictions, these are reserved for reinforcement learning

Uses spiking neurons with “explaining-away” characteristics

Continuous inputs and predictions

The connectivity scheme of NeoRL is sort of like that of a convolutional neural network, but the weights are not shared. Also, as mentioned before, it is a bidirectional hierarchy unlike convolutional neural networks. The basic idea is that features are extracted upwards and predictions come downwards. The predictions can then influence the feature extraction to improve the features, which in turn results in better predictions.

NeoRL uses this hierarchy to predict the next timestep. The input is a 2D field (although theoretically it can be any dimension) of scalars, and the predictions are another 2D field of the same dimensions as the input. This makes it sort of like a predictive autoencoder.

Why predict only one timestep? Well, for one, it’s the theoretical minimum for building a world model. Why is this? Well, if your model understands how to predict the next timestep, then it can predict the timestep after that based on the timestep it just predicted, and then predict from that, and so on.
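
The chaining argument is easy to show in code. The sketch below is generic: `predict_next` stands for any one-step predictor, and the toy counter used in place of a learned model is purely illustrative.

```python
def rollout(predict_next, start, steps):
    """Feed a one-step predictor its own output to look several steps ahead."""
    frames = []
    current = start
    for _ in range(steps):
        current = predict_next(current)  # the prediction becomes the next input
        frames.append(current)
    return frames

def toy_model(state):
    # Toy stand-in for a learned model: a counter that wraps around at 5.
    return (state + 1) % 5

trajectory = rollout(toy_model, start=3, steps=4)
print(trajectory)  # [4, 0, 1, 2]
```

With a learned model the predictions are imperfect, so errors compound over the rollout; the one-step predictor is the minimum requirement, not a guarantee of accurate long-range forecasts.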

Now, let’s go into how the spatio-temporal features are extracted (upwards flow).

Spatio-Temporal Feature Extraction

Spatio-temporal feature extraction is based on sparse coding, or SDRs (sparse distributed representations). With SDRs/sparse coding, one attempts to find a sparse (few active) set of bases that can be used to reconstruct the input. This is not unlike a sparse autoencoder, however the way NeoRL does it is a bit different.

Below is an image of the FISTA algorithm being used to extract sparse codes.

The algorithm used in NeoRL is similar to ISTA, but uses spiking neurons to produce time-averaged codes that are always in the [0, 1] range. I found that ISTA seems to not have strict enough bounding for codes, so I opted for something between ISTA and another algorithm called SAILnet (here).

The result works as follows:

1. Excite neurons from reconstruction error

2. Inhibit neurons from each other (lateral connectivity)

3. Reconstruct the average firing rates of the neurons to get a reconstruction error (input – reconstruction)

4. Repeat 1-3

Like ISTA, neurons are activated off of reconstruction error, but like SAILnet the codes are formed by having a spiked neuron inhibit its neighbors. This is performed sequentially for some iterations until a stable sparse code has been formed.
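
The loop described above can be sketched roughly like this. It is a simplified illustration and not the NeoRL implementation: the firing threshold, the reset-to-zero rule, the clamping of potentials at zero, and all names are assumptions, and the learning step is left out.

```python
def spiking_sparse_code(x, ff_weights, lateral_weights, iterations=20, threshold=1.0):
    """Iteratively form a code of average firing rates, each in [0, 1]."""
    n_hidden = len(ff_weights)
    potentials = [0.0] * n_hidden
    spikes = [0.0] * n_hidden        # spikes emitted on the previous iteration
    spike_counts = [0] * n_hidden
    rates = [0.0] * n_hidden
    error = list(x)                  # no spikes yet, so the reconstruction is zero

    for step in range(1, iterations + 1):
        new_spikes = [0.0] * n_hidden
        for j in range(n_hidden):
            # Step 1: excitation driven by the current reconstruction error.
            excitation = sum(w * e for w, e in zip(ff_weights[j], error))
            # Step 2: inhibition from neighbours that spiked last iteration.
            inhibition = sum(lateral_weights[j][m] * spikes[m]
                             for m in range(n_hidden) if m != j)
            potentials[j] = max(0.0, potentials[j] + excitation - inhibition)
            if potentials[j] > threshold:
                new_spikes[j] = 1.0
                potentials[j] = 0.0  # assumed reset after a spike
                spike_counts[j] += 1
        spikes = new_spikes

        # Step 3: reconstruct from average firing rates and recompute the error.
        rates = [count / step for count in spike_counts]
        reconstruction = [sum(rates[j] * ff_weights[j][i] for j in range(n_hidden))
                          for i in range(len(x))]
        error = [xi - ri for xi, ri in zip(x, reconstruction)]
    return rates, error

ff = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
lat = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
rates, error = spiking_sparse_code([1.0, 0.2], ff, lat)
print(rates, error)
```

Since a neuron can spike at most once per iteration, each rate is a spike count divided by the number of iterations so far, which is what keeps the code bounded in [0, 1].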

Once the code has been formed, the feed-forward and lateral weights are updated through Hebbian and anti-Hebbian learning rules respectively, based on the average spiking activities and the reconstruction thereof.

In order to extend this idea to the time domain, one can add an extra set of recurrent connections. This takes the previous average spiking activities and feeds them back in to the current cycle. So it will try to form sparse codes not only of the input, but also itself. This leads to a history compression algorithm.

However, there is one more trick we can do: In order to make the representation as efficient as possible, we can compress only the history/spatial features that lead to low prediction errors. This is accomplished through eligibility traces.

Eligibility traces are often used in reinforcement learning to propagate reward signals back in time to address the distal reward problem. In our case, we are using them as a replacement for backpropagation through time (BPTT), which is typically used with the LSTM algorithm. Instead of having to save a history buffer and update on that to a fixed-length horizon, we can easily propagate prediction errors to past codes with an infinite horizon (well, limited by floating-point accuracy of course).

The idea is that instead of updating the weights for the sparse coder directly, we instead use the weight change we would have applied to increment the eligibility trace. This trace then decays exponentially, giving newer samples more importance. Then, when the prediction error is below average, we want to update on those traces (since the past updates were good for us).

Here is an example of how such a trace variable can look (plotted over time). It’s not shown in the image, but the trace can also be negative.
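
In code, a traced weight might look like the sketch below. The class and method names, the decay value, and the decision not to reset the trace after an update are assumptions made for illustration; only the accumulate-decay-gate pattern comes from the description above.

```python
class TracedWeight:
    """A weight whose updates are buffered in a decaying eligibility trace."""

    def __init__(self, value=0.0, trace_decay=0.9):
        self.value = value
        self.trace = 0.0
        self.trace_decay = trace_decay

    def accumulate(self, proposed_change):
        # Older proposals fade exponentially; the newest one counts in full.
        # The trace can go negative if the proposed changes are negative.
        self.trace = self.trace_decay * self.trace + proposed_change

    def commit_if_good(self, error, average_error, lr=0.1):
        # Apply the buffered changes only when the prediction error was
        # better (lower) than its running average.
        if error < average_error:
            self.value += lr * self.trace
            return True
        return False

w = TracedWeight(trace_decay=0.5)
for change in (1.0, 1.0, -0.5):
    w.accumulate(change)   # trace goes 1.0, then 1.5, then 0.25
applied = w.commit_if_good(error=0.2, average_error=0.4)
print(applied, w.trace, w.value)
```

No history buffer is stored: the single `trace` value per weight summarizes all past proposed updates, with older ones weighted less.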

So that’s how a single layer performs feature extraction.

Prediction

The prediction in NeoRL is very simple. It’s essentially a multilayer perceptron with thresholded units that points in the opposite direction of the feature extraction hierarchy. It can be thought of as overlaying the feature extraction hierarchy – each prediction neuron tries to learn a mapping from local (lateral and feedback) inputs to the next state of the feature extractor neuron it is associated with.

Each layer receives local (lateral) features along with the predictions of the next higher layer (if there is one) as input. It is then trained with a standard time-delayed perceptron learning rule to produce the next SDR at each layer.

The prediction errors are kept around to feed back to the feature extractors as explained earlier. We keep a decaying average of the error, and if the current error is less than the average, we “reward” the feature extractor here, otherwise we do nothing.
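
Here is a small sketch of one prediction layer in isolation: thresholded units trained with a perceptron rule whose target arrives one step late, plus a decaying error average that produces the “reward” signal. The threshold of 0.5, the learning rate, the decay, and the names `predict`, `learn`, and `ErrorGate` are assumptions for the example, not NeoRL's actual values or API.

```python
def predict(weights, inputs, threshold=0.5):
    """Thresholded units: output 1.0 where the weighted sum exceeds the threshold."""
    return [1.0 if sum(w * v for w, v in zip(row, inputs)) > threshold else 0.0
            for row in weights]

def learn(weights, prev_inputs, prev_prediction, target, lr=0.2):
    """Perceptron rule applied one step late: the target is the SDR that actually arrived."""
    errors = [t - p for t, p in zip(target, prev_prediction)]
    for row, e in zip(weights, errors):
        for i, v in enumerate(prev_inputs):
            row[i] += lr * e * v
    return sum(e * e for e in errors)

class ErrorGate:
    """Keeps a decaying average of the error and flags below-average steps."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.average = 0.0

    def reward(self, error):
        rewarded = error < self.average
        self.average = self.decay * self.average + (1.0 - self.decay) * error
        return rewarded

# Toy sequence that alternates between two SDRs, A and B.
A, B = [1.0, 0.0], [0.0, 1.0]
sequence = [A, B] * 10
weights = [[0.0, 0.0], [0.0, 0.0]]
gate = ErrorGate()
rewards = 0
for current, upcoming in zip(sequence, sequence[1:]):
    prediction = predict(weights, current)
    error = learn(weights, current, prediction, upcoming)
    if gate.reward(error):
        rewards += 1  # this is where the feature extractor would be "rewarded"

print(predict(weights, A), predict(weights, B))  # [0.0, 1.0] [1.0, 0.0]
```

In the full system the inputs would also include the predictions coming down from the next higher layer; the toy example uses only the current SDR to keep it short.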

Conclusion

NeoRL is still in early stages of development, as is NeoRL-CPU. I hope I was able to give a decent explanation of what is going on in the algorithm. I will post explanations of the reinforcement learning extension as soon as it is up and running as well. Feel free to try out the library yourself!

I now have an LSTM/ConvNet-competitive version of my HTM-inspired algorithm. It now lives as part of a new library I am working on called Neo/RL (neocortex + reinforcement learning). The reinforcement learning is not yet complete, but the predictive hierarchy is up and running smoothly, and it is able to match or even outperform LSTMs/ConvNets, all while being fully online without any sort of rehearsal or batching. Don’t believe me? Evaluate it for yourself! https://github.com/222464/NeoRL

I am working on additional benchmarks. Here is one where I reproduced a paper that used LSTMs to predict moving digits (link to original paper: http://arxiv.org/pdf/1502.04681.pdf).

I alternate between predicting the next input based on current input and predicting the next input based on its own predictions in the video. The video is real-time.

I am also working on a text prediction benchmark, and a time series dataset benchmark (the latter of which already works, but I need the LSTM comparison working properly as well).

To truly convince people, I probably need more benchmarks still though; three is not enough! So, if you are interested in helping out on that front, let me know!

As always, I am trying to figure out how the Neocortex works in order to exploit its properties for reinforcement learning.

I believe I have finally found something substantial, at least in how complete it is. My latest model has several bizarre features. Whether or not the whole thing actually works remains to be seen as I am still in the process of coding it. This model is built of components that I have already shown work, so the question is whether the combination of these components leads to desired properties.

I came up with the fundamental idea behind this theory as a submission to Numenta’s HTM challenge. It is as follows: every cortical column is in itself a tiny reinforcement learning agent that learns to control information flow in order to maximize a reward signal.

There are three important modules to this new system:

A bottom-up sparse coding hierarchy

A top-down prediction hierarchy

The gating SDRRL units (reinforcement learners)

So, I decided to use my previous SDRRL algorithm for this task, but really any reinforcement learning agent should work.

Sparse codes are extracted in a bottom up fashion. However, unlike typical hierarchical sparse coding, the inputs from one layer to the next are modulated by the SDRRL units – this way, the column can learn to drive attention to certain inputs. Each SDRRL unit itself receives sparse codes in a local radius as input, and along with this attention gate, it has a prediction learning gate and a sparse code learning gate. This makes 3 gates in total, although the exact amount may change as I develop this theory further.

The top-down predictive hierarchy learns to predict the sparse codes of the next timestep, but its learning rate is modulated by SDRRL. This way, SDRRL can choose to only predict things that lead to higher rewards in the future – considering that some of the predicted inputs may actually be actions as well, this allows the system to select actions.

The system as a whole gates information flow in a “reinforcement-learning-modulated” fashion, so instead of the purely unsupervised learning typically associated with hierarchical sparse coding/prediction, it “bends” the process towards important information and rewarding prediction-actions.

Below is a diagram of a single column of this model:

Well, on to coding the thing! I am developing a CPU version first, and then a multithreaded/GPU version in OpenCL.

In my previous post I presented SDRRL, and in the one before that a demo of that algorithm. Since then, I have made many improvements to the algorithm, vastly increasing performance, both in terms of convergence rate and processing power used. I have another demo, but this time it is not a web demo, since it is something I used for internal testing that I just cleaned up a bit 🙂

SDRRL v2.0 Demo

I present to you a simple “Big Dog” style demo, where SDRRL must learn to move a robotic dog body to the right. Almost all of the processing time spent is taken up by the physics engine instead of the AI.

When running the demo, press T to speed up time, and K to reverse the walking direction.

So, the first feature is one I harp on a lot and something backpropagation-based solutions lack: fully online learning without experience replay or stochastic sampling (which have horrible computational complexity).

The second feature is there because this is based off of my PRSDR algorithm, which is basically a hierarchical LSTM replacement (for those interested, I have some performance benchmarks, showing the upsides and downsides). It’s the usual HTM-like bidirectional predictive hierarchy thing.

Actions are selected by perturbing the predictions towards actions that lead to higher reward. Right now I am using a simple policy gradient method to do this.

Now, the last two points are sort of the same thing: this model has imagination. I’m serious! The basic idea is as follows: leak some of your own predictions into your input. This way, the model tries not only to run and predict off of the world, but also off of itself. It tries to predict its own predictions, leading to a sort of sensory-implanting imagination similar to what humans have. Sure, this imagination isn’t really necessary for AGI, but it’s a good heuristic to speed up learning I think. It allows for simulation of situations ahead of time, and planning as a result.
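
One simple way to picture the “leak” is a linear blend between the real observation and the model's previous prediction. The blend form and the `leak` parameter are assumptions for illustration; the post does not say exactly how the predictions are mixed into the input.

```python
def leak_predictions(observation, prediction, leak=0.25):
    """Blend the model's previous prediction into its next input (assumed linear mix)."""
    return [(1.0 - leak) * o + leak * p for o, p in zip(observation, prediction)]

mixed = leak_predictions([1.0, 0.0], [0.0, 1.0], leak=0.25)
print(mixed)  # [0.75, 0.25]
```

With `leak=0.0` the model sees only the world, and with `leak=1.0` it runs entirely on its own predictions, so the parameter slides between perception and pure “imagination”.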

Other than that the model uses good ol’ SARSA for reward prediction and temporal difference error generation.
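
For reference, the SARSA temporal difference error is the reward plus the discounted value of the next state-action pair, minus the value of the current one. The tabular sketch below only illustrates that formula; the model described here does not use a lookup table, and the function names and parameter values are made up for the example.

```python
from collections import defaultdict

def sarsa_td_error(reward, q_current, q_next, gamma=0.9):
    """TD error: reward + gamma * Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next - q_current

def sarsa_update(q, state, action, reward, next_state, next_action, lr=0.1, gamma=0.9):
    """Move Q(s, a) toward the SARSA target and return the TD error."""
    td = sarsa_td_error(reward, q[(state, action)], q[(next_state, next_action)], gamma)
    q[(state, action)] += lr * td
    return td

q = defaultdict(float)  # unseen state-action pairs start at 0.0
td = sarsa_update(q, state="s0", action="right", reward=1.0,
                  next_state="s1", next_action="right")
print(td, q[("s0", "right")])  # 1.0 0.1
```

SARSA is on-policy: the next action in the target is the one the agent actually takes, which suits a fully online learner with no replay buffer.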

I am working on some demos for it now, and am trying to train it on the ALE. Let’s see how that goes!