Categories

Meta

Month: December 2015

A while back I showed how I was able to memorize music and play it back using HTSL. Now with NeoRL I can not only remember but also generate more music based on sample data.

As is usually done with these predictive-generative scenarios, I added some noise to the input as it runs off of its own predictions. This causes it to diverge from the original data somewhat, resulting in semi-original audio.

Here is some audio data from a song called “Glorious Morning” by Waterflame:

Now a problem with this is that it is just being trained off of one song right now, so the result is basically just a reorganized form of the original plus noise. I am going to try to train it on multiple songs, extract end-of-sequence SDRs, and use these to generate songs with a particular desired style based on the input data styles. Longer training times should help clear up the noise a bit too (hopefully).

Full source code is available in the NeoRL repository. It is the Audio_Generate.cpp example. Link to repository here.

A while ago, I showed an MNIST prediction demo. Many rightfully thought that it may just be learning an identity mapping. But, with some slight modifications I can show that the algorithm does indeed predict properly and does so fully online.

I changed the SDRs to binary, this way there is no decay/explosion when continuously feeding its own predictions as input. So I can now run NeoRL’s predictive hierarchy (without RL) on itself indefinitely. It simplifies the digits to noisy blobs (since the digits are chosen randomly, it can’t predict uniform randomness), but the movement trajectories are preserved.

Another interesting thing is how fast it learns this – I trained it for about 1 minute to get the video below. It also ran in real-time while training (I didn’t write a “speed mode” in the demo yet).

The binary SDRs do have some downsides though. While the indefinite predictions thing is interesting, the binary SDRs sacrifice some representational power by removing the ability to have scalar SDRs.

So here’s a video. The first half shows it just predicting the next frame based on the input on the left. Then in the second half, the input on the left is ignored (it is not fed into the agent at all), rather the agent’s own predictions are used as input. As a result, it plays a sort of video of its own knowledge of the input.

I continue to work on NeoRL, and have a new way of applying reinforcement learning to the underlying predictive hierarchy. So far it works better than previous algorithms, despite not being debugged and optimized yet.

The new reinforcement learning algorithm is based on the deterministic policy gradient version (action gradient) of my SDRRL algorithm (SDRRL 2.0). Recall that a single SDRRL agent has an architecture like this (see here for the original post: link):

It was able to solve a large variety of simple tasks very quickly while using next to no processing power due to its sparsity. But, it had problems: It didn’t scale well, since it didn’t have a hierarchy. I now came up with an efficient way of doing hierarchy with this system.

Consider now a layer of SDRRL units, with sparse, local connectivity. It uses multiple Q nodes for different portions of the layer (they are also convolutional). The architecture looks like this:

There can be as many action layers as desired. In my model, I use one action layer for the output actions and one for attention.

The input comes from the layer below or the input layer to the system. In my implementation it is 2D so it can work easily on images and run well on the GPU. The hidden layer performs prediction-assisted sparse coding, as to form a predictive hierarchy. Once the sparse codes are found, we activate sub-networks with the action layers as input through the “on” bits of the sparse codes. This is basically a convolutional form of the SDRRL 2.0 algorithm. Actions are then created by starting from the predicted action and then moving along the deterministic policy gradient.

As always, features are extracted upwards, and actions flow downwards. Now, actions are integrated into the lower layers as another set of sparse codes in the SDRRL hidden layer. So the full state of the hidden layer in SDRRL contains the feed-forward features and the feed-back action codes.

As explained earlier, I use two layers of actions. One for the action to be taken (output), and another for attention. Attention works by blocking off regions of the input as to ignore it. Which regions should be blocked are learned through the deterministic policy gradients.

I just finished coding this thing, and got excited when I saw it working without any tuning at all, and while likely still having many bugs. So I decided to make a video of it moving to the right (not shown, but it still works when I tell it to reverse directions):