Posts Tagged ‘lstm’

Introduction

There have been a lot of advances in AI in the past couple of years. A lot of these advances are better simulating the various functions of the brain. These include the convolutional neural networks which are very good at image recognition and new techniques to incorporate memory into neural networks.

Very Deep Neural Networks

In the early days of Neural Networks, finding the weights for the connections was very difficult and often performed by hand. Then the gradient descent algorithm came along and allowed bigger Neural Networks to be trained. Then in 1986 a groundbreaking paper by D. E. Rumelhart showed how to use back propagation to train a multi-level Neural Network with Gradient Descent. However the shape of the surface that is being optimized is often very ill suited to this algorithm, containing many local minimums, or more usually being very flat not indicating the direction to take. Plus depending on the problem the training data may contain lots of errors that can mislead the training process.

With recent tweaks to the training algorithms, researchers have managed to train very deep Neural Networks. For instance the Oxford Visual Geometry Group (VGG) has released a pre-trained 19 layer Neural Network for image recognition.

This is a great building block for other image manipulation projects like Image Style Transfer that we looked at previously.

Now these Neural Networks are starting to resemble the architecture and structure of biological Neurons in the human brain such as the following from the human cortex.

This shows that we are starting to accurately simulate the computational engine in our brains.

The Road to Memory

Although the deep neural networks in the last section are very large and powerful at some problems, other problems they fail at primarily due to a lack or memory or context. For instance if you are translating text word by word, you need to remember the previous words in the sentence to get a correct translation based on the context. Or you need to do a first pass word by word and then knowing the whole, correct mistakes based on now knowing more generally what is being said. Similarly as an algorithm deals with the world, it should learn about the world as it explores and gathers more information. Just retraining the whole Neural Network for each bit of new information is very inefficient.

For language translation and speech recognition the use of Recurrent Neural Networks (RNNs) have proven quite effective. In these the outputs from Neurons can feed into the inputs of the same layer or into the inputs of previous layers. The networks of the previous section were all feed forward Neural Networks since the output of a layer only feeds the input of the next layer. RNNs aren’t true non feed forward networks since they don’t iterate to find a solution with everything stabilized, Rather these outputs from use n go into the inputs of usage n+1. In this way these act as a sort of memory from usage to usage allowing the network to preserve some context from say word to word in translation.

These artificial neurons have the ability to store memory values (as well as forget memory values). The key difficulty in adding memory to Neural Networks was in how to train them. Gradient Descent and all its variations require that the function being optimized is differentiable or very nearly so. Putting things in memory, reading memory and erasing memory are very discrete functions. These sort of functions are not differentiable and can’t be patched since they are flat with zero derivative elsewhere. Something with a zero derivative doesn’t give any information to Gradient Descent as to which direction to go. The solution to this was to replace the discrete functions with probability distributions that are differentiable. So rather than say put something in memory, the function gives you a probability that you should put the value in memory and then you do so if say the probability is greater than 50%.

Learning

I think the current tools for training Neural Networks work quite well for deep feedforward Neural Networks. I think they do a good job of training the weights to use in the various network layers. However I don’t think they provide a good solution for training systems with memory. The brain probably uses some process similar to what we do to train the input weights and outputs to biological Neurons, such as Hebbian Learning. However I don’t think this is what is used to decide whether to remember something or not. I think we still have a long way to go before effectively using memory in our Neural Networks even though just a little bit of memory is greatly improving our translators, speech and text recognition programs.

Summary

The field of Neural Networks is making great progress. This is due to advances in refining the training process of deep Neural Networks along with advances in making artificial Neurons more sophisticated by adding elements like memory banks. Combine this with the fast pace of development of GPUs allowing essentially low cost supercomputers for training and running these networks and the large amount of venture capital that is flowing into anything AI related and we are seeing a true renaissance in the AI field.

Does someone have a true deep AI running in their lab already? Perhaps; but, if they don’t I think we are starting to get quite close.