Deep Learning for NLP: ANNs, RNNs and LSTMs explained!

Well, these weights are also included in any edge that joins two different neurons.

This means that in the image of a larger neural network, they are present in every single one of the black edges, each taking the output of one neuron, multiplying it, and passing the result as input to the neuron at the other end of the edge.
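To make the role of these edge weights concrete, here is a minimal sketch of a tiny network in plain Python. The layer sizes, the weight values, and the choice of ReLU as the activation function are all assumptions made for illustration; they are not taken from the network in the image.

```python
def neuron_output(inputs, weights):
    """Multiply each incoming value by the weight of its edge, sum, apply ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return max(0.0, total)  # ReLU activation (an assumption for this sketch)


def layer_output(inputs, weight_rows):
    """Each row holds the weights of the edges arriving at one neuron."""
    return [neuron_output(inputs, row) for row in weight_rows]


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
output_weights = [[2.0, 0.5]]

x = [1.0, 0.5]
hidden = layer_output(x, hidden_weights)
y = layer_output(hidden, output_weights)
print(hidden, y)
```

Every number in `hidden_weights` and `output_weights` corresponds to one edge in a drawing of this network.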

Image of a neural network with two hidden layers and the weights in between each of the layers.

When we train a neural network (training is the ML expression for making it learn), we feed it a set of known data (in ML this is called labelled data), have it predict a characteristic that we know about that data (such as whether an image represents a dog or a cat), and then compare the predicted result to the actual result.

As this process goes on and the network makes mistakes, it adapts the weights of the connections in between the neurons to reduce the number of mistakes it makes.

Because of this, as shown before, if we give the network more and more data, most of the time it will improve its performance.
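The weight adaptation described above can be sketched with a single neuron that has a single weight. This is only an illustration under stated assumptions: the labelled data (targets equal to twice the input), the squared-error measure, and the learning rate are all made up for the example, and real networks adjust many weights at once using backpropagation.

```python
def train(data, learning_rate=0.1, epochs=50):
    """Fit one weight so that weight * x matches the known target."""
    weight = 0.0  # start with no knowledge
    for _ in range(epochs):
        for x, target in data:
            prediction = weight * x
            error = prediction - target  # how wrong the neuron was
            # Nudge the weight against the gradient of 0.5 * error**2.
            weight -= learning_rate * error * x
    return weight


# Hypothetical labelled data: each pair is (input, known correct answer).
labelled_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train(labelled_data)
print(learned_weight)  # ends up very close to 2.0
```

Each mistake moves the weight a little in the direction that would have made the prediction better, which is the same idea a full network applies to every edge.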

Learning from sequential data: Recurrent Neural Networks

Now that we know what artificial neural networks and deep learning are, and have a slight idea of how neural networks learn, let's start looking at the type of networks that we will use to build our chatbot: Recurrent Neural Networks, or RNNs for short.

Recurrent neural networks are a special kind of neural network designed to deal effectively with sequential data.

This kind of data includes time series (a list of values of some parameters over a certain period of time), text documents, which can be seen as a sequence of words, and audio, which can be seen as a sequence of sound frequencies.

RNNs do this by taking the output of each neuron and feeding it back to that same neuron as an input.

By doing this, the neuron not only receives new pieces of information in every time step, but also adds to them a weighted version of its previous output.

This gives these neurons a kind of “memory” of the previous inputs they have received, as those inputs are summarized in the output that is fed back to the neuron.

A recurrent neuron, where the output data is multiplied by a weight and fed back into the input.

Cells that are a function of inputs from previous time steps are also known as memory cells.
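A recurrent neuron like the one described above can be sketched in a few lines. The weight values and the use of tanh as the activation function are assumptions chosen for illustration, not details given in this post.

```python
import math


def recurrent_neuron(sequence, input_weight=1.0, recurrent_weight=0.5):
    """Process a sequence one time step at a time, feeding the output back in."""
    output = 0.0  # nothing has been seen before the first time step
    outputs = []
    for x in sequence:
        # New input plus a weighted version of the previous output.
        output = math.tanh(input_weight * x + recurrent_weight * output)
        outputs.append(output)
    return outputs


# A single "event" at the first time step, followed by silence.
outputs = recurrent_neuron([1.0, 0.0, 0.0])
print(outputs)
```

Even though the second and third inputs are zero, the outputs at those steps are not: the neuron still carries a trace of the first input, which is the "memory" in question.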

The problem with RNNs is that as time passes and they are fed more and more new data, they start to “forget” the data they saw earlier, as it gets diluted by the new data, the transformation applied by the activation function, and the weight multiplication.

This means they have a good short-term memory, but they struggle to remember things that happened a while ago (data they saw many time steps in the past).
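This forgetting can be observed numerically. The sketch below reuses a simple tanh recurrent neuron with made-up weights (both assumptions for illustration) and measures how much the very first input still changes the final output after a growing number of later time steps.

```python
import math


def final_output(sequence, input_weight=1.0, recurrent_weight=0.5):
    """Run a simple recurrent neuron over the sequence and return its last output."""
    output = 0.0
    for x in sequence:
        output = math.tanh(input_weight * x + recurrent_weight * output)
    return output


def influence_of_first_input(steps_after):
    """Difference in the final output caused only by changing the first input."""
    later_inputs = [0.3] * steps_after  # hypothetical new data arriving afterwards
    with_signal = final_output([1.0] + later_inputs)
    without_signal = final_output([0.0] + later_inputs)
    return abs(with_signal - without_signal)


for gap in (1, 5, 20):
    print(gap, influence_of_first_input(gap))
```

The printed influence shrinks quickly as the gap grows: after enough time steps, the neuron's output is practically the same whether or not it ever saw that first input.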

We need some sort of long-term memory, which is just what LSTMs provide.

Enhancing our memory: Long Short-Term Memory Networks

Long Short-Term Memory networks, or LSTMs, are a variant of RNN that solves the long-term memory problem of the former.

We will end this post by briefly explaining how they work.

They have a more complex cell structure than a normal recurrent neuron, which allows them to better regulate what to learn or forget from the different input sources.

Representation of an LSTM cell.

Don't pay attention to the blue circles and boxes; as you can see, it has a far more complex structure than a normal RNN unit, and we won't go into it in this post.

An LSTM neuron can do this by incorporating a cell state and three different gates: the input gate, the forget gate and the output gate.

In each time step, the cell can decide what to do with the state vector: read from it, write to it, or delete it, thanks to an explicit gating mechanism.

With the input gate, the cell can decide whether to update the cell state or not.

With the forget gate the cell can erase its memory, and with the output gate the cell can decide whether to make the output information available or not.
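The three gates can be sketched for a single scalar LSTM cell as follows. This follows the commonly used LSTM formulation; the parameter names (`wf`, `uf`, `bf`, and so on), the helper `make_params`, and all numeric values are hypothetical and chosen only to make the gate behaviour visible. In a trained network these values are learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; returns (new output, new cell state)."""
    forget_gate = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])  # keep or erase memory
    input_gate = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])   # write to memory or not
    output_gate = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])  # expose memory or not
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # proposed new content
    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * math.tanh(c)
    return h, c


def make_params(forget_bias, input_bias, output_bias=0.0):
    """Hypothetical parameters: all weights 1.0, biases set by hand."""
    p = {name: 1.0 for name in ("wf", "uf", "wi", "ui", "wo", "uo", "wc", "uc")}
    p.update({"bf": forget_bias, "bi": input_bias, "bo": output_bias, "bc": 0.0})
    return p


# Forget gate open, input gate closed: the cell state is carried over almost unchanged.
_, kept = lstm_step(0.5, 0.0, 0.8, make_params(forget_bias=10.0, input_bias=-10.0))
# Forget gate closed as well: the memory is erased.
_, erased = lstm_step(0.5, 0.0, 0.8, make_params(forget_bias=-10.0, input_bias=-10.0))
print(kept, erased)
```

Because the cell state can pass through a time step almost untouched when the forget gate is open, information can survive for many more steps than in a plain recurrent neuron.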

LSTMs also mitigate the problems of exploding and vanishing gradients, but that is a story for another day.

That’s it! Now we have a superficial understanding of how these different kinds of neural networks work, and we can put it to use to build our first Deep Learning project!

Conclusion

Neural Networks are awesome.

As we will see in the next post, even a very simple structure with just a few layers can create a very competent chatbot.

Oh, and by the way, remember this image?

Figure with two different images with a short text description made by a neural network.

Well, just to prove how cool Deep Neural Networks are, I have to admit something.

I lied about how the descriptions for the images were produced.

At the beginning of the post I said that these descriptions were made by human annotators. The truth, however, is that these short texts describing what can be seen in each image were actually produced by an Artificial Neural Network.

Insane, right?

If you want to learn how to use Deep Learning to create an awesome chatbot, follow me on Medium, and stay tuned for my next post!

Until then, take care, and enjoy AI!

Additional Resources:

As the explanations of the different concepts described in this post have been very superficial, here are some fantastic additional resources for anyone who wants to go further and continue learning.

How neural networks work end to end

YouTube video series explaining the main concepts about how Neural Networks are trained

Deep Learning & Artificial Neural Networks

Okay, that is all. I hope you liked the post.

Feel free to connect with me on LinkedIn or follow me on Twitter at @jaimezorno.