Artificial Neural Networks

An Artificial Neural Network (henceforth ANN) is a computing system loosely modelled on the biological nervous system. It is composed of many interconnected neurons, which act as units of computation, and it learns by trial and error over the course of many epochs. An ANN and a perceptron work on the same underlying principle, but they do not function in an identical fashion: rather, the ANN is built up from the perceptron in order to produce a more accurate output.

Perceptron versus Artificial Neural Networks

An Artificial Neural Network is essentially a multi-layered perceptron structure. Unlike a single perceptron, which has only one computing neuron, an ANN has more than one neuron in its hidden (computation) layer and usually has more than one hidden layer.

From the figure, one can infer that a perceptron is analogous to a single neuron: it takes in the inputs and formulates the output. However, when many such neurons work together to calculate the output, passing information from layer to layer, the system becomes more capable and the output is more accurate. This is due to the more sophisticated, layered structure of the neural network compared with the perceptron.
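To make the difference concrete, here is a small sketch in Python (standard library only) with hand-picked rather than learned weights. The function names and the XOR example are illustrative assumptions, not taken from the figure. A single step-function neuron can compute OR, but no single such neuron can compute XOR, because the two XOR classes cannot be separated by one straight line. Adding a hidden layer of two neurons makes it possible.

```python
def step(z):
    """Heaviside unit step: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)

def perceptron_or(x1, x2):
    # One neuron is enough for OR, which is linearly separable.
    return neuron([x1, x2], [1, 1], -0.5)

def two_layer_xor(x1, x2):
    # XOR needs a hidden layer: h1 = OR, h2 = AND, output = h1 AND NOT h2.
    h1 = neuron([x1, x2], [1, 1], -0.5)
    h2 = neuron([x1, x2], [1, 1], -1.5)
    return neuron([h1, h2], [1, -1], -0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron_or(x1, x2), two_layer_xor(x1, x2))
# 0 0 0 0
# 0 1 1 1
# 1 0 1 1
# 1 1 1 0
```

The hidden neurons re-describe the inputs (as "at least one is on" and "both are on") so that the output neuron faces a problem that a single straight line can separate.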

Working

Similar to the perceptron, the ANN takes in the values from the input layer and weights them before they enter the hidden layers: each synapse is assigned a random weight, each input is multiplied by the corresponding weight, and all these products are added. In the hidden layers, this sum of products is passed through an activation function (the tanh or sigmoid functions in more sophisticated models, or simply the Heaviside unit step function in more basic models). The result is compared with the ground truth, and the resulting error is minimised through back-propagation.
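A minimal sketch of this computation for a single neuron follows. It assumes made-up input values, a squared-error cost as the measure of error, and no bias term (matching the description above). The helper names are hypothetical.

```python
import math
import random

def weighted_sum(inputs, weights):
    """Multiply each input by the weight on its synapse and add the products."""
    return sum(x * w for x, w in zip(inputs, weights))

def heaviside(z):
    """Unit step activation used in basic models."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Logistic activation, squashing z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
inputs = [0.5, -1.2, 3.0]                               # one made-up row of input values
weights = [random.uniform(-0.1, 0.1) for _ in inputs]   # random initial synapse weights
z = weighted_sum(inputs, weights)

ground_truth = 1.0
for name, activation in [("step", heaviside), ("sigmoid", sigmoid), ("tanh", math.tanh)]:
    output = activation(z)
    error = 0.5 * (output - ground_truth) ** 2          # squared error (an assumed cost)
    print(f"{name:8s} output={output:.4f} error={error:.4f}")
```

The step function can only output 0 or 1 and its derivative is zero almost everywhere, whereas sigmoid and tanh give graded outputs with useful derivatives, which is why the smoother functions suit gradient-based back-propagation.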

Stochastic Gradient Descent

To minimise the error (the cost function), we back-propagate it as follows:

1. Start at a random point on the cost function (which is a function of the predicted output).

2. Find the derivative of the cost function at that point.

3. If the derivative is less than 0, move to the right, i.e. the predicted output should be greater than the current output; else, if the derivative is greater than 0, move to the left, i.e. the predicted output should be less than the current output.

4. Repeat steps 2 and 3 until the derivative is equal to (or very close to) 0.
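These steps can be sketched in one dimension. The snippet assumes a made-up squared-error cost C(y) = (y - 3)^2, whose minimum is at y = 3, and a hypothetical learning rate that scales each move. The update `y -= learning_rate * d` moves right when the derivative is negative and left when it is positive. In a real network the same rule is applied to each weight, using the derivative of the cost with respect to that weight.

```python
def cost(y_pred, y_true=3.0):
    """Squared error between a predicted value and a made-up target of 3.0."""
    return (y_pred - y_true) ** 2

def derivative(y_pred, y_true=3.0):
    """dC/dy_pred for the squared-error cost above."""
    return 2.0 * (y_pred - y_true)

def gradient_descent(start, learning_rate=0.1, tolerance=1e-6, max_steps=1000):
    y = start                          # random starting point on the cost curve
    for _ in range(max_steps):
        d = derivative(y)              # slope at the current point
        if abs(d) < tolerance:         # stop once the slope is (close to) 0
            break
        # Negative slope -> y increases (move right); positive slope -> y decreases (move left).
        y -= learning_rate * d
    return y

for start in (-5.0, 10.0):
    y = gradient_descent(start)
    print(start, round(y, 4), cost(y))  # y approaches 3.0 from either side
```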

To reduce the risk of ending up in a local minimum of the cost function, one can carry out the back-propagation after every row of the dataset rather than once per pass over the whole dataset. This process is called Stochastic Gradient Descent: "stochastic" refers to randomly determined processes, alluding to the random order in which the rows are picked (each single row gives only a noisy estimate of the true gradient), and "gradient descent" means that we descend along the gradient (slope, derivative) until it reaches zero.
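A minimal sketch of Stochastic Gradient Descent for a one-weight model `y = w * x` is shown below. It assumes a made-up, noise-free dataset generated from y = 2x, so the ideal weight is 2.0. The weight is updated after every row, and the rows are shuffled at the start of each epoch.

```python
import random

# Made-up toy dataset: y = 2x, so the ideal weight is 2.0.
rows = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5]]

def sgd(rows, learning_rate=0.05, epochs=50, seed=42):
    rng = random.Random(seed)
    w = rng.uniform(-0.1, 0.1)               # random starting weight
    rows = list(rows)                        # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(rows)                    # the "stochastic" part: random row order
        for x, y_true in rows:
            y_pred = w * x
            grad = 2.0 * (y_pred - y_true) * x   # d/dw of (w*x - y_true)**2 for this row
            w -= learning_rate * grad            # update after every single row
    return w

print(sgd(rows))  # close to 2.0
```

Plain (full-batch) gradient descent would instead average the gradient over all five rows and update once per epoch.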

Training the ANN

Training of the ANN can be summarised in the following steps:

1. Initialise the weights to random values close to 0 (but not 0).

2. Input the first row of the dataset into the input layer, with each variable assigned to one node of the layer.

3. Forward Propagation: information flows from left to right. In each neuron, the input values are multiplied by their corresponding weights and the products are added; the activation function then acts on this sum of products and renders the output.

4. Compare the rendered output with the actual result and measure the generated error.

5. Back-Propagation: information flows from right to left. Each weight is updated according to how much it is responsible for the error, and the learning rate decides by how much the weights are updated.

6. Repeat steps 2 through 5 for the following rows, either updating the weights after each observation (online learning, as in Stochastic Gradient Descent) or updating them only after a batch of observations (Batch Learning).

7. When the whole training set has passed through the ANN, that makes one epoch. Repeat for more epochs.
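The loop below is a minimal sketch of these steps for the simplest possible case, a single sigmoid neuron learning logical OR, so that online and batch updating can be compared side by side. The dataset, learning rate, epoch count and function names are assumptions for illustration, and here the "batch" is the whole training set. A multi-layer network follows the same steps, with the error additionally propagated back through the hidden layers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up toy dataset: logical OR, as (inputs, target) rows.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def train(data, mode="online", learning_rate=0.5, epochs=5000, seed=1):
    """Train one sigmoid neuron; mode is "online" (per row) or "batch" (per pass)."""
    rng = random.Random(seed)
    # Initialise: random weights close to, but not exactly, 0. w[2] is the bias.
    w = [rng.uniform(-0.05, 0.05) for _ in range(3)]
    for _ in range(epochs):                          # one epoch = one full pass over the data
        grad_sum = [0.0, 0.0, 0.0]
        for (x1, x2), target in data:                # feed in one row at a time
            out = sigmoid(w[0] * x1 + w[1] * x2 + w[2])   # forward propagation
            err = out - target                            # compare with the actual result
            # Back-propagation: gradient of 0.5 * err**2 w.r.t. each weight (chain rule).
            delta = err * out * (1.0 - out)
            grad = [delta * x1, delta * x2, delta]
            if mode == "online":
                # Online: update immediately after this observation.
                w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
            else:
                # Batch: accumulate now, update once the batch is finished.
                grad_sum = [s + g for s, g in zip(grad_sum, grad)]
        if mode == "batch":
            w = [wi - learning_rate * s / len(data) for wi, s in zip(w, grad_sum)]
    return w

def predict(w, x1, x2):
    """Class 1 if the neuron's output is at least 0.5, else class 0."""
    return 1 if sigmoid(w[0] * x1 + w[1] * x2 + w[2]) >= 0.5 else 0

for mode in ("online", "batch"):
    weights = train(DATA, mode=mode)
    print(mode, [predict(weights, *x) for x, _ in DATA])
```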

Implementations

This is a simple classifier model based on artificial neural networks.
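A minimal sketch of such a classifier in plain Python follows, assuming a tiny made-up two-dimensional dataset and no external libraries. `TinyANN` is a hypothetical name; the network has one hidden layer of three sigmoid neurons, uses a squared-error cost, and is trained with per-row (stochastic) updates on shuffled rows.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyANN:
    """Inputs -> one hidden layer of sigmoid neurons -> one sigmoid output (sketch)."""

    def __init__(self, n_inputs=2, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Small random weights; the last weight of each neuron acts as its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]                       # constant 1 feeds the bias weight
        h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_hidden]
        hb = h + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def train_row(self, x, target, lr):
        xb, hb, out = self.forward(x)
        # Output delta for the cost 0.5 * (out - target)**2.
        d_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: each hidden neuron's share of the error (uses the old w_out).
        d_hidden = [d_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                    for j in range(len(self.w_hidden))]
        self.w_out = [w - lr * d_out * v for w, v in zip(self.w_out, hb)]
        for j, ws in enumerate(self.w_hidden):
            self.w_hidden[j] = [w - lr * d_hidden[j] * v for w, v in zip(ws, xb)]
        return 0.5 * (out - target) ** 2

    def fit(self, rows, lr=1.0, epochs=3000, seed=0):
        """Train for several epochs; returns the mean loss of the last epoch."""
        rng = random.Random(seed)
        rows = list(rows)
        for _ in range(epochs):
            rng.shuffle(rows)
            loss = sum(self.train_row(x, t, lr) for x, t in rows)
        return loss / len(rows)

    def predict(self, x):
        return 1 if self.forward(x)[2] >= 0.5 else 0

# Made-up toy data: class 0 near (0.2, 0.2), class 1 near (0.8, 0.8).
data = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0), ((0.2, 0.3), 0),
        ((0.8, 0.9), 1), ((0.9, 0.8), 1), ((0.7, 0.8), 1), ((0.8, 0.7), 1)]
net = TinyANN(seed=0)
final_loss = net.fit(data)
print(final_loss, [net.predict(x) for x, _ in data])
```

Each call to `train_row` performs one forward propagation, measures the error against the target, and back-propagates it, mirroring the training procedure described above.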