C# Backpropagation Tutorial (XOR)

I’ve been trying for some time to learn and actually understand how backpropagation (also known as the backward propagation of errors) works and how it trains neural networks. Since I ran into many problems while writing the program, I decided to write this tutorial and include fully functional code that learns the XOR gate.

Since there’s a lot to explain, I’ll try to stay on subject and talk only about the backpropagation algorithm.

1. What is Backpropagation?

Backpropagation is a supervised-learning method that trains a neural network by adjusting the weights and biases of each neuron.

Important: do NOT train on a single example until its error becomes minimal and only then move on to the next one - instead, present each example once, then start again from the beginning. One full pass over all examples is usually called an epoch.
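The correct schedule can be sketched like this. `TrainOnExample` is a hypothetical stand-in for one forward + backward pass; the fake, shrinking error it returns is there only to keep the sketch runnable:

```csharp
using System;

class TrainingScheduleSketch
{
    // Hypothetical stand-in for one forward + backward pass on a
    // single (input, target) pair; returns that example's error.
    static double TrainOnExample(double[] input, double target, int epoch)
        => 1.0 / (epoch + 1); // fake, shrinking error for the sketch

    static void Main()
    {
        double[][] inputs = { new[] { 0.0, 0.0 }, new[] { 0.0, 1.0 },
                              new[] { 1.0, 0.0 }, new[] { 1.0, 1.0 } };
        double[] targets  = { 0.0, 1.0, 1.0, 0.0 }; // XOR truth table

        // One epoch = every example visited exactly once;
        // never loop on a single example until it converges.
        for (int epoch = 0; epoch < 1000; epoch++)
        {
            double totalError = 0.0;
            for (int i = 0; i < inputs.Length; i++)
                totalError += TrainOnExample(inputs[i], targets[i], epoch);

            if (totalError < 0.01)
                break; // error is small enough across ALL examples
        }
    }
}
```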

Steps:

forward propagation - calculates the output of the neural network

back propagation - adjusts the weights and the biases according to the global error
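The forward step can be sketched as each neuron computing a weighted sum of its inputs plus a bias, then passing that sum through the activation function. The names and numbers below are illustrative, not the tutorial’s final classes:

```csharp
using System;

class ForwardSketch
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // One neuron's output: sigmoid(weights . inputs + bias)
    static double NeuronOutput(double[] weights, double bias, double[] inputs)
    {
        double sum = bias;
        for (int i = 0; i < weights.Length; i++)
            sum += weights[i] * inputs[i];
        return Sigmoid(sum);
    }

    static void Main()
    {
        // Two hidden neurons feed one output neuron (illustrative weights).
        double[] x = { 1.0, 0.0 };
        double h1 = NeuronOutput(new[] { 0.5, 0.4 }, 0.1, x);
        double h2 = NeuronOutput(new[] { 0.3, 0.7 }, 0.2, x);
        double y  = NeuronOutput(new[] { 0.6, 0.9 }, 0.3, new[] { h1, h2 });
        Console.WriteLine(y);
    }
}
```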

In this tutorial I’ll use a 2-2-1 neural network (2 input neurons, 2 hidden and 1 output). Keep an eye on this picture, it might be easier to understand.

2. How does it work?

1) initialize all weights and biases with random values between 0 and 1

2) calculate the output of the network

3) calculate the global error

4) adjust the weights of the output neuron using the global error

5) calculate the hidden neurons’ errors (split the global error between them)

6) adjust the hidden neurons’ weights using their errors

7) go back to step 2) and repeat until the error falls below a chosen threshold
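The weight-adjustment part of this loop (output delta, hidden deltas, updates) boils down to the usual delta-rule formulas. Below is a compressed sketch of one backward pass for the 2-2-1 network; the learning rate and all variable names are my own illustrative choices:

```csharp
using System;

class BackwardSketch
{
    static void Main()
    {
        // Values that would come from a forward pass (illustrative numbers).
        double x1 = 1.0, x2 = 0.0, target = 1.0, lr = 0.5;
        double h1 = 0.60, h2 = 0.55, output = 0.62;

        // Current weights/biases: hidden -> output and input -> hidden.
        double wOut1 = 0.6, wOut2 = 0.9, bOut = 0.3;
        double wH1a = 0.5, wH1b = 0.4, bH1 = 0.1;
        double wH2a = 0.3, wH2b = 0.7, bH2 = 0.2;

        // Output delta: error times the sigmoid derivative f(x)(1 - f(x)).
        double outputDelta = (target - output) * output * (1 - output);

        // Hidden deltas must use the OLD output weights, so compute them
        // before the output weights are adjusted.
        double h1Delta = outputDelta * wOut1 * h1 * (1 - h1);
        double h2Delta = outputDelta * wOut2 * h2 * (1 - h2);

        // Adjust the output neuron, then the hidden neurons.
        wOut1 += lr * outputDelta * h1;
        wOut2 += lr * outputDelta * h2;
        bOut  += lr * outputDelta;

        wH1a += lr * h1Delta * x1;  wH1b += lr * h1Delta * x2;  bH1 += lr * h1Delta;
        wH2a += lr * h2Delta * x1;  wH2b += lr * h2Delta * x2;  bH2 += lr * h2Delta;

        Console.WriteLine($"{wOut1} {wH1a}");
    }
}
```

Note the ordering: the hidden deltas are computed from the output weights as they were *before* the update, which is easy to get wrong if the updates are interleaved.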

3. Some math…

Every neural network needs an activation function; here we’ll use the sigmoid. The main idea is to adjust the network so that this function produces the correct output (with minimum error), which is done by modifying the weights and the biases.

Its graph looks like this (note that the output values range from 0 to 1)

The sigmoid formulas we’ll use (where f(x) is our sigmoid function):

1) Basic sigmoid function:

f(x) = 1 / (1 + e^(-x))

2) Sigmoid derivative (its value is used to adjust the weights):

f'(x) = f(x) * (1 - f(x))
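These two formulas translate directly into code. A minimal sketch, with the derivative expressed through the sigmoid’s own output:

```csharp
using System;

class SigmoidSketch
{
    // Basic sigmoid: f(x) = 1 / (1 + e^-x)
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Derivative, expressed through the output: f'(x) = f(x) * (1 - f(x))
    static double SigmoidDerivative(double x)
    {
        double fx = Sigmoid(x);
        return fx * (1.0 - fx);
    }

    static void Main()
    {
        Console.WriteLine(Sigmoid(0.0));           // 0.5
        Console.WriteLine(SigmoidDerivative(0.0)); // 0.25
    }
}
```

Expressing the derivative as f(x)(1 - f(x)) is what makes backpropagation cheap: during the backward pass we already know each neuron’s output, so no extra exponential is needed.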

Backpropagation always aims to reduce the error of each output. The output is considered correct once the error drops below a chosen threshold.

For a better understanding of this, take a look at the graph below which shows the error, based on the output:
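The error-versus-output relationship can also be sketched numerically. The squared-error formula below is an assumption on my part (a common choice), not necessarily the exact one behind the graph:

```csharp
using System;

class ErrorSketch
{
    // A common per-example error measure; assumed here for illustration -
    // the graph in the tutorial may use a variant of this formula.
    static double SquaredError(double target, double output)
        => 0.5 * (target - output) * (target - output);

    static void Main()
    {
        // The error is 0 when the output matches the target
        // and grows as the output moves away from it.
        Console.WriteLine(SquaredError(1.0, 1.0)); // 0
        Console.WriteLine(SquaredError(1.0, 0.5)); // 0.125
        Console.WriteLine(SquaredError(1.0, 0.0)); // 0.5
    }
}
```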

4. The code

The best part, and also the easiest. There are many things backpropagation can do, but as an example we’ll make it learn the XOR gate - a classic choice, since XOR is not linearly separable and therefore cannot be learned without a hidden layer.
I used 2 classes just to make everything more “visible” and OOP-ish.
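As a rough preview of that two-class layout, a skeleton might look like the one below. The class and member names are my guesses for illustration; the actual code may name things differently:

```csharp
using System;

// Hypothetical skeleton of the two-class layout.
class Neuron
{
    public double[] Weights;
    public double Bias;
    public double Output;

    public Neuron(int inputCount, Random rnd)
    {
        // Step 1: random weights and bias between 0 and 1.
        Weights = new double[inputCount];
        for (int i = 0; i < inputCount; i++)
            Weights[i] = rnd.NextDouble();
        Bias = rnd.NextDouble();
    }

    public double FeedForward(double[] inputs)
    {
        double sum = Bias;
        for (int i = 0; i < inputs.Length; i++)
            sum += Weights[i] * inputs[i];
        Output = 1.0 / (1.0 + Math.Exp(-sum)); // sigmoid activation
        return Output;
    }
}

class Network
{
    readonly Neuron _hidden1, _hidden2, _output;

    public Network()
    {
        var rnd = new Random();
        _hidden1 = new Neuron(2, rnd);
        _hidden2 = new Neuron(2, rnd);
        _output  = new Neuron(2, rnd);
    }

    // Forward propagation through the 2-2-1 network.
    public double Compute(double[] inputs)
        => _output.FeedForward(new[] { _hidden1.FeedForward(inputs),
                                       _hidden2.FeedForward(inputs) });
}
```

Sharing one `Random` instance between neurons (instead of creating one per neuron) avoids getting identical weight sequences when neurons are constructed in quick succession.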