Speed Test for Neural Net Implementation

If you want to implement a neural net, you are confronted with the question of which design to use for the implementation.

Since connectionist systems such as neural nets often consist of hundreds, thousands, or even millions of simple computing elements (neurons), a code implementation has to decide whether to model each neuron as a separate object of its own class.

At first glance this seems to be the method of choice, since we could use two classes:

a neural net class A, which holds references to all the neurons

a neuron class B, which encapsulates all the per-neuron state (activation value, output value, links to other neurons, etc.)
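A minimal sketch of this two-class design, written in Python because the original test code is not shown. The names `Neuron`, `NeuralNet`, `connect` and `step`, the synchronous update and the identity output function are illustrative assumptions, not the author's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Neuron:
    """Class B (hypothetical sketch): one object per neuron."""
    activation: float = 0.0
    output: float = 0.0
    # Links to other neurons as (source neuron, weight) pairs.
    inputs: List[Tuple["Neuron", float]] = field(default_factory=list)


class NeuralNet:
    """Class A (hypothetical sketch): holds references to all neuron objects."""

    def __init__(self, n: int) -> None:
        self.neurons = [Neuron() for _ in range(n)]

    def connect(self, src: int, dst: int, weight: float) -> None:
        # Store a direct object reference to the source neuron.
        self.neurons[dst].inputs.append((self.neurons[src], weight))

    def step(self) -> None:
        # Synchronous update: first every activation from the current outputs...
        for neuron in self.neurons:
            neuron.activation = sum(src.output * w for src, w in neuron.inputs)
        # ...then every output (identity output function, for brevity).
        for neuron in self.neurons:
            neuron.output = neuron.activation
```

Every call to `step` walks the list of objects and follows one object reference per connection, which is exactly the per-object access the next paragraph is concerned about.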

But this can be computationally expensive, because every step of the network computation has to visit each individual neuron object of class B to update its activation, recompute its output, and so on.

The alternative is a single class (class C) that holds the activation and output values of all neurons in arrays. This is faster, as the results of the very simple speed test below show, but it risks leading to a much more cluttered implementation.
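A sketch of the single-class alternative, again in Python with hypothetical names (`NeuralNetArrays`, `connect`, `step`). A neuron is just an index; activations, outputs and connections live in flat `array` objects from the standard library. The update rule (synchronous weighted sum, identity output) is an assumption chosen for brevity.

```python
from array import array


class NeuralNetArrays:
    """Class C (hypothetical sketch): state of all neurons in flat arrays.

    Neuron i is simply index i. Connections are stored in three parallel
    arrays (src, dst, weight), one entry per connection.
    """

    def __init__(self, n: int) -> None:
        self.activation = array("d", [0.0]) * n
        self.output = array("d", [0.0]) * n
        self.src = array("l")
        self.dst = array("l")
        self.weight = array("d")

    def connect(self, src: int, dst: int, weight: float) -> None:
        self.src.append(src)
        self.dst.append(dst)
        self.weight.append(weight)

    def step(self) -> None:
        n = len(self.output)
        act = array("d", [0.0]) * n
        # One pass over the connection arrays accumulates all weighted sums.
        for s, d, w in zip(self.src, self.dst, self.weight):
            act[d] += self.output[s] * w
        self.activation = act
        # Identity output function, for brevity (copied into a separate array).
        self.output = array("d", act)
```

The price of the flat layout is visible even here: the meaning of "neuron i" is spread over several arrays instead of being bundled in one object, which is the clutter mentioned above.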

Speed Test Results

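A = neural net class (holds references to the B objects)

B = neuron class (one object per neuron)

C = neural net class (holds the state of all neurons in arrays)

D = neuron state values (the entries of C's arrays)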

SpeedTest: is it really a good idea to spend a separate object for each neuron?
We will simulate N=100000 neurons (smaller classes)
A+B
Time needed for GenerateSomeBs(): 165 ms
Time needed for SetValueOfAllBs(): 4 ms
Time needed for GetValueOfAllBs(): 4 ms
C with internal D values
Time needed for GenerateValuePlaceholdersForDs(): 1 ms
Time needed for SetValueOfAllDs(): 1 ms
Time needed for GetValueOfAllDs(): 0 ms

SpeedTest: is it really a good idea to spend a separate object for each neuron?
We will simulate N=1000000 neurons (smaller classes)
A+B
Time needed for GenerateSomeBs(): 1670 ms
Time needed for SetValueOfAllBs(): 42 ms
Time needed for GetValueOfAllBs(): 38 ms
C with internal D values
Time needed for GenerateValuePlaceholdersForDs(): 3 ms
Time needed for SetValueOfAllDs(): 4 ms
Time needed for GetValueOfAllDs(): 5 ms
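The source of the speed test is not included. The following is a rough Python approximation of what the log appears to measure, reusing the operation names from the output as labels; the minimal per-neuron class and the single `array` of doubles are assumptions. Python timings will not match the numbers above, which come from an unspecified language and machine.

```python
import time
from array import array


class Neuron:
    """Stand-in for class B: one object per neuron (hypothetical, minimal)."""

    def __init__(self) -> None:
        self.value = 0.0


def timed(label, fn):
    """Run fn once and print the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"Time needed for {label}: {elapsed_ms:.0f} ms")
    return result


def speed_test(n):
    print(f"We will simulate N={n} neurons")

    print("A+B")
    # A's role is played by a plain list holding one Neuron (B) per neuron.
    neurons = timed("GenerateSomeBs()", lambda: [Neuron() for _ in range(n)])

    def set_bs():
        for b in neurons:
            b.value = 1.0

    timed("SetValueOfAllBs()", set_bs)
    total_b = timed("GetValueOfAllBs()", lambda: sum(b.value for b in neurons))

    print("C with internal D values")
    # C's role is played by one flat array of doubles; D values are its entries.
    values = timed("GenerateValuePlaceholdersForDs()", lambda: array("d", [0.0]) * n)

    def set_ds():
        for i in range(n):
            values[i] = 1.0

    timed("SetValueOfAllDs()", set_ds)
    total_d = timed("GetValueOfAllDs()", lambda: sum(values[i] for i in range(n)))
    return total_b, total_d


if __name__ == "__main__":
    speed_test(100_000)
```

Calling `speed_test(1_000_000)` reproduces the second run. In CPython the gap between the two layouts is usually smaller than in the log, because every element access goes through the interpreter either way.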

Conclusion

With 1,000,000 neurons, reading the value of every B object (neuron) once takes 38 ms. If each of these neurons has 100 connections to other neurons, computing the inputs of all neurons (the weighted sum of their inputs) requires 1,000,000 × 100 output reads. Extrapolating linearly, this takes about 38 ms × 100 = 3.8 s with one object per neuron, compared with 5 ms × 100 = 0.5 s with the array-based class C.
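To make this access pattern concrete, here is a sketch of the weighted-sum step in the array layout, assuming (hypothetically) a fixed fan-in per neuron stored in flat `sources`/`weights` arrays. The function also counts output reads, which come to N × fan_in, and the last lines repeat the linear extrapolation from the measured times.

```python
from array import array


def weighted_inputs(output, sources, weights, fan_in):
    """Net input (weighted sum of inputs) of every neuron in the array layout.

    Hypothetical fixed fan-in layout: the connections of neuron i occupy
    positions i*fan_in .. i*fan_in + fan_in - 1 of `sources` (index of the
    sending neuron) and `weights` (matching weight).
    Returns the net inputs and the number of output values read.
    """
    n = len(sources) // fan_in
    net = array("d", [0.0]) * n
    reads = 0
    for i in range(n):
        acc = 0.0
        for k in range(i * fan_in, (i + 1) * fan_in):
            acc += output[sources[k]] * weights[k]
            reads += 1
        net[i] = acc
    return net, reads


# Linear extrapolation used in the conclusion (N = 1,000,000, 100 inputs each):
# one full pass over all outputs took 38 ms with objects and 5 ms with arrays.
FAN_IN = 100
print(38 * FAN_IN, 5 * FAN_IN)  # prints: 3800 500 (ms)
```

For N = 1,000,000 and a fan-in of 100, `reads` would be 100,000,000, which is why the per-access cost dominates.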

So if time is critical (which it usually is), try to avoid modeling each neuron as a separate object.