Computation steps? You mean code, right?

Well yes, we could do that. We’d have to define all the steps explicitly and run them sequentially on our data. This is called “imperative programming” and it’s how Fortran, Pascal, C, C++ and so on work. Nothing wrong with that.

However, neural networks are intrinsically parallel beasts: inside a given layer, all outputs can be computed simultaneously. Independent layers could also run in parallel. So, in order to get good performance, we’d have to implement parallel processing ourselves using multithreading or something similar. We know how that usually works out. And even if we got the code right, how reusable would it be if data size or network layout kept changing?

Fortunately, there is an alternative.

Dataflow programming

“Dataflow programming” is a flexible way of defining parallel computation, where data flows through a graph. The graph defines the order of operations, i.e. whether they need to be run sequentially or whether they may be run in parallel. Each operation is a black box: we only define its input and output, without specifying its actual behaviour.

This might sound like Computer Science mumbo jumbo, but this model is exactly what we need to define neural networks: let input data flow through an ordered sequence of operations called “layers”, with each layer running many instructions in parallel.

Enough talk. Let’s look at an example. This is how we would define E as (A*B) + (C*D):

E = (A*B) + (C*D)

What A, B, C and D are is irrelevant at this point: they are just symbols.

No matter what the inputs are (integers, vectors, matrices, etc.), this graph tells us how to compute the output value — provided that operations “+” and “*” are defined.

This graph also tells us that (A*B) and (C*D) can be computed in parallel.
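To make that parallelism concrete, here is a minimal, framework-free Python sketch (not MXNet — just the standard library) that evaluates the two independent branches concurrently:

```python
# A tiny hand-rolled evaluation of the graph E = (A*B) + (C*D).
# The two products have no dependency on each other, so a scheduler
# is free to run them concurrently; only the final "+" must wait for both.
from concurrent.futures import ThreadPoolExecutor

def evaluate(a, b, c, d):
    with ThreadPoolExecutor(max_workers=2) as pool:
        ab = pool.submit(lambda: a * b)   # branch 1: A*B
        cd = pool.submit(lambda: c * d)   # branch 2: C*D
        return ab.result() + cd.result()  # join: E = (A*B) + (C*D)

print(evaluate(1, 2, 3, 4))  # → 14
```

This is exactly the kind of scheduling decision a dataflow engine makes for us, only at a much larger scale.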

Of course, MXNet will use this information for optimisation purposes.

The Symbol API

So now we know why these things are called symbols (not a minor victory!). Let’s see if we can code the example above.

Now, it’s time to let our input data flow through the graph in order to get a result: the forward() function will get things going. It returns an array of NDArrays, because a graph could have multiple outputs. Here, we have a single output, holding the value ‘14’ — which is reassuringly equal to (1*2)+(3*4).