"If we are to learn anything about the ultimate nature of species we
must reduce the problem to the simplest terms and study a few easily
recognized, well differentiated species"
– Edgar Anderson, Annals of the Missouri Botanical Garden, 1928

Recently I've added a Python API to a C library for deep learning known
as libdeep. The libdeep project was started in 2013 and is one of my
more popular open source projects on GitHub. The core of it is based
upon back-propagation code which I originally wrote in the early 1990s
when computers were a lot less powerful than they are now, and since
that time the fortunes of neural networks within the popular science
imagination have waxed and waned.

Neural computing is about as old as digital computing. In 1943 Warren
McCulloch and Walter Pitts proposed a neuron-like computing element.
Alan Turing picked up on that, and in the post-war period he wrote to
the cybernetics pioneer Ross Ashby suggesting that Ashby use the new
Pilot ACE machine to simulate a neural network. By the late 1950s
perceptrons had been invented, and after some argument their design
was refined with the addition of a sigmoidal activation function. About
a decade ago better ways of training neural nets with many layers were
developed by Geoffrey Hinton, and that's the so-called "deep learning"
that we know of today.

It should be said that most neural network systems running on computers
bear little or no resemblance to biological brains. Biological neurons
do a lot more besides adjusting synaptic strengths. Classical neural
nets are better characterised as a variety of (more or less practical)
machine learning system, inspired by what neurons were believed to do
according to the neurophysiological understanding of the mid 20th
century. They sit within the now largely discredited psychological
paradigm of behaviourism.

My own deep learning implementation isn't the same as Hinton's. I'm not
using Restricted Boltzmann Machines, although I am using dropout and
unsupervised feature learning with autoencoders. Adding a Python API
makes the system available to a larger number of software developers,
and might also open it up to educational uses.

So without further ado let's start with the simplest possible example:
the XOR problem. In the 1970s XOR really was a problem: single-layer
perceptrons cannot learn that logic function, and critics took this as
a strong indication that neural nets would never rival more traditional
symbolic logic. First we can create a data set
in the form of a CSV file from the XOR truth table.

0,0,0
0,1,1
1,0,1
1,1,0
0,0,0
0,1,1
1,0,1
1,1,0
0,0,0
0,1,1
1,0,1

There's repetition here, but that's only to give the system a chance to
randomly pick 80% of the data for training and 20% for testing. Then we
can use the following Python program:
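
Something along these lines should work. The module and method names
here (deeplearn, NeuralNet and so on) are illustrative rather than the
definitive bindings, so check the examples in the libdeep repository
for the exact calls:

import deeplearn  # hypothetical name for the libdeep Python bindings

# two inputs, a single hidden layer of four units, one output
net = deeplearn.NeuralNet(inputs=2, hiddens=4, layers=1, outputs=1)

# load the XOR truth table; 80% is used for training, 20% for testing
net.load_training_data("xor.csv")

# train, recording the error history so that it can be plotted
net.set_learning_rate(0.2)
net.train()

# save a graph of the training error, then the trained network itself
net.plot_history("training.png")
net.save("xor.nn")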

This loads the data and creates a neural net with 4 hidden units in a
single hidden layer. If you run it then it will save an image called
training.png showing the training error over time, and at the end it
saves the trained neural net to a file called xor.nn. We can then test
that the XOR logic function was successfully learned with the following
program:
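
As before, the method names are illustrative rather than the exact
bindings:

import deeplearn  # hypothetical name for the libdeep Python bindings

# reload the previously trained network from file
net = deeplearn.NeuralNet.load("xor.nn")

# run each input pair through the network and print the result,
# rounding the output to 0 or 1
for a in (0, 1):
    for b in (0, 1):
        print("%d,%d,%d" % (a, b, round(net.test([a, b]))))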

And if all is well then on running this program you should see the XOR
truth table. Of course, XOR is not the most exciting thing ever, and the
above is only intended to illustrate how you can implement deep learning
in a small enough number of lines of Python code that you could probably
use it for teaching purposes. You can mess around with the learning rate
and other parameters to see how that alters the training graph or the
test performance.

Now let's tackle a more interesting problem. In 1935 the botanist Edgar
Anderson was researching the origin of species. He wondered whether,
contrary to the notion of species branching under strict descent in
reproductive isolation, hybridization within suitably conducive
habitats could itself give rise to new species with overlapping
phenotypic characteristics. He measured the physical dimensions of
Iris plants growing together in the same colony in the Mississippi
delta region, belonging to three species: Iris setosa, Iris versicolor
and Iris virginica. The resulting data set
was later published by the statistician Ronald Fisher and has become a
well known machine learning classification problem. It can be found
here:
http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

To use this as a libdeep data source we'll first need to tidy it up.
libdeep currently doesn't understand text, only numbers, so we can
replace the species names with class 0, class 1 and class 2.
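
For example, the first row of each species in iris.data becomes:

5.1,3.5,1.4,0.2,0
7.0,3.2,4.7,1.4,1
6.3,3.3,6.0,2.5,2

A training program can then follow the same pattern as the XOR
example, with four inputs (the sepal and petal measurements) and three
outputs (one per class). Again, the binding names are illustrative:

import deeplearn  # hypothetical name for the libdeep Python bindings

# four measurements in, three hidden layers (16 hidden units),
# three classes out
net = deeplearn.NeuralNet(inputs=4, hiddens=16, layers=3, outputs=3)

net.load_training_data("iris.csv")
net.set_learning_rate(0.1)
net.train()

net.plot_history("training.png")
net.save("iris.nn")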

This creates a neural net with 16 hidden units and three hidden layers
to classify the three species. Because this is a harder problem it
takes longer to train than the XOR example. When the training is done
another program can be run to test that the learning worked.
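
A sketch of such a test program, again with illustrative binding
names, using measurements close to the average for each species but
not identical to any row in the data set:

import deeplearn  # hypothetical name for the libdeep Python bindings

net = deeplearn.NeuralNet.load("iris.nn")

# sepal length, sepal width, petal length, petal width (cm)
samples = [([5.04, 3.43, 1.46, 0.25], 0),  # setosa-like
           ([5.94, 2.77, 4.26, 1.33], 1),  # versicolor-like
           ([6.59, 2.97, 5.55, 2.03], 2)]  # virginica-like

# print the expected class alongside the index of the winning output
for measurements, expected in samples:
    print("expected %d returned %d" % (expected,
                                       net.classify(measurements)))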

The measurements supplied to the test function are different from any of
the samples in the data set, so this determines whether the system is
able to generalise rather than just learning specific training values.
If the training went well then we'll see that the expected and returned
species classifications are the same. Rather than being a "toy"
problem, this is closer to a practical application: something which
could run on a mobile device used by an ecologist conducting a field
survey.

So this was an introduction to the new libdeep Python API. We've only
dealt with classification problems on fully labeled data sets, and
there's more work to be done to support time series and recurrent networks.