The ventral visual stream, which inspired deep learning algorithms. Optical input from the retina travels through the second cranial nerve (optic nerve) to the lateral geniculate nucleus (LGN), and from there to the first visual area (V1). The ventral visual stream then resolves the raw input from the retina over an increasing degree of complexity: edges and lines as contrast (V1), orientation/frequency/colour (V2), simple geometric shapes (V4) and eventually, to semantic resolution (object understanding) in the posterior (PIT) and anterior (AIT) inferotemporal areas.

When German computational neuroscientist Bernd Fritzke came up with the algorithm that came to be called Growing Neural Gas (GNG) in his seminal 1995 paper, machine learning altogether was a relatively new field, and very strongly inspired by actual neuroscience, which at the time –– largely owing to new methods of neuroimaging, including functional neuroimaging (fMRI), magnetoencephalography (MEG) and diffusion tensor imaging (DTI) –– was experiencing an era of breakthroughs. This inspired computer scientists to create models that were analogous to the way neurons work. For example, deep learning (especially deep convolutional neural networks), might at best be inspired by the way the brain processes information, in particular visual information for its ‘what’ component (called the ventral visual stream), as opposed to its ‘where’ component (catered for by the dorsal visual stream), but drew initial inspiration from those factors. The initial analogy held far enough to create a model with workable mathematical analogies, which could then be expanded upon with things not present in the actual visual system, such as convolutional kernels. Neural Gas similarly draws vague inspiration from a neuronal process called Hebbian learning. This theory, dating back well into the late 1940s, sought to explain the phenomenon of associative learning by the fact that if two neurons sufficiently close to each other tend to fire together, they will eventually be more likely to become synaptically connected than if that were not the case. Of course, this is a thorough simplification (if you’re interested in how much of this is indeed the case, this paper by Rumsey and Abbott in Physiology is a good introduction), but it inspired the idea within artificial neural networks that connections between neurons should be stronger the more often they activate at the same time (‘neurons that fire together wire together’).

The Growing Neural Gas algorithm

A growing neural network ‘learns’ the topology of my face, over only a few hundred steps. The graph, which is quite recognisable, consists of 752 nodes — less than 5% of the original 15,564 blue points making up the image.

Growing Neural Gas is a relatively simple algorithm that, just as the other competitive model we have encountered in the previous chapter, allows for learning and representing topology. It is, thus, a form of a topology-representing network, and as we have seen in Part 1, it is capable of approximating something as complex as the topology of a human face — producing something quite recognisable to a human observer while reducing information significantly. The animation to the left, for instance, demonstrates that something as complex as the human face can be approximated with significant efficiency to create an image that would be unmistakable to an observer.

The general idea of the growing neural gas algorithm: begin by two randomly placed neurons, and add new neurons every once in a while halfway between the worst-performing neuron so far and its worst-performing neighbour. This is iterated until a boundary condition, such as the maximum number of iterations, is reached. Note that the above explanation ignores edge aging and age dropout.

Just as in the previous posts, I’ll leave the rigorous maths for the companion notebook, and stick to explaining the general idea. Just like Self-Organising Feature Maps, GNGs are iterative algorithms. However, unlike SOFMs, they do not require any initial specification of the number of neurons — as the name suggests, GNGs are growing, and new neurons keep getting added as long as the algorithm is running.

Each iteration begins by picking a data point from the training set. Because GNGs generalise very well to an arbitrary number of dimensions, it is common to speak of these in terms of 𝛿-length vectors v.

The neuron nearest to v, called the best-performing unit (BPU) in analogy to SOFMs, is moved closer to v. All neurons directly connected to the BPU are also moved closer to v.

Determine the second-best performing unit (SBPU). If the BPU and SBPU are connected, set the age of this connection to zero. If they are not connected, connect them. Then increment the age of all other edges emanating from the BPU.

If an edge has an age larger than the maximum age Amax, delete the edge. If this results in ‘orphan neurons’ (neurons with no edges connecting them), these are also deleted.

Every λ iterations, the neuron with the largest cumulative error (sum of distance from each data vector v over each iteration) is identified as the worst-performing unit (WPU). Insert a new neuron halfway between the WPU and its worst-performing neighbour and delete the original edge between the WPU and its worst-performing neighbour.

Iterate until some boundary condition, such as maximum number of iterations is reached.

It’s simple to understand how this algorithm works, but it’s worth spending some time on thinking about why it works. As you might have noticed from the examples above, this method creates a partitioning of the space where the data is distributed, and does so by approximating a Delaunay triangulation (indeed, in his original paper, Fritzke referred to the graph generated by a GNG as an ‘induced Delaunay triangulation’). The idea of a growing neural gas algorithm is that unlike a SOFM, which requires some idea of how many neurons are required to represent the data, GNG determines where the model has been performing worst so far, and refines that area. This eventually results in a model that grows not uniformly but rather to expand the size of the graph where it can no longer cover (quantise) the data with the given resolution (number of neurons).

Using GNG to count clusters

In the first introductory Part to competitive neural networks, I have already introduced a use case for GNGs, namely as quick and efficient vector quantisation algorithms that create decent approximations of images. In the following, we’ll be looking at something slightly different, namely counting distinct objects and quantifying their sizes.

Hard and soft exudates on a fundoscopy image from the DIARETDB1 data set (Kauppi et al., 2007).

The DIARETDB1 data set by the research group of Kauppi et al. at Lappeenranta University of Technology contains 89 digital fundoscopy images, that is, images of the fundus of the eye, of five healthy volunteers and 84 people with some degree of diabetic retinopathy. In diabetic retinopathy, a complication of diabetes that affects the small blood vessels of the retina, long term inadequate blood glucose control leads to vascular damage, microaneurysms and exudates, where lipids (causing bright yellow hard exudates) or blood (resulting in pale, diffuse yellow soft exudates) have accumulated on the fundus. In the following, we’ll be using GNG to quantify these abnormalities. The DIARETDB1 data set contains ROI (Region of Interest) masks, but those merely outline areas that show a particular clinical feature. Can we use Growing Neural Gas to count how many clusters of hard exudates are present in the regions of interest? You bet!

Isolating the ROI using consensus masks: the consensus mask of at least two experts’ votes (bottom right) is generated from the original ROI annotations (bottom left). This mask is used to isolate the region of interest from the fundoscopy image (top left), resulting in a masked image (top right).

We begin with some image processing, namely by refining the area of interest. Each image was labelled by four experts, which created a mask. We can threshold the mask so as to require consensus by a given number of experts, a trick widely used in annotated research imagery (scroll to the bottom if you’re unfamiliar with it!). Then, we use the relatively prominent bright yellow colour of hard exudates to convert them to data points the GNG can begin to characterise (for the nitty-gritty, do refer to the companion notebook, where some of the added tricks, including some morphological transforms, are explained).

Next, we exploit the fact that lipid exudates have a very identifiable yellow colour, by thresholding it using OpenCV’s inRange function. At this point, the ROI masking we performed above comes in handy, as the optic disc (the bright yellow circular structure where blood vessels enter and axons of ganglionic neurons leave the retina to join the optic nerve) is, depending on illumination, often of a similar colour. When using inRange, it is common to transform the image to HSV (hue, saturation and value) format, as this allows easier specification of colours in a particular hue range. In HSV, the hue (the ‘colour’) occupies an individual element of the colour vector (usually specified as a degree on a colour circle), so specifying all yellows is as simple as specifying the approximate hue angle of yellow (around 60°) and excluding low-saturation (pale, tending towards white) or low-value (dark, tending towards black) edges. This would be much more complicated with most colours in RGB! Fortunately, OpenCV and its Python bindings have excellent colour space conversion facilities.

Extracting hard lipid exudates from the original ROI using OpenCV’s colour thresholding feature. Only values falling within a particular hue angle range will be retained. After some morphological operations to eliminate noise, we are left with a boolean image that will be converted to a binary grid format for the GNG to be trained on.

Higher terminal neuron count results in a more accurate shape approximation, but also longer run times (left: 200 neurons, right: 1,000 neurons).

When using a GNG to count clusters, a few settings are paramount. In particular, it is worthwhile to begin with a very large number of initial neurons. In the image vector quantisation example mentioned in Part 1, we began with two neurons. The result of this is a graph that for the most time will remain altogether connected (recall that it’s a lot easier for a GNG to generate a new neuron than to drop an existing one). This is a problem, since we’re relying on counting disconnected subgraphs to determine the number of distinct lesions. The easiest solution is to create a large number of initial neurons (several thousand, i.e. 2–3 orders of magnitude more than the expected number of distinct clusters). While prolonging training time, this does result in an increasingly accurate segmentation after a while — and most importantly, unlike segmentations with a lower number of initial neurons, they are much less likely to have connected segments that will require a number of iterations to be separated.

The same settings over a larger amount of iterations result in a better separation of close neurons.

Conclusion

Growing Neural Gas based models aren’t merely great at vector quantisation and creating a viable, intelligible representation of data from a tiny fraction of all data points. They can also find coherent topologies where the number of such topologies is unknown. Given time to converge, these models can not merely identify the number of distinct topologies but also quantify their relative and absolute area.

Competitive neural networks remain a largely underutilised technique to this day, but recent efficient implementations, such as Tensorflow SOFM layers and Tensorflow-based GNG implementations, may hopefully contribute to a resurgence of competitive neural networks, by themselves or as part of larger neural network graphs.

A footnote: mask thresholding and expert consensus

Where experts annotate imagery, it is common for annotation shapes to have the opacity 1/n, where n is the number of annotating experts. So four experts would each annotate with a 25% intensity signal. Conveniently, this means we can use thresholding on such an image to determine consensus levels. Typically, these images are grayscale images. Therefore, say, for four annotators, any pixel with an intensity above 50% must have the ‘vote’ of at least two annotators. Thus, a thresholding that omits pixels with less than 50% intensity will create a mask that requires the consensus of at least half of the annotators. Widely used in annotating imagery in scientific research with Regions of Interest (ROIs), this allows us to restrict our consideration to plausible sites and ignore noise from other structures.