Building AI: Compiling the neural network

Editor’s note: This is a series of blog posts on the topic of “Demystifying the creation of intelligent machines: How does one create AI?” You are now reading part 4. For the list of all, see here: 1, 2, 3, 4, 5, 6, 7.

As I discussed in my last posts, I have been working with colleagues at DXC to build an artificially intelligent fan, one that can monitor its operations, report issues and even, sometimes, fix problems itself.

In previous posts, I explained how creating AI requires knowledge of various machine learning methods (including three fundamental theorems that guided our work): a deep understanding of how each method works and what its advantages and disadvantages are. From this theoretical base, we decided to build our fan using a combination of pre-processing (data wrangling and feature extraction) and an autoassociative neural network. Here’s how it all came together:

To get AI right, you need to get the neural network right. And the main way to improve a neural network, to make it a better model of a given data set, is to change its architecture.

We can change the number of layers, sizes of layers, connectivity between layers (e.g., fully connected vs. sparsely connected) and so on. Also, we can add some knowledge from visual cortex anatomy to that architecture, for example, convolution layers for pattern recognition. This way, human understanding helps direct the inductive biases of the network.

We can also change the mathematics of the net. For example, replacing the traditional sigmoid transfer function with other functions (e.g., ReLU) can in many cases improve performance; by changing the properties of its equations, a net can become a better model of the data.
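To illustrate the difference, here is a minimal NumPy sketch of the two transfer functions (the function names are my own, not part of our fan’s code):

```python
import numpy as np

def sigmoid(x):
    # Traditional sigmoid: squashes every input into (0, 1) and saturates
    # for large |x|, which can slow learning in deep networks
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: passes positive values through unchanged and zeroes the rest,
    # avoiding saturation on the positive side
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # roughly [0.119, 0.5, 0.881]
print(relu(x))     # [0. 0. 2.]
```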

Still, the fact is that the real world is much richer than the variety of things we can change in a neural network. Therefore, one more strategy often needs to be used: adjust the data to fit the model. If we transform the data, we can make it better fit the network.

Thus, a poor model of original data may become a good model of transformed data.

In the case of raw data coming from a sensor, a direct feed will often result in poor performance. In our case, the input values represent acceleration readings at successive points in time.

The difficulty for a neural network is that any of these values can be fed into any neuron in the input layer. This means that the network, in its early processing stages, treats all inputs equally; only in later layers can it extract distinct features that are treated differently. In other words, a deep neural network architecture would be required, with the early layers providing an inductive bias appropriate for this type of data.

Here, however, we used a different, more elegant solution.

The equivalent of multiple network layers can be achieved by knowing the mathematics of the input data. In this case, we happened to have access to mathematical tools well suited to the data.

As the fan rotates at a mostly constant speed, the signals can be described accurately with sinusoidal functions. This in turn means we should be able to use the Fourier transform to effectively prepare the data for neural network learning.

The Fourier transform is a model with an inductive bias toward sinusoids. And since sinusoids are a good model of our data, according to the no-free-lunch theorem we should achieve effective learning with Fourier-transformed inputs.
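As a sketch of this preparation step (the window size, rotation frequency and variable names below are illustrative assumptions, not our actual setup), one window of accelerometer samples can be turned into power-spectrum features:

```python
import numpy as np

# Hypothetical window of accelerometer readings from a fan rotating at a
# constant speed: a 50 Hz sinusoid plus a little noise (parameters assumed)
fs = 1024                                  # samples per window (assumed)
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(fs)

# The real-input FFT of 1024 samples yields 513 frequency bins; squaring
# the magnitudes gives the power spectrum, and dropping the DC bin leaves
# 512 values -- one per input unit of the network
power = np.abs(np.fft.rfft(signal)) ** 2
features = power[1:]
print(features.shape)        # (512,)
print(np.argmax(features))   # peak at the rotation-frequency bin
```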

We decided to feed the autoassociative neural network the values obtained from the power spectrum. The network had 512 input units and the same number of output units, and it was trained to reproduce the input values as closely as possible in its output layer.
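The architecture can be sketched as follows. This is an untrained skeleton with random placeholder weights and assumed names; a real implementation would fit the weights (e.g., by backpropagation) so that the output reproduces the input:

```python
import numpy as np

rng = np.random.default_rng(0)
n_io, n_bottleneck = 512, 4   # 512 inputs/outputs; bottleneck size assumed

# Placeholder weights; in practice these are learned during training
W_enc = rng.normal(scale=0.01, size=(n_io, n_bottleneck))
W_dec = rng.normal(scale=0.01, size=(n_bottleneck, n_io))

def reconstruct(x):
    # Compress the 512 power-spectrum values into the bottleneck,
    # then expand back out to 512 output values
    code = np.tanh(x @ W_enc)
    return code @ W_dec

x = rng.random(n_io)                          # one power-spectrum vector
error = np.mean((x - reconstruct(x)) ** 2)    # reconstruction error
print(reconstruct(x).shape)                   # (512,)
```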

For the autoassociative network to work well, it was necessary to conduct two analyses:

We first assessed the appropriate number of units in the hidden layer that forms the bottleneck, i.e., the layer with the fewest units. As mentioned, this number should be neither too large nor too small.

The other analysis determined how much discrepancy between the values in input vs. output layers was considered an anomaly. Surely, a small discrepancy will always be there. But how much of this reconstruction error is enough to claim that we have detected something wrong with the fan?

The number of hidden units in the bottleneck depends on the dimensionality of the data. If the data are high dimensional, a larger number of units will be needed; if the data are low dimensional, effective compression can be achieved with just a few units.

We assessed the dimensionality of the data using principal component analysis, counting all components with an eigenvalue larger than 1. We presumed that the number of bottleneck units should approximately correspond to the number of dimensions obtained through principal components. This led us to use 4 units in our bottleneck hidden layer.
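A sketch of this dimensionality check, using synthetic stand-in data generated from a handful of latent factors (the sizes and names are assumptions, not our measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 200, 512, 4

# Synthetic stand-in for the power-spectrum data: 512 features driven
# by 4 latent factors plus a little noise
latent = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Standardize, then take the eigenvalues of the correlation matrix and
# count those larger than 1 (the Kaiser criterion)
X = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
n_components = int(np.sum(eigenvalues > 1.0))
print(n_components)   # should recover roughly the number of latent factors
```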

We then trained the network by standard methods and computed the distribution of reconstruction errors across test data. In addition, we collected data with anomalies (these will be discussed in more detail later) and plotted one distribution against the other, expecting to find no overlap between the two. This is exactly what we found: the distribution of errors for normal operation was completely disjoint from the distribution of errors during anomalous operation.
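In code, the check amounts to comparing the two error distributions. The numbers below are synthetic placeholders, not our measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic reconstruction errors: small during normal operation,
# much larger during anomalous operation
normal_errors = rng.normal(loc=0.05, scale=0.01, size=1000)
anomaly_errors = rng.normal(loc=0.50, scale=0.05, size=200)

# Disjoint distributions: the largest normal-operation error stays
# below the smallest anomalous one
print(normal_errors.max() < anomaly_errors.min())   # True when well separated
```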

We even had a large range of values to choose from for the threshold, which serves as a way to adjust the sensitivity of our anomaly detector. After some trial and error, we settled on a threshold of about 2 standard deviations above the mean of the errors measured during normal operation.
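Setting the threshold then reduces to a one-liner over the normal-operation errors (again with synthetic placeholder numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder reconstruction errors measured during normal operation
normal_errors = rng.normal(loc=0.05, scale=0.01, size=1000)

# Threshold: mean plus two standard deviations of the normal-operation errors
threshold = normal_errors.mean() + 2 * normal_errors.std()

def is_anomaly(reconstruction_error):
    # Flag any window whose reconstruction error exceeds the threshold
    return reconstruction_error > threshold

print(is_anomaly(0.04))   # within the normal range -> False
print(is_anomaly(0.20))   # far above the threshold -> True
```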

And that was it.

This was sufficient to create a reliable detector of anomalies. In upcoming posts, I will describe how we classified those anomalies.

To conclude, the first component of our AI was a neural network for detecting anomalies. This network needed to detect problems that were not known during the design. We accomplished this with a combination of an autoassociative network and data preparation.

The solution worked well and is relatively simple. Also, once trained, it requires so little processing power that it can easily run on a Raspberry Pi.

One reason for our success was that we relied on data science theory and analysis. That way we understood both the data and their relationship to the machine learning tools we used. A few fundamental theorems guided our choices, and the resulting solution is elegant and reliable.


ABOUT THE AUTHOR

Danko Nikolic is a data scientist at DXC. Before joining DXC, he worked as a scientist and entrepreneur. For many years he led a lab for brain and mind research at the Max Planck Institute, investigating how the brain works, inventing new statistical methods and pondering how to devise a better form of artificial intelligence (AI). He inherited his family’s business genes and has started several companies and been involved in multiple startups, focused on topics ranging from civil engineering and IT to psychology and AI. He has degrees in civil engineering and psychology, a PhD in cognitive psychology and is an honorary professor of psychology at the University of Zagreb.