As a reminder, our task is to detect anomalies in vibration (accelerometer) sensor data in a bearing, as shown in Figure 1.

Figure 1. Accelerometer sensor on a bearing records vibrations on each of the three geometrical axes x, y, and z

When people talk about deep learning, they often mention libraries such as TensorFlow and PyTorch. Those are great tools, but in my opinion they provide relatively low-level support, meaning you have to think a lot about linear algebra and the shapes of matrices. Keras, on the other hand, is a high-level abstraction layer on top of popular deep learning frameworks such as TensorFlow and Microsoft Cognitive Toolkit (previously known as CNTK). Keras not only uses those frameworks as execution engines to do the math, but it can also export deep learning models so that other frameworks can pick them up. And because we’ve introduced Deeplearning4j and SystemML already in this series, I’m happy to tell you that both frameworks can read and execute Keras models. Isn’t that great? You can do fast prototyping in Keras and then scale out on Apache Spark using Deeplearning4j or SystemML as an execution framework for your Keras models. Finally, for completeness, there exist frameworks like TensorFrames and TensorSpark that bring TensorFlow directly to Apache Spark, but those are beyond the scope of this article.

What you’ll need to build your app

An IBM Watson Studio account. You can use the IBM ID you created while registering for your IBM Cloud account.
To get started using IBM Watson Studio, including signing up for an account, watch the videos in this video collection.

Setting up your development environment

Before we talk about the deep learning use case, spend some time setting up your development environment. We use a Jupyter Notebook running inside Watson Studio.

We’ll start with a small example using two “pickled” data sets (pickle is Python’s serialization and deserialization framework): one containing healthy data and one containing broken data, which we’ll use to develop the neural network.
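As a quick illustration of what “pickled” means here (the record layout below is made up for demonstration, not the notebook’s actual data sets), pickling and unpickling a small object looks like this:

```python
import pickle

# Illustrative only: a tiny stand-in for the healthy vibration data set,
# one list of readings per accelerometer axis.
healthy = {"x": [0.1, 0.2, 0.15], "y": [0.0, -0.1, 0.05], "z": [0.3, 0.2, 0.25]}

blob = pickle.dumps(healthy)    # serialize the object to bytes
restored = pickle.loads(blob)   # deserialize it back into a Python object
```

In the notebook the same round trip happens through files on disk via `pickle.dump` and `pickle.load`.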

Afterward we’ll turn this notebook into a real-time anomaly detector accessing data directly from an MQTT topic by using the IBM Watson IoT Platform.

Import a Jupyter notebook that contains all the necessary code into Watson Studio.
a. In the top right, click Create new > Notebook. Note: If you have an existing account, from the upper left click IBM Watson Studio to open the dashboard, and then click Get started. From there you can click New notebook.
b. Click From URL, in the Name field add a name, and in the Notebook URL field, paste the following URL:

Google released TensorFlow under the Apache 2.0 open source license in 2015.

TensorFlow has two components: an engine executing linear algebra operations on a computation graph and some sort of interface to define and execute the graph. The engine in TensorFlow is written in C++, in contrast to SystemML where the engine is written in JVM languages. Although different language bindings for TensorFlow exist, the most prominent one is Python.

The engine of TensorFlow can run on CPUs, GPUs, TPUs, and for mobile and embedded devices there is TensorFlow Lite. There are three components of the engine:

A client component that takes your computational execution graph and submits it to the master component.

A master component that parses the computational execution graph and distributes it to worker components.

Worker components that execute parts of the computational execution graph.

Worker components can make use of specialized hardware like GPUs and TPUs. Distributed TensorFlow can run on multiple machines, but this is not covered in this article because we can use Deeplearning4j and Apache SystemML for distributed processing on Apache Spark without the need to install distributed TensorFlow.

In other words, TensorFlow is nothing more than a domain specific language (DSL) expressed in Python to define a computational execution graph for linear algebra operations and a corresponding (parallel) execution engine for running an optimized version of it.
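To make that concrete, here is a minimal sketch of defining a small linear algebra graph in Python and letting the engine execute it. I use the `tf.function` tracing API here; the exact mechanics differ between TensorFlow versions, so treat this as illustrative:

```python
import tensorflow as tf

@tf.function  # traces the Python function into a computational execution graph
def affine(x, w, b):
    # a single linear algebra operation: x @ w + b
    return tf.matmul(x, w) + b

# The engine runs the (optimized) graph; Python only defined it.
result = affine(tf.ones([1, 2]), tf.ones([2, 2]), tf.zeros([2]))
```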

What is Keras?

You might have noticed that TensorFlow can get to low-level linear algebra sometimes, and the code doesn’t even look nice from a mathematician’s perspective. Therefore, I suggest using Keras wherever possible. This blog post titled Keras as a simplified interface to TensorFlow: tutorial is a nice introduction to Keras. I will explain Keras based on this blog post during my walk-through of the code in this tutorial.

Create a Keras neural network for anomaly detection

We need to build something useful in Keras using TensorFlow on Watson Studio with a generated data set. (Remember, we used a Lorenz Attractor model to get simulated real-time vibration sensor data in a bearing. We need to get that data to the IBM Cloud platform. See the tutorial on how to generate data for anomaly detection.) When we set up our development environment we imported the WatsonIoTPlatformKerasTFLSTM Notebook and we’ll look at it now. Note: If you closed Watson Studio, open the WatsonIoTPlatformKerasTFLSTM notebook you previously imported.

Next, run the cell and look at the same chart after we’ve switched the test data generator to a broken state. The obvious result is that we see much more energy in the system. The peaks exceed 200, in contrast to the healthy state, which never went over 50. Also, in my opinion, the frequency content of the second signal is higher.

Note: We are plotting the imaginary part of the red dimension to see three lines, because two dimensions of this data set overlap completely in frequency and the real part is zero. Remember, the FFT (fast Fourier transform) returns the sine components in the imaginary domain and the cosine components in the real domain. It is just a hack mathematicians use to return a tuple of vectors.
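You can verify this sine/cosine split yourself with NumPy. In this sketch I use the article’s sampling parameters (100 Hz for 30 seconds) on a synthetic 5 Hz sine, which is an illustration rather than the notebook’s actual signal:

```python
import numpy as np

fs = 100                             # sampling rate of the simulator (Hz)
t = np.arange(0, 30, 1.0 / fs)       # 30 seconds -> 3,000 samples
spectrum = np.fft.fft(np.sin(2 * np.pi * 5 * t))  # pure 5 Hz sine

# The sine shows up in the imaginary part at the 5 Hz bin
# (index 5 Hz * 30 s = 150); its real part is numerically zero.
# A cosine would appear in the real part instead.
```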

Let’s contrast this healthy data with the broken signal. As expected, there are a lot more frequencies present in the broken signal.

Create an unsupervised machine learning model

We now have enough evidence to construct an anomaly detector based on supervised machine learning (with a state-of-the-art model like a gradient boosted tree). But we want unsupervised machine learning because we have no idea which parts of the signal are normal and which are not.

A simple approach to unsupervised machine learning is to feed those 3,000 frequency bands into an ordinary feed-forward neural network. Remember, DFT (discrete Fourier transform) returns as many frequency bands as we have samples in the signal, and because we are sampling with 100 Hz for 30 seconds from the physical model this is also the number of frequency bands.

With this approach we have transformed our three-dimensional input data (the three accelerometer axes we are measuring) into a 9,000 dimensional data set (the 3,000 frequency bands per accelerometer axis). This is our new 9,000 dimensional input feature space. We can use the 9,000 dimensional input space to train a feed-forward neural network. Our hidden layer in the feed-forward neural network has only 100 neurons (instead of the 9,000 we have in the input and output layer). This is called a bottleneck and turns our neural network into an autoencoder.

We train the neural network by assigning the inputs on the input and output layers. The neural network will learn to reconstruct the input on the output. But the neural network has to learn the reconstruction going through the 100 neuron hidden-layer bottleneck. This way we prevent the neural network from learning about any noise or irrelevant data. We will skip this step here because the performance of such an anomaly detector is usually quite low. You can see an implementation of this in my last article on Apache SystemML where I’ve implemented a feed-forward neural network auto-encoder.
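For completeness, a minimal sketch of such a bottleneck auto-encoder in Keras (the 9,000/100/9,000 layer sizes follow the text; the import style and activations are my assumptions, and depending on your setup the imports may be `keras` rather than `tensorflow.keras`):

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential

autoencoder = Sequential([
    Input(shape=(9000,)),              # 9,000-dimensional frequency features
    Dense(100, activation='relu'),     # 100-neuron bottleneck
    Dense(9000, activation='sigmoid'), # reconstruct the 9,000 inputs
])
autoencoder.compile(loss='mae', optimizer='adam')
# Training would pass the same data as input and target: autoencoder.fit(X, X, ...)
```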

When working with neural networks, it is always good to scale data to a range between zero and one. Because we are planning to turn this notebook into a real-time anomaly detector for IoT sensor data, we define a function instead of transforming the data directly, so that we can reuse the transformer at a later stage.
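A sketch of such a reusable scaling function (pure NumPy; the notebook’s actual implementation may differ): we compute the per-column minimum and maximum once on training data and reuse them, so that live sensor data is later scaled consistently.

```python
import numpy as np

def make_scaler(train_data):
    """Fit per-column min/max on training data; return a reusable transformer."""
    lo = train_data.min(axis=0)
    hi = train_data.max(axis=0)
    def scale(data):
        # maps the training range of each column to [0, 1]
        return (data - lo) / (hi - lo)
    return scale

scale = make_scaler(np.array([[0.0, 2.0], [4.0, 6.0]]))
scaled = scale(np.array([[2.0, 4.0]]))
```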

Improve anomaly detection by adding LSTM layers

We can outperform state-of-the-art time series anomaly detection algorithms and feed-forward neural networks by using long short-term memory (LSTM) networks.

Based on research (the 2012 Stanford publication titled Deep Learning for Time Series Modeling by Enzo Busseti, Ian Osband, and Scott Wong), we will skip experimenting with deep feed-forward neural networks and jump directly to experimenting with a deep recurrent neural network built from LSTM layers. LSTM layers introduce memory to neural networks, which makes them ideal for analyzing time-series and sequence data.

To get started let’s reshape our data a bit because LSTMs want their input to contain windows of times.

This way, instead of 3,000 samples per dimension (per vibration axis), we have 300 windows of length 10. Given the last 10 time steps of the signal, we want to predict the next 10.
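The reshape itself is a one-liner in NumPy. The shapes follow the text; the variable names and the random stand-in data are illustrative:

```python
import numpy as np

samples = np.random.rand(3000, 3)             # 3,000 samples, 3 vibration axes
timesteps = 10
windows = samples.reshape(300, timesteps, 3)  # 300 windows of 10 time steps each
```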

Let’s walk through that code a bit. The first thing we need to do in Keras is create a little callback function that informs us about the loss during training. The loss is basically a measure of how well the neural network fits the data: the lower, the better (unless we are overfitting).
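A minimal sketch of such a callback (the name `handleLoss` follows the text; the notebook’s exact implementation may differ):

```python
from tensorflow.keras.callbacks import Callback

losses = []  # the trajectory of losses that we can plot later

class LossHistory(Callback):
    def on_batch_end(self, batch, logs=None):
        # record the training loss after every batch
        losses.append(logs.get('loss'))

handleLoss = LossHistory()
# handleLoss would then be passed to model.fit(..., callbacks=[handleLoss])
```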

Let’s walk through this code. First, we create an instance of a Sequential model. This allows us to add layers to the model as we go.

model = Sequential()

Then, we add an LSTM layer as the first layer, with 50 internal neurons and an input shape of 10 by 3, and we want the layer to return a sequence of the predicted future time steps.

model.add(LSTM(50,input_shape=(timesteps,dim),return_sequences=True))

We do this eleven times, making it an eleven layers deep LSTM neural network. Why did I choose eleven? This is part of the black magic. The more layers you add the more accurate the predictions are (up to a certain point), but the more compute power is necessary. In most cases, tuning the neural network topology and (hyper)-parameters is considered “black magic” or “trial-and-error.”

To get back to normal, we finalize with a normal, fully connected feed-forward layer to bring down the dimensions to three again.

loss='mae', which means that the error during training and validation is measured using the mean absolute error.

optimizer='adam', which selects Adam, a gradient descent parameter updater.

model.compile(loss='mae', optimizer='adam')
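Putting those snippets together, a minimal sketch of the full model might look like this. The 50 units, the eleven LSTM layers, and the `mae`/`adam` compile settings follow the text; the import style is an assumption:

```python
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

timesteps, dim = 10, 3  # windows of 10 time steps, 3 vibration axes

model = Sequential()
model.add(LSTM(50, input_shape=(timesteps, dim), return_sequences=True))
for _ in range(10):                  # ten more LSTM layers -> eleven in total
    model.add(LSTM(50, return_sequences=True))
model.add(Dense(dim))                # back down to three dimensions per time step
model.compile(loss='mae', optimizer='adam')
```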

For convenience we create two functions for training and scoring. (Note: We provide the same data as input and output, because we are creating an auto-encoder.)

model.fit(data, data, epochs=50, batch_size=72,…
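A sketch of those two convenience functions (the fit parameters follow the text; I use a trivial stand-in model here so the sketch is self-contained, whereas the notebook would pass in the LSTM model built above):

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential

def train(model, data, epochs=50):
    # auto-encoder style: the same data is used as input and as target
    model.fit(data, data, epochs=epochs, batch_size=72, verbose=0)

def score(model, data):
    # reconstruction of the input; large errors indicate unseen patterns
    return model.predict(data, verbose=0)

# Stand-in model just to exercise the functions.
demo = Sequential([Input(shape=(3,)), Dense(3)])
demo.compile(loss='mae', optimizer='adam')
data = np.random.rand(72, 3)
train(demo, data, epochs=1)
reconstruction = score(demo, data)
```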

Let’s do the test and train the neural network twenty times on healthy data and once on broken data. Remember that our callback function handleLoss adds the trajectory of losses (a time series) to an array that we can plot. Therefore, we should see a spike whenever the neural network, pre-trained on healthy data, sees broken data – not because it knows what healthy or broken is, but because the loss tells us that the time-series pattern it currently sees is unlikely to have been seen before.

And here it is: at time step 5000 we can clearly see that something is going on. Note that the initial loss was higher, but that is because the weights of the neural network were initialized randomly, leading to bad predictions at first.


Analyze the data in real-time with the IBM Watson IoT Platform using MQTT

The last step is to hook this anomaly detector up to the IBM Watson IoT Platform using MQTT to analyze data in real-time. Hooking our neural network up to the platform is straightforward. The values you need are org, apiKey, and apiToken. These credentials were generated when you created an IBM Cloud app using the Internet of Things Platform Starter.

Perform streaming analysis by creating a count-based tumbling window

We need to create a window for our data (actually we create a Python queue).

q = Queue(7000)

We are working with count-based tumbling windows of size 3000. We can store more than two such windows in the queue before we prevent the Watson IoT Platform’s MQTT message broker from accepting new messages. Nevertheless, we need to make sure we are processing fast enough to cope with the data arrival rate; otherwise, the system will eventually thrash. We subscribe to data coming from the MQTT message broker and define a callback function that puts the data into our queue.
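A sketch of that wiring using only the standard library (the queue and window sizes follow the text; the callback signature and JSON payload shape mimic what an MQTT client library such as the Watson IoT Python SDK would deliver, which is an assumption here):

```python
import json
from queue import Queue

WINDOW_SIZE = 3000
q = Queue(7000)  # room for a bit more than two tumbling windows

def on_message(payload):
    # callback invoked for every MQTT message: parse the reading and enqueue it
    reading = json.loads(payload)
    q.put((reading['x'], reading['y'], reading['z']))

def next_window():
    # count-based tumbling window: drain exactly WINDOW_SIZE readings
    return [q.get() for _ in range(WINDOW_SIZE)]
```

Each full window would then be scaled, transformed with the FFT, and scored by the neural network.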

Because the actual training loss is sent back to the IBM Watson IoT Platform in real-time we can create a little dashboard to visualize the vibration sensor data together with the anomaly score in real-time (creation of this dashboard is not part of this article).

Conclusion

This completes our third deep-learning tutorial for IoT time-series data and concludes the series. We’ve learned how TensorFlow accelerates linear algebra operations by optimizing executions and how Keras provides an accessible framework on top of TensorFlow.

Finally, we’ve shown that even an LSTM network can outperform state-of-the-art anomaly detection algorithms on time-series sensor data – or any type of sequence data in general.

Acknowledgements

I’m deeply thankful to Michelle Corbin and Gina Caldanaro – two fantastic editors – for working with me on this series.