NEXT Machine Learning Paradigm: “DYNAMICAL” ML

Dynamical ML is machine learning that can adapt to variations over time; it requires “real-time recursive” learning algorithms and time-varying data models such as those described in the blog “Generalized Dynamical Machine Learning”.

Continuous learning using DYNAMICAL machine learning is ready for implementation today; it adds the following FOUR benefits:

1. Learning in real-time: Just like children mature over time . . .

2. As machines age, ML adapts to NEW normal: Designed for long-term use.

3. Underlying system “states” provide a meta-model: A more stable description; fewer False Positives.

4. System “states” as “Digital Twin”: Leading to continuous “closed-loop” performance improvement.

In DYNAMICAL machine learning (DML) applied to industrial IoT, the data model and the algorithms used (Generalized Dynamical Machine Learning) naturally generate what is called the “State-space” model of the machine. It may not *look* like the machine, but it captures the machine's dynamics in full detail (although relating “states” to actual machine components can be challenging). I am a proponent of using the “State-space representation” that we get for FREE in Dynamical ML as the “digital twin”. This is a topic of current exploration and advancement.
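For readers new to the term, the textbook discrete-time, linear time-varying state-space model is written out below. It is offered only as a generic illustration; the exact data model in Generalized Dynamical Machine Learning may differ. The state vector x_k holds the internal “states”, F_k describes how they evolve from one time step to the next, H_k maps them onto the measured outputs y_k, and w_k and v_k are process and measurement noise.

```latex
\begin{aligned}
\mathbf{x}_{k+1} &= \mathbf{F}_k\,\mathbf{x}_k + \mathbf{w}_k, & \mathbf{w}_k &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_k) \\
\mathbf{y}_k &= \mathbf{H}_k\,\mathbf{x}_k + \mathbf{v}_k, & \mathbf{v}_k &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k)
\end{aligned}
```

The subscript k on F_k, H_k, Q_k and R_k is what makes the model time-varying; a static description would drop it.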

What is DML?

Machine learning can be thought of as “competence without comprehension” (Dennett, 2017). How is this competence gained? Very simply: Machine Learning = REVERSE engineering the “MAP”, given Inputs and Outputs. The following picture shows what a Data Scientist means by “map”.

By “comparing” output predictions or classes with input features, one can “figure out” the relationship, or “map”, between the two; this is what a Machine Learning algorithm does. Once this competence is gained, ML can reuse the map when features of the same nature appear later: put the newly arrived feature through the map and get a prediction. How good the predictions will be depends on a variety of things, including the nature of the map. If the map is unchanging or “Static”, the predictions will be good. On the other hand, suppose the map did “move around” (hardly anything in real life is “static”!); the following picture tries to show the “cycling” of three maps.
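To make “reverse engineering the map” concrete, here is a minimal Python sketch, assuming the simplest possible map, a straight line y ≈ a·x + b, learned by ordinary least squares with NumPy. The data and the function names learn_static_map and apply_map are hypothetical, chosen for illustration only.

```python
import numpy as np

def learn_static_map(x, y):
    """Reverse engineer a straight-line map y ~ a*x + b by least squares."""
    X = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                 # [slope a, intercept b]

def apply_map(coef, x_new):
    """Reuse the learned map on newly arrived features."""
    return coef[0] * x_new + coef[1]

# Hypothetical training data generated from the map y = 2x + 0.5 plus noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = 2.0 * x_train + 0.5 + 0.05 * rng.standard_normal(200)

coef = learn_static_map(x_train, y_train)
predictions = apply_map(coef, np.array([0.0, 0.5]))  # roughly [0.5, 1.5]
```

Once the coefficients are learned, prediction is just “put the new feature through the map”; nothing in this sketch allows the map to change afterwards, which is exactly the static assumption.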

What we are seeing is that the map is “dynamical” in the sense that it may be moving from the down-sloping line to the middle to the up-sloping line and slowly doing this dance! In such a case, the predicted output will be totally different from the static case (compare the two right-hand panels).
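The effect of a cycling map can be simulated in a few lines. This hypothetical sketch (assumed slopes of -1, 0 and +1, each held for 250 samples) fits one static slope to all the data and compares it with a crude dynamical alternative that refits the slope on a sliding window of the previous 50 samples. The sliding window is a stand-in chosen for simplicity, not the DML method discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 3000
t = np.arange(T)

# Hypothetical dynamical map: the slope cycles through three maps
# (down-sloping, flat, up-sloping), each held for 250 samples.
slopes = np.array([-1.0, 0.0, 1.0])
slope_t = slopes[(t // 250) % 3]
x = rng.uniform(-1.0, 1.0, T)
y = slope_t * x + 0.05 * rng.standard_normal(T)

# Static ML: a single map (slope through the origin) fitted to all the data.
static_slope = np.dot(x, y) / np.dot(x, x)
static_err = np.mean((y - static_slope * x) ** 2)

# Crude dynamical alternative (illustrative stand-in, not DML): refit the
# slope on the previous W samples and use it to predict the next output.
W = 50
preds = np.empty(T - W)
for k in range(W, T):
    xs, ys = x[k - W:k], y[k - W:k]
    preds[k - W] = (np.dot(xs, ys) / np.dot(xs, xs)) * x[k]
window_err = np.mean((y[W:] - preds) ** 2)
```

Because the three slopes occur equally often, the static fit averages them into a nearly flat line and its error stays large, while the windowed fit follows each phase of the dance and only errs briefly after each switch.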

How common is the dynamical case? More common than you think. The current Machine Learning state of the art gets by with the static map . . . and we have done rather well. If we want to take machine learning to the next stage of sophistication so that we can address real-life problems better, DYNAMICAL ML will become necessary. A more complete technical discussion of the differences between Static and Dynamical ML is available in “Static & DYNAMICAL Machine Learning – What is the Difference?”

As you can see, many key ML applications are dynamical, but we get by using our static ML methods. DML brings significant improvements, resulting in increased business value.

Let us look at a real-life in-stream processing example conceptually.

Before any in-stream processing is undertaken, there is always a preparatory stage during which a ton of problem-specific Big Data has accumulated in a Data Lake. It is wise to use this data to pre-train any ML solution.

In the in-stream portion, when “supervisor” or “desired” data are available (in the stock price prediction case, for example, the *actual* prices are known at the end of the day), this information can be used to “learn” via Exact Recursive updates (shown by purple arrows). When supervisor data are not available, we can perform “forecasting” using the last updated “map”.
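The “learn when the supervisor arrives, forecast otherwise” loop can be illustrated with exponentially weighted recursive least squares (RLS), a standard exact recursive update for a linear map. This is a swapped-in textbook technique, not necessarily the Exact Recursive update used in DML; the class name, the forgetting factor of 0.98 and the synthetic data are assumptions. Batch least squares on historical data stands in for pre-training on the Data Lake.

```python
import numpy as np

class RecursiveLeastSquares:
    """Exponentially weighted recursive least squares for a linear map y ~ w @ x."""

    def __init__(self, n_features, forgetting=0.98, delta=100.0):
        self.w = np.zeros(n_features)            # current "map"
        self.P = delta * np.eye(n_features)      # inverse (weighted) correlation matrix
        self.lam = forgetting                    # < 1 discounts old samples

    def pretrain(self, X, y, ridge=1e-6):
        """Batch least squares on historical ("Data Lake") data."""
        self.P = np.linalg.inv(X.T @ X + ridge * np.eye(X.shape[1]))
        self.w = self.P @ X.T @ y

    def predict(self, x):
        """Forecast with the last updated map."""
        return float(self.w @ x)

    def update(self, x, y):
        """Exact recursive update once the supervisor value y is known."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)             # gain vector
        err = y - self.w @ x                     # a-priori error
        self.w = self.w + k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err

# Hypothetical pre-training data from the map w = [1, -2].
rng = np.random.default_rng(2)
X_lake = rng.standard_normal((500, 2))
y_lake = X_lake @ np.array([1.0, -2.0])
model = RecursiveLeastSquares(2)
model.pretrain(X_lake, y_lake)

# In-stream: the true map has drifted to w_new; every 4th target is missing.
w_new = np.array([1.5, -1.0])
forecasts = []
for step in range(400):
    x = rng.standard_normal(2)
    forecasts.append(model.predict(x))          # always forecast first
    if step % 4 != 3:                           # supervisor value available
        model.update(x, float(w_new @ x))
```

A forgetting factor below 1 is what lets the map move away from the pre-trained weights toward the drifted ones; with a factor of exactly 1 the 500 historical samples would keep dragging the estimate back.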

Here is a brief introduction to the theory and algorithms of DML. A principled yet pragmatic derivation leads to the State-space Recurrent Kernel-projection Time-varying Kalman, or “RKT-Kalman”, method. (I privately refer to RKT-Kalman as “Rocket” Kalman! “RKT” naturally expands to “Rocket” and, more importantly, it is a nod to the Kalman Filter and its use in rocketry, the Apollo program, satellite navigation, GPS and the like.) The full derivation and MATLAB code are available in the “Generalized Dynamical Machine Learning” article. Here is the basic structure of Rocket Kalman.

A full description of the Rocket Kalman structure is beyond our scope here, but the time-varying state-space data model, the non-linear projection onto a higher dimension (M > P) and the recurrence are easy to notice.
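For intuition only, here is a hedged Python sketch that combines the same three ingredients using generic, well-known building blocks. It is NOT RKT-Kalman, whose derivation and MATLAB code are in the referenced article. Assumptions: the map's weight vector is the Kalman state and follows a random walk (the time-varying state-space model); random Fourier features approximating an RBF kernel project the P-dimensional input onto M > P dimensions (the kernel projection); and the previous output is fed back as an input (a simple form of recurrence). The class name and all parameter values (q, r, gamma, M) are hypothetical.

```python
import numpy as np

class KalmanMapTracker:
    """Generic Kalman filter tracking the weights of a time-varying map.

    State model (random walk):  theta_k = theta_{k-1} + w_k,   w_k ~ N(0, q I)
    Observation model:          y_k = phi(u_k) @ theta_k + v_k, v_k ~ N(0, r)
    phi(u) maps a P-dim input onto M > P random Fourier features that
    approximate an RBF kernel exp(-gamma * ||u - u'||^2).
    """

    def __init__(self, P, M, q=1e-3, r=1e-2, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.Omega = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(M, P))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=M)
        self.M, self.q, self.r = M, q, r
        self.theta = np.zeros(M)        # state: weights of the current map
        self.Sigma = np.eye(M)          # state covariance

    def phi(self, u):
        """Non-linear projection of the input onto M dimensions."""
        return np.sqrt(2.0 / self.M) * np.cos(self.Omega @ u + self.b)

    def step(self, u, y=None):
        """One time step: predict, then update if the target y is available."""
        self.Sigma = self.Sigma + self.q * np.eye(self.M)  # random-walk prediction
        h = self.phi(u)
        y_hat = float(h @ self.theta)                       # a-priori forecast
        if y is not None:
            s = float(h @ self.Sigma @ h) + self.r          # innovation variance
            K = self.Sigma @ h / s                          # Kalman gain
            self.theta = self.theta + K * (y - y_hat)
            self.Sigma = self.Sigma - np.outer(K, h @ self.Sigma)
            self.Sigma = 0.5 * (self.Sigma + self.Sigma.T)  # keep it symmetric
        return y_hat

# Hypothetical data: the map flips sign halfway through, and the output
# depends on its own previous value.
rng = np.random.default_rng(3)
tracker = KalmanMapTracker(P=2, M=200)
y_prev, errs = 0.0, []
for k in range(2000):
    x = rng.uniform(-2.0, 2.0)
    gain = 1.0 if k < 1000 else -1.0
    y = gain * np.sin(x) + 0.3 * y_prev + 0.05 * rng.standard_normal()
    u = np.array([x, y_prev])           # recurrence: previous output fed back
    y_hat = tracker.step(u, y)
    errs.append((y - y_hat) ** 2)       # squared a-priori (forecast) error
    y_prev = y
```

Calling step(u) without a target gives a pure forecast from the last updated map. The process noise q plays the same role as the forgetting factor in RLS: larger values let the map move faster at the cost of noisier weights.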

DML as an Improvement to “Train then Test” ML

Once we recognize that no map is ever truly static and that a static map is only an approximation of a truly dynamical map, DML can be exploited for traditional ML applications, where input-output pairs are first used to reverse engineer the map, which is then reused with new and unseen data.

There is a large group of problems in which the sequence, or order, of inputs is significant. Think of an ML solution for machine failure prediction: the inputs will convey information about faults occurring in the machine that ultimately lead to its failure. Here, the arrival sequence of inputs is important, since faults will, in all likelihood, increase in amplitude and frequency of occurrence as time passes before ending in a machine failure (abrupt failures are not predictable, however sophisticated our ML is!).

In such cases, during the “training” period, DML will exhibit variations of the map, as expected in any real-life use case; save this “video” of the map. Then, when you want to predict failure in a different machine, inputs from this test machine (the “Test set”) can be compared with the inputs from the machine used for training to find *temporal location similarity*. Note the point of highest similarity and use the map of the training machine at that instant to make the prediction for the test machine. In other words, in the Static ML case, the “video” of the map has been collapsed into a “still photo”; in the approach described here using DML, we use the “video frame” that is the *best match*, thus opening up the possibility of significantly better results!
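A minimal sketch of the “video frame” lookup, assuming the training run has stored one map (a weight vector) per time step and that similarity is measured as the Euclidean distance between a window of recent test inputs and every equally long window of training inputs. The function names, the distance measure and the toy degradation data are all hypothetical choices, not a specification of the method.

```python
import numpy as np

def best_matching_frame(train_inputs, test_window):
    """Return the time index k whose trailing window of training inputs is
    closest (Euclidean distance, an assumption) to the test window."""
    L = len(test_window)
    best_k, best_d = None, np.inf
    for k in range(L - 1, len(train_inputs)):
        d = np.linalg.norm(train_inputs[k - L + 1:k + 1] - test_window)
        if d < best_d:
            best_k, best_d = k, d
    return best_k

def predict_with_video(map_video, train_inputs, test_window, x_new):
    """Use the training machine's map at the best-matching instant."""
    k = best_matching_frame(train_inputs, test_window)
    w = map_video[k]                    # the best-matching "video frame"
    return float(w @ x_new), k

# Hypothetical training machine: a fault-amplitude feature that grows over
# time (plus a bias term), and one recorded map per time step.
T = 100
t = np.arange(T)
train_inputs = np.column_stack([t / T, np.ones(T)])
map_video = np.column_stack([1.0 + t / T, np.zeros(T)])

# Recent inputs from a different (test) machine.
test_window = np.column_stack([[0.60, 0.61, 0.62], np.ones(3)])
y_pred, k = predict_with_video(map_video, train_inputs, test_window,
                               np.array([0.62, 1.0]))
# k == 62, the frame where the training inputs 0.60, 0.61, 0.62 occurred
```

A static approach would apply one averaged map regardless of where the test machine sits in its degradation; here the prediction uses the map the training machine had at the matching stage.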

Many advances are possible by building on the basic Kalman solution; also, a variety of Kernel methods and Bayesian solutions can be substituted for the building blocks of Rocket Kalman! I look forward to the improvements that will be made by the coming generation of Data Scientists who are solidly grounded in Systems Theory . . .