The "Second Digitalisation"

Short paper based on discussions and presentations regarding

the importance of semantic modelling in process industries.

M. J. Neuer

Introduction

In recent years, the term digitalisation has been used with increasing frequency. Technically, it refers to the conversion of analogue, in other words continuous, signals into discrete sample points, which can then be interpreted by computers. Let us therefore put the term digitalisation in even simpler words, defining it as a means of providing computers with information. This is a weak definition, but for the considerations that follow it is quite convenient.

Sampling of signals as an example of the "First Digitalisation"

An obvious first example of data are signals. Signals of all kinds have been digitalised, and this has affected our daily lives: our music moved from vinyl records and tapes, both analogue media, via compact discs to MP3 files, both digital media. The same happened to video information, though later, owing to the larger data volumes involved with motion pictures. Let us call this type of conversion the “first digitalisation”, understood in the most physical sense as the sampling of data points.

The first digitalisation thus applies to much of the information in our daily lives, and especially to industrial data. With the expansion of the World Wide Web, social media arose, spawning data volumes of extreme size. In the wake of these technologies it became obvious that we needed concepts to analyse and understand the relations hidden in these data. The term Big Data [1] was coined, defined in terms of the big ‘V’s: data that is large in volume, is exchanged with high velocity, exhibits a certain degree of veracity, and contains reasonable value to be extracted. Methods to handle such data were rapidly developed [2],[3], finally leading to ways to analyse them [4] and to derive impact and value [5].

Compared to social media data, e.g. originating from Twitter, common industrial data is indeed much smaller. Even when pictures, video and audio information are included, industrial data streams tend to be smaller than social media traffic. Moreover, much industrial data simply goes unused: recorded for “in-case-of” scenarios, it is not systematically parsed but merely stored away. Upon realising this drawback, activities were initiated to push the exploitation of data further into focus. By today, several research projects are dedicated to the evaluation of large data streams coming from process chains. We may conclude that, in a certain sense, the first digitalisation was successful, because data science is prospering as a field and all industrial sectors have realised its impact potential.

Context

Let us now come back to our definition of digitalisation. Implicitly, many people imagine data as tables with rows and columns; sometimes one may think of vectors or matrices. Keep exactly this idea in mind and consider a vector of numbers, for instance a series of velocities that you reached with your car. Once these velocities are recorded, they are nothing but numbers in a table. You can plot them in a diagram, look at them, or use them for any kind of posterior analysis. For your computer, they are still just numbers. Numbers without any context. The computer does not know that they are velocities. That knowledge is added by you when you analyse the data, because you know these values are recorded velocities.
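As a small illustration of this point (with purely hypothetical sample values), the situation can be sketched in a few lines: to the computer, the recorded velocities are merely a list of numbers, and any operation it performs treats them as such.

```python
# A recorded series of velocities, as the computer sees them:
# just numbers, with no indication of what they mean.
samples = [12.4, 13.1, 13.8, 14.0, 13.6]

# The computer can calculate with them, but the knowledge that these
# are velocities of a car lives only in the analyst's head.
mean_value = sum(samples) / len(samples)
print(mean_value)
```

Nothing in this data structure records the unit, the measured quantity, or the object that produced the values.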

We proceed by asking: can we give such context information to the computer? The answer is yes. First, we define the term “car” abstractly. Next, we associate so-called state variables with the “car”, namely “position” and “velocity”, where “velocity” can be calculated by differentiating “position” with respect to “time”. With these kinds of objects available, we can tell the computer that there are multiple cars, each being an instance of the abstract object “car” we just defined. The computer now knows that each of the cars must have a “velocity” associated with it. We can go one step further and define the object “street”. “Streets” may have several “cars” on them. “Cars” have “engines” that influence the state variables; thus, without an “engine”, the “car” has the “velocity” zero. In this way we can supply context to the computer, and the computer can start to deduce and perform inference.
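The step from abstract objects to instances can be sketched in code. The following is a minimal, hypothetical Python model (all class and attribute names are illustrative, not taken from any existing framework): it defines the objects “car”, “engine” and “street”, and encodes the inference rule that a car without an engine has the velocity zero.

```python
class Engine:
    """An engine that can influence a car's state variables."""
    def __init__(self, power_kw):
        self.power_kw = power_kw


class Car:
    """Abstract object "car" with state variables "position" and "velocity"."""
    def __init__(self, engine=None):
        self.engine = engine      # a car may or may not have an engine
        self.position = 0.0       # state variable "position"
        self._velocity = 0.0

    def accelerate_to(self, velocity):
        self._velocity = velocity

    @property
    def velocity(self):
        # Inference rule from the text: without an engine,
        # the car has the velocity zero.
        return self._velocity if self.engine is not None else 0.0


class Street:
    """A street may have several cars on it."""
    def __init__(self):
        self.cars = []


street = Street()
car_with_engine = Car(engine=Engine(power_kw=90))
car_without_engine = Car()
street.cars.extend([car_with_engine, car_without_engine])

car_with_engine.accelerate_to(13.8)
car_without_engine.accelerate_to(13.8)  # no effect on the inferred velocity

print([car.velocity for car in street.cars])   # -> [13.8, 0.0]
```

Each `Car` instance automatically carries the state variables of its class, and the computer can infer that the engineless car cannot move, without being told so explicitly for that individual car.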

Semantic modelling as "Second Digitalisation"

Of course, what we have described here is semantic modelling. It is a second digitalisation, one that introduces a new type of information to the computer: information that does not originate from sampling signals and does not come from sensors or any other automatic acquisition. Semantic information must be modelled, and it is essentially different from merely sampled data. Once the computer has a domain model of a “street” and a “car”, it can start to work with these terms. Especially in industrial data applications, semantic modelling and contextual sampling are currently still underrepresented.
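To make the idea of a machine-readable domain model concrete, the same knowledge can also be written down as subject–predicate–object triples, the form underlying semantic web technologies such as RDF. The following is a hedged sketch in plain Python; the predicate names are purely illustrative and do not follow any real vocabulary.

```python
# Domain knowledge about cars and streets, expressed as triples.
triples = {
    ("Car",      "is_a",               "Class"),
    ("Street",   "is_a",               "Class"),
    ("Car",      "has_state_variable", "position"),
    ("Car",      "has_state_variable", "velocity"),
    ("car_1",    "instance_of",        "Car"),
    ("car_2",    "instance_of",        "Car"),
    ("street_1", "instance_of",        "Street"),
    ("street_1", "carries",            "car_1"),
}

def state_variables_of(individual):
    """Infer the state variables of an individual via its class."""
    classes = {o for s, p, o in triples
               if s == individual and p == "instance_of"}
    return {o for s, p, o in triples
            if p == "has_state_variable" and s in classes}

print(sorted(state_variables_of("car_1")))   # -> ['position', 'velocity']
```

The inference that `car_1` has a “position” and a “velocity” is nowhere stated as a triple; it follows from the class-level knowledge, which is exactly the kind of context the second digitalisation supplies.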