Creating a data stream from NIST manufacturing lab data – Part 1

The Industry Experiences team has recently published a solution guide for extracting insights from existing IoT data. The solution consists of the following high-level components.

Ingest data

Hot path processing

Cold path processing

Analytics clients

This is the first of a series of blogs that go through those components in detail. We start with ingesting data into the solution and creating a data stream.

The NIST data

The solution uses data published by the US National Institute of Standards and Technology (NIST) Smart Manufacturing Systems (SMS) Test Bed, which exposes the manufacturing lab’s data. We use the data from the lab’s volatile data stream (VDS).

The VDS is implemented using the MTConnect standard. The standard offers a semantic vocabulary for manufacturing equipment. It defines a protocol to communicate with an MTConnect agent and the schema for the data returned.

The agent and its methods

The agent exposes an HTTP endpoint that supports the following MTConnect request types:

A probe request, which returns the devices and the data items the agent publishes.

A current request, which returns the most recent value of every data item.

A sample request, which returns a time series of values starting from a given sequence number.
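As a sketch, the agent’s operations map to plain HTTP GET requests (the host below is a placeholder; the request names follow the MTConnect standard):

```
GET http://<agent-host>/probe
GET http://<agent-host>/current
GET http://<agent-host>/sample?from=<sequence>
```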

The solution needs to poll the VDS endpoint. Once it receives a response, it can post the data records from the stream, as name-value pairs, to Azure Event Hubs. A stream-processing technology can then consume the stream on the other end.

The ingestion component

The ingestion component does the following tasks:

Polls the VDS endpoint.

Saves the raw response as XML to Azure Blob Storage.

Transforms the hierarchical XML result into a flat record structure of name-value pairs with time stamps.

Posts each record to an Azure Event Hub. A different Event Hub is used for each message type.
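As an illustration of the transform step, here is a minimal sketch in Python (the actual component is .NET). The element names follow the MTConnect Streams layout, but the sample document is simplified, with XML namespaces omitted:

```python
# Sketch of the XML-to-flat-records transform (not the actual .NET component).
import xml.etree.ElementTree as ET

SAMPLE = """
<MTConnectStreams>
  <Streams>
    <DeviceStream name="Device01" uuid="d1">
      <ComponentStream component="Linear" name="X">
        <Samples>
          <Position timestamp="2018-05-01T12:00:00Z" dataItemId="x_pos" name="Xabs">12.5</Position>
        </Samples>
        <Events>
          <Execution timestamp="2018-05-01T12:00:01Z" dataItemId="exec" name="execution">ACTIVE</Execution>
        </Events>
      </ComponentStream>
    </DeviceStream>
  </Streams>
</MTConnectStreams>
"""

def flatten(xml_text):
    """Turn the hierarchical MTConnect response into flat name-value records."""
    records = []
    root = ET.fromstring(xml_text)
    for stream in root.iter("ComponentStream"):
        # Samples, Events, and Condition children all hold timestamped items.
        for category in stream:            # e.g. <Samples>, <Events>
            for item in category:          # e.g. <Position>, <Execution>
                records.append({
                    "type": category.tag,  # message type, used to pick the Event Hub
                    "name": item.get("name") or item.get("dataItemId"),
                    "value": item.text,
                    "timestamp": item.get("timestamp"),
                })
    return records

records = flatten(SAMPLE)
```

Each record carries the message type, so the poster can route it to the Event Hub for that type.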

There are many Azure service options for implementing the ingestion component; any of them must be able to poll an HTTP endpoint, save the raw data to interim storage, and post the transformed data to Azure Event Hubs.

I chose Azure Logic Apps because it makes it easy to set up a simple workflow: make a few HTTP calls against the VDS endpoint, call the necessary REST operations, and keep polling on a schedule. Azure Logic Apps provides more than 200 connectors for performing tasks with data. Since I want to store the raw response in a data store, the Azure Blob Storage connector is the easiest solution.
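As a sketch, the polling part of such a workflow boils down to a recurrence trigger plus an HTTP action in the Logic Apps workflow definition language. The interval is a placeholder, not the solution’s actual value, and the URI shown is the public SMS Test Bed VDS address:

```json
{
  "triggers": {
    "Recurrence": {
      "type": "Recurrence",
      "recurrence": { "frequency": "Minute", "interval": 1 }
    }
  },
  "actions": {
    "Poll_VDS": {
      "type": "Http",
      "inputs": {
        "method": "GET",
        "uri": "https://smstestbed.nist.gov/vds/current"
      },
      "runAfter": {}
    }
  }
}
```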

Two tasks remain: the component must transform the XML payload into flat records, and post the results to Azure Event Hubs.

There is a Transform XML connector for Logic Apps. A quick look at this connector reveals that it requires developing a map with the Enterprise Integration Pack. Although building a map is possible, it is likely to take considerably more effort than doing the same transformation in custom code.

I developed an MTConnect client during one of my fun coding weekends. The component uses Plain Old CLR Object (POCO) classes generated with Visual Studio’s xsd.exe tool. Although the generated code cannot de-serialize the polymorphic XML as-is, modifying it to succeed is easy: simply apply the XmlIncludeAttribute to the polymorphic generated classes.

I decoupled two tasks: polling the VDS endpoint and saving the raw data, and transforming the result and posting it to the Event Hub. This design moves the more complicated work of parsing and transforming the result into an environment where it is easier to develop, debug, and maintain. The resulting design consists of two microservices: the first is a Logic App, managed by the Azure Logic Apps service; the second is custom code deployed to the Azure Container Instances service. I use a Storage queue for communication between the two microservices.
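The hand-off between the two microservices can be sketched as follows. The message shape (a JSON object carrying the blob URL) and the helper names are assumptions for illustration, and in-memory stand-ins replace the Azure Storage and Event Hubs SDK calls:

```python
# Sketch of the queue-based hand-off between the two microservices.
import json

def make_notification(blob_url):
    """Producer side (the Logic App): the message dropped on the Storage queue."""
    return json.dumps({"blobUrl": blob_url})

def handle_notification(message_text, fetch_blob, post_record):
    """Consumer side (the container): read the blob, then post each record.
    fetch_blob and post_record stand in for the Azure Storage and
    Event Hubs SDK calls in the real service."""
    blob_url = json.loads(message_text)["blobUrl"]
    for record in fetch_blob(blob_url):
        post_record(record)  # each record goes to the Event Hub for its type

# Exercise the hand-off with in-memory stand-ins for the Azure services.
sent = []
msg = make_notification("https://example.blob.core.windows.net/raw/1.json")
handle_notification(
    msg,
    fetch_blob=lambda url: [{"name": "Xabs", "value": "12.5"}],
    post_record=sent.append,
)
```

Because the consumer only receives a blob URL, the two services share nothing but the queue and the blob container, which is what makes them independently deployable.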

Implementation of the Logic App

Going through the full implementation of the Logic App portion of the component would take a lot of space and repetition, so I will highlight only the parts that need extra attention. The implementation follows this flow chart. The full source code can be found on GitHub.

Logic Apps provides built-in workflow definition language functions for de-serializing and serializing JSON and XML. The raw results are stored as JSON to reduce storage requirements: the JSON representation of the same data takes less space because XML tags are more verbose. The result of the current and sample requests is saved into variables using the following formula. If you are editing the workflow in the GUI, use the first line; to edit in code view, make sure the function starts with @ so the interpreter can evaluate it:
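The formula itself did not survive in this copy. A plausible reconstruction, following the de-serialize/serialize steps described below and using the documented json(), xml(), string(), and replace() workflow functions (the HTTP action name is a placeholder; the first line is the GUI form, the second the code-view form), would be:

```
json(replace(string(json(xml(body('HTTP_VDS')))), '@', ''))
@json(replace(string(json(xml(body('HTTP_VDS')))), '@', ''))
```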

The key is to de-serialize the XML first, then serialize the result to JSON. This operation creates a JSON string whose attribute names are prefixed with the “@” character. The activity then converts the JSON to a string, removes the “@” characters, and builds a new JSON value from the result.

After each probe request is successfully received, the workflow saves the result to a blob in Azure Blob Storage. It then puts a message containing the URL of the saved blob into a Storage queue, which notifies the other microservice to transform the data records and post them to their respective Event Hubs.