How Smart Are Your Connected Devices? Using Spark and ThingSpan to Provide IIoT Predictive Analytics for Smart Homes.

The Industrial Internet of Things (IIoT) covers a very wide range of devices and systems that interact with one another or with dedicated services over the Internet. Although such systems have been deployed by specialist companies, such as building control system suppliers, there has been a recent upsurge of interest in developing unified protocols and standards for IIoT infrastructure. IIoT spans many disciplines, which can be grouped as follows:

Infrastructure:
- IIoT Cloud Platforms
- Network Infrastructure & Sensors
- Configuration Management
- IIoT Cybersecurity

Techniques:
- Big Data Learning
- Machine Analytics

Application Sectors:
- Manufacturing & Supply Chain
- Extraction & Heavy Industry
- Utilities and Smart Grid/City/Home
- Transportation & Fleet

The infrastructure and techniques have a lot in common with the consumer/retail IoT domain. In this first look at applying Spark and ThingSpan to IIoT, we will therefore examine a simple Smart Home application, because the techniques employed apply to both domains.

The scenario is a collaborative network of home thermostats that supply information to a cloud-based analytics and predictive control system. The system combines user habits and requirements with weather forecasts in order to feed control parameters forward to the home thermostats, the goal being to minimize energy usage. In a conventional home the occupants generally adjust the thermostat only after they start feeling too hot or too cold. At that point it can take a lot of energy to bring the situation under control, because the HVAC system has only a short time to cool or heat a large volume of air that may have taken many hours to reach an uncomfortable temperature. The smart system can instead supply heating or cooling slowly, ahead of the external temperature change.

Step 1 - Loading the Data into ThingSpan

We start by loading the State, City, (weather) Forecast, Home and Device information into ThingSpan. This can be done in batch mode via Spark DataFrames or incrementally via the ThingSpan REST API. The resulting graph of objects and connections is shown below.
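As a rough illustration of the kind of graph this step produces, the following Python sketch builds a tiny in-memory graph of State, City, Forecast, Home and Device objects. It is not the ThingSpan API or a Spark DataFrame load: the Graph class, the connection labels (has_city, has_forecast, has_home, has_device), the device name Thermostat3 and the set point value are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "State", "City", "Forecast", "Home" or "Device"
    name: str
    props: dict = field(default_factory=dict)


class Graph:
    """Minimal in-memory object graph (a stand-in, not ThingSpan)."""

    def __init__(self):
        self.nodes = {}            # name -> Node
        self.edges = {}            # name -> list of (label, target name)

    def add(self, kind, name, **props):
        self.nodes[name] = Node(kind, name, props)
        self.edges.setdefault(name, [])
        return name

    def connect(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        """Follow all connections with the given label from one object."""
        return [self.nodes[t] for (l, t) in self.edges[source] if l == label]


g = Graph()
g.add("State", "CA")
g.add("City", "Rio Vista")
g.add("Forecast", "Rio Vista forecast", temp_change=-10)   # the 10-degree drop
g.add("Home", "Home3")
g.add("Device", "Thermostat3", set_point=70)               # placeholder value

g.connect("CA", "has_city", "Rio Vista")
g.connect("Rio Vista", "has_forecast", "Rio Vista forecast")
g.connect("Rio Vista", "has_home", "Home3")
g.connect("Home3", "has_device", "Thermostat3")

homes = [n.name for n in g.neighbors("Rio Vista", "has_home")]
print(homes)
```

In a real deployment the same objects and connections would be created in bulk from DataFrames or one at a time through the REST API, as described above.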

Step 2 - Deciding Which Areas Need Attention

We can now use a simple Spark SQL parallel query to identify all of the cities where the weather forecast indicates an imminent or actual large change in temperature.

We find that the forecast for Rio Vista, CA indicates an imminent 10-degree drop in temperature.
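The shape of such a query can be sketched as follows. This is not the article's actual Spark SQL: to keep the example self-contained it runs a plain SQL SELECT against an in-memory SQLite table from the Python standard library. The table name, column names, the 8-degree threshold, the temperatures and the two extra cities are all invented for illustration; only the 10-degree drop for Rio Vista comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecast ("
    "city TEXT, state TEXT, current_temp REAL, forecast_temp REAL)"
)
conn.executemany(
    "INSERT INTO forecast VALUES (?, ?, ?, ?)",
    [
        ("Rio Vista", "CA", 68.0, 58.0),        # 10-degree drop, as in the text
        ("Example City A", "CA", 70.0, 69.0),   # made-up row, small change
        ("Example City B", "CA", 65.0, 67.0),   # made-up row, small change
    ],
)

THRESHOLD = 8.0  # hypothetical definition of a "large" change, in degrees

# Select every city whose forecast differs from the current temperature
# by at least THRESHOLD degrees, in either direction.
rows = conn.execute(
    """
    SELECT city, state, forecast_temp - current_temp AS temp_change
    FROM forecast
    WHERE ABS(forecast_temp - current_temp) >= ?
    ORDER BY city
    """,
    (THRESHOLD,),
).fetchall()

print(rows)
conn.close()
```

A filter of this kind has no dependencies between rows, which is what makes it a natural candidate for a parallel query across many cities.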

Step 3 - Validating Our Premise

The next step is to check the homes in the area to see whether any devices have had their desired temperatures changed in the past half hour. This step isn’t strictly necessary, but it will confirm that things are indeed changing. We find the homes using a simple navigational query from City="Rio Vista" to the associated Home objects.

Sure enough, Home3 in Rio Vista has had its set point increased by 2 degrees recently.
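The navigation and the half-hour check can be sketched in plain Python. This is an illustration of the idea, not ThingSpan's query language: the dictionaries stand in for City-to-Home and Home-to-Device connections, and the thermostat names, the fixed clock value and the timestamps of the changes are all assumptions. Only the recent 2-degree increase at Home3 is taken from the text.

```python
from datetime import datetime, timedelta

# Hypothetical graph fragment: City -> Home -> Device links as plain dicts.
CITY_HOMES = {"Rio Vista": ["Home2", "Home3", "Home4"]}
HOME_DEVICES = {
    "Home2": ["Thermostat2"],
    "Home3": ["Thermostat3"],
    "Home4": ["Thermostat4"],
}

NOW = datetime(2016, 1, 1, 12, 0)  # arbitrary fixed "now" so the run is repeatable

# Set-point change log per device: (timestamp, change in degrees).
SET_POINT_CHANGES = {
    "Thermostat2": [(NOW - timedelta(hours=5), -1.0)],    # made up, too old to count
    "Thermostat3": [(NOW - timedelta(minutes=12), 2.0)],  # the recent +2 degrees
    "Thermostat4": [],
}


def recently_adjusted(city, now, window=timedelta(minutes=30)):
    """Navigate City -> Home -> Device and keep changes inside the window."""
    hits = []
    for home in CITY_HOMES.get(city, []):
        for device in HOME_DEVICES.get(home, []):
            for when, delta in SET_POINT_CHANGES.get(device, []):
                if timedelta(0) <= now - when <= window:
                    hits.append((home, device, delta))
    return hits


recent = recently_adjusted("Rio Vista", NOW)
print(recent)
```

Starting from the city and following connections outward means only the homes in the affected area are examined, rather than every device in the system.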

Step 4 - Predicting the Changes Required and Taking Action

So, we can now go ahead and check the other homes in Rio Vista to see if their thermostats need to turn on the heating ahead of the cold front that will arrive shortly. With modern insulation we can apply small amounts of heat over a longer period to save energy yet avoid an uncomfortable drop in temperature that will probably cause the humans to mess with the system, which almost always wastes energy.

We find that Home2 and Home4 need their HVAC activated: Home2 needs to run the heating for 15 minutes immediately, and Home4 needs to run it for 20 minutes starting 10 minutes from now. That information can be sent to the thermostat/controller devices so that the homes remain comfortable despite the sudden change in the weather. The applications could also perform long-term analysis of weather and response patterns using Spark machine learning (MLlib) to further increase the efficiency of the whole system. Other applications could alert the local power utilities to the anticipated increase in demand, allowing them to provision equipment ahead of it.
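One way the resulting schedule might be packaged for the thermostats is sketched below. The message format is entirely hypothetical: the field names, the "heat" action and the fixed clock value are assumptions made for the example, and the article does not describe the real device protocol. The durations and start offsets for Home2 and Home4 are the ones given above.

```python
import json
from datetime import datetime, timedelta

NOW = datetime(2016, 1, 1, 12, 0)  # arbitrary fixed "now" so the run is repeatable


def build_command(home, start_in_minutes, run_minutes, now):
    """Build one heating command (hypothetical message layout)."""
    start = now + timedelta(minutes=start_in_minutes)
    return {
        "home": home,
        "action": "heat",
        "start": start.isoformat(),
        "duration_minutes": run_minutes,
    }


# (home, minutes until start, minutes to run), taken from the text above.
schedule = [
    ("Home2", 0, 15),    # run for 15 minutes immediately
    ("Home4", 10, 20),   # run for 20 minutes, starting 10 minutes from now
]

commands = [build_command(home, start, run, NOW) for home, start, run in schedule]

for command in commands:
    print(json.dumps(command))
```

Carrying an absolute start time rather than a relative delay keeps the command meaningful even if delivery to the device is slightly late.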

Summary

Although we’ve used a simple example, it shows how powerful the combination of Apache Spark data mining and ThingSpan graph analytics can be. The same methodology can be applied to IIoT configuration management, predictive maintenance, IoT cybersecurity, process monitoring and optimization, and so on. The main components of such a system are shown below.

Apache Hadoop YARN is used to control and monitor workflows, increasing usability and service availability. HDFS increases data availability. ThingSpan excels at navigational and pathfinding queries, and its distributed architecture is a natural fit for a Spark environment. It has the performance and scalability to handle high-speed parallel data ingest at the same time as complex analytic queries on behalf of multiple clients.