Deep Learning for Internet of Things Using H2O

H2O is a feature-rich open source machine learning platform known for its R and Spark integration and its ease of use. This is an overview of using H2O deep learning for data science with the Internet of Things.

H2O is an open source machine learning platform for smarter applications. In the Data Science for IoT course, we have been following H2O because it is open source, integrates with R and Spark, supports deep learning and is easy to use.

This blog is authored by Sibanjan Das and Ajit Jaokar as part of our work at the Data Science for IoT course exploring H2O Deep Learning for Internet of Things.

Analytics are at the core of IoT. To implement IoT analytics, we need to apply Machine Learning Algorithms to IoT datasets. However, the methodology for machine learning implementations is different from traditional techniques. Several factors and constraints apply when we consider Data Science for IoT implementations. These include:

Data capture frequency: Devices and information systems rarely produce data at the same frequency.

Variety of data sources: To produce the right analytics, data from different sources, including historical and real-time systems, needs to be brought together.

Many features of H2O provide the smart capabilities needed for IoT analytics, for instance in-memory processing, the deep learning package and the Spark integration package (Sparkling Water). The deep learning package from H2O features automatic data handling: it standardizes data, handles missing values and converts categorical data. It also includes automatic performance tuning and load balancing, and it removes the overhead of complicated configuration files. Sparkling Water enables easier ingestion of data from real-time sources through Apache Spark into deep learning algorithms. Also, thanks to its in-memory compression techniques, H2O can handle huge amounts of data even on a small cluster. This makes it a suitable choice for architecting analytics for the Internet of Things.
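To make the automatic data handling concrete, here is a conceptual illustration in base R of the three steps mentioned above: imputing missing values, standardizing numeric columns and converting categorical data. It is only a sketch of the idea, not H2O's internal implementation, and the toy sensor readings and column names are made up for illustration.

```r
# Toy sensor readings with one missing value and one categorical column
# (hypothetical data, for illustration only).
readings <- data.frame(
  temperature = c(21.5, NA, 23.1, 22.4),
  state = factor(c("idle", "active", "active", "idle"))
)

# 1. Handle missing values: replace NA with the column mean.
temp <- readings$temperature
temp[is.na(temp)] <- mean(temp, na.rm = TRUE)

# 2. Standardize: rescale to zero mean and unit standard deviation.
temp_std <- (temp - mean(temp)) / sd(temp)

# 3. Convert categorical data: one indicator (0/1) column per level.
state_onehot <- model.matrix(~ state - 1, data = readings)

# Combined numeric matrix, the kind of input a neural network expects.
prepared <- cbind(temperature = temp_std, state_onehot)
print(prepared)
```

With H2O these steps do not have to be written by hand; the sketch only shows what kind of transformation is being taken care of.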

First, we look into the deep learning capabilities of H2O. This will provide a backdrop for the use of H2O for building IoT capabilities.

Deep learning algorithms play an important role in IoT analytics. Data from machines is often sparse and/or has a temporal element. Even when we trust the data from a specific device, devices may behave differently under different conditions. Hence, capturing all scenarios during the data preprocessing and training stage of an algorithm is difficult. Monitoring sensor data continuously is also cumbersome and expensive. Deep learning algorithms can help to mitigate these risks. They learn useful features from the data on their own, which lets the developer concentrate on other things instead of hand-crafting features for every scenario.

Below, we discuss what H2O offers as part of its deep learning framework and the features that make it suitable for data from things.

Configuring and Loading Data in an H2O Cluster from R

H2O provides a step-by-step instruction manual for setting it up in R. The libraries and the instruction manual can be found in H2O installation for R Users.

H2O can be started from R on a local host with h2o.init(), which lets you specify the IP, the port and the number of threads to be used by H2O. By default, it uses all threads available on that machine.
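As a minimal sketch of this step (not the authors' original code), the snippet below starts a local H2O instance from R and loads data into it. The IP and port shown are H2O's usual local defaults, nthreads = -1 asks for all available cores, the built-in iris data set is only a stand-in, and the CSV path in the comment is a placeholder.

```r
library(h2o)

# Start (or connect to) an H2O instance on the local machine.
# nthreads = -1 requests all available cores.
h2o.init(ip = "localhost", port = 54321, nthreads = -1)

# Option 1: push an in-memory R data frame into the H2O cluster.
iris_hex <- as.h2o(iris)

# Option 2: let H2O parse a file directly (placeholder path).
# train_hex <- h2o.importFile(path = "path/to/train.csv")

# Rows and columns as seen by H2O.
print(dim(iris_hex))
```

Running this requires the h2o R package and a Java runtime on the machine.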

Training a Deep Learning Model

Once the data is loaded, the h2o.deeplearning() function can be called with appropriate parameters to invoke the deep learning engine. The H2O deep learning package has various methods to preprocess the data itself. However, if we understand and know something about our data, we can just as well use R's native packages to preprocess it beforehand.

Image recognition forms a vital part of IoT, so we tried out H2O's deep learning package in an experiment on the digit recognizer dataset from Kaggle. The goal was to determine which digit is shown in an image of a single handwritten digit. We ended up with a score of 0.9700. The key lies in selecting the right and efficient set of parameters, and we are working on further improving our results. To give a sense of the H2O deep learning package: it has more than 75 parameters, and apart from a few mandatory ones, the rest are ours to choose. We used the parameters that we thought were relevant to our experiment. A detailed description of all available parameters can be found at the link r_h20_deep_learning.
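The following is a sketch of what such a call can look like, not the exact code or settings from our experiment. To stay runnable without the Kaggle file, it trains on a small synthetic data set with a "label" column and 20 numeric columns standing in for pixels; all parameter values are illustrative assumptions rather than tuned choices.

```r
library(h2o)
h2o.init(nthreads = -1)

# Synthetic stand-in for the digit data: 3 classes, 20 numeric columns
# whose mean depends on the class (hypothetical data, not the Kaggle set).
set.seed(42)
n <- 600
label <- sample(0:2, n, replace = TRUE)
pixels <- matrix(rnorm(n * 20, mean = rep(label, 20)), nrow = n)
train_df <- data.frame(label = factor(label), pixels)
train_hex <- as.h2o(train_df)

model <- h2o.deeplearning(
  x = setdiff(names(train_hex), "label"),  # predictor columns
  y = "label",                             # target column (a factor)
  training_frame = train_hex,
  activation = "Rectifier",
  hidden = c(32, 32),                      # two hidden layers of 32 units
  l1 = 1e-5,
  train_samples_per_iteration = -2,        # -2 lets H2O choose automatically
  classification_stop = 0,                 # stop if training error reaches 0
  epochs = 10,
  overwrite_with_best_model = TRUE,
  standardize = TRUE,
  distribution = "AUTO",
  missing_values_handling = "MeanImputation",
  stopping_metric = "misclassification",
  nfolds = 3,                              # 3-fold cross-validation
  seed = 1
)

# Cross-validated performance summary.
print(h2o.performance(model, xval = TRUE))
```

For the real data, the synthetic frame would be replaced by one loaded with h2o.importFile(), with the target column converted to a factor so that H2O treats the task as classification.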

Brief overview of the parameters used:

x and y: The predictor columns and the target variable, respectively

training_frame: H2O training frame data

activation: Indicates which activation function to use

hidden: The sizes of the hidden layers, one value per layer

l1: L1 regularization

train_samples_per_iteration: Number of training samples per iteration

classification_stop: Stopping criterion for classification error

epochs: How many times the dataset should be iterated

overwrite_with_best_model: If TRUE, overwrites the final model with the best model found during training

standardize: If TRUE, auto standardize the data

distribution: The distribution function of the response. It can be AUTO

missing_values_handling: Ways to handle missing values

stopping_metric: The stopping metric criterion

nfolds: The number of folds for N-fold cross-validation

The model can then be applied to a new test dataset to validate its results using the h2o.predict() function.
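A minimal sketch of this scoring step is shown below. It uses the built-in iris data set and an 80/20 split purely as stand-ins, and the small network is an illustrative assumption rather than the model from our experiment.

```r
library(h2o)
h2o.init(nthreads = -1)

# Stand-in data: split iris into training and test frames.
iris_hex <- as.h2o(iris)
splits <- h2o.splitFrame(iris_hex, ratios = 0.8, seed = 1)
train_hex <- splits[[1]]
test_hex <- splits[[2]]

# Train a small network on the training frame.
model <- h2o.deeplearning(
  x = setdiff(names(iris_hex), "Species"),
  y = "Species",
  training_frame = train_hex,
  hidden = c(10, 10),
  epochs = 20,
  seed = 1
)

# Score the held-out frame: one predicted class plus one probability
# column per class for every test row.
pred <- h2o.predict(model, newdata = test_hex)
print(head(pred))

# Compare predictions with the true labels of the test frame.
perf <- h2o.performance(model, newdata = test_hex)
print(h2o.confusionMatrix(perf))
```

For a Kaggle-style test set without labels, only the h2o.predict() call applies; the performance step needs the true target column.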

Features that Stand Out in H2O

To summarize, a deep learning algorithm for IoT:

Should be powerful enough to support distributed computing architectures to handle big data and do computations in parallel.

Should be smart enough:

To preprocess and auto-impute the data itself without any external intervention

To support unsupervised feature learning

To prevent model overfitting

Should post process the data itself to give back results in original form or unit of measure

Should cross validate the results itself and decide if result optimization is necessary

Almost all of these capabilities and features are demonstrated by H2O's deep learning package. This makes H2O a strong predictive analytics engine and a suitable choice for implementing deep learning for the Internet of Things. We also found out more about time series on H2O and may cover it at some future point.

Author Bios:

Sibanjan Das is a Business Analytics and Data Science consultant. He has strong consulting experience in Business Systems and Data Analytics, with a background in implementing predictive analytics solutions for business systems and the Internet of Things. He holds a Master of Business Analytics from Singapore Management University and several certification credentials such as OCA, OCP, CSCMS and Six Sigma Green Belt.

Ajit Jaokar is the creator of the Data Science for IoT course. This is based on his teaching at Oxford University and UPM (Technical University of Madrid). Ajit’s work covers IoT, Data Science, Smart cities and Telecoms. Ajit is based in London.