Deep Learning for Internet of Things Using H2O

H2O is a feature-rich, open source machine learning platform known for its R and Spark integration and its ease of use. This is an overview of using H2O deep learning for data science with the Internet of Things.

H2O is an open source machine learning platform for smarter applications. In the Data Science for IoT course, we have been following H2O for features such as its open source licensing, R integration, Spark integration, deep learning and its ease of use.

This blog is authored by Sibanjan Das and Ajit Jaokar as part of our work at the Data Science for IoT course exploring H2O Deep Learning for Internet of Things.

Analytics are at the core of IoT. To implement IoT analytics, we need to apply machine learning algorithms to IoT datasets. However, machine learning for IoT is implemented differently from traditional machine learning, because several factors and constraints apply when we consider Data Science for IoT implementations. These include:

Many features of H2O provide smart capabilities for IoT analytics: for instance, in-memory processing, the deep learning package and Sparkling Water, H2O's integration with Apache Spark. The H2O deep learning package handles data preparation automatically: it standardizes data, handles missing values and converts categorical data. It also includes automatic performance tuning and load balancing, and it removes the overhead of complicated configuration files. Sparkling Water makes it easier to feed data from real-time sources, ingested through Apache Spark (for example with Spark Streaming), into deep learning algorithms. Also, thanks to its in-memory compression techniques, H2O can handle huge amounts of data even on a small cluster. This makes it a suitable choice for architecting analytics for the Internet of Things.
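H2O performs this preparation internally (its deep learning estimator exposes options such as `standardize` and `missing_values_handling`). The plain-Python sketch below only illustrates what those three transformations do to a small batch of sensor readings; it is not H2O's implementation. The function name `h2o_style_preprocess` and the sample columns `temp` and `status` are hypothetical, and encoding a missing category as all zeros is a simplification that may differ from how H2O treats it.

```python
import math


def h2o_style_preprocess(rows, numeric_cols, categorical_cols):
    """Hypothetical sketch: mean-impute and standardize numeric columns,
    then one-hot encode categorical columns. Missing values are None."""
    # Per-column mean and standard deviation, computed over non-missing values.
    stats = {}
    for col in numeric_cols:
        present = [r[col] for r in rows if r.get(col) is not None]
        mean = sum(present) / len(present) if present else 0.0
        var = sum((v - mean) ** 2 for v in present) / len(present) if present else 0.0
        std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant columns
        stats[col] = (mean, std)

    # Distinct levels of each categorical column, in a stable (sorted) order.
    levels = {
        col: sorted({r[col] for r in rows if r.get(col) is not None})
        for col in categorical_cols
    }

    out = []
    for r in rows:
        vec = []
        for col in numeric_cols:
            mean, std = stats[col]
            v = r.get(col)
            if v is None:
                v = mean  # mean imputation
            vec.append((v - mean) / std)  # standardization
        for col in categorical_cols:
            # One-hot encoding; a missing level yields all zeros in this sketch.
            vec.extend(1.0 if r.get(col) == lvl else 0.0 for lvl in levels[col])
        out.append(vec)
    return out


if __name__ == "__main__":
    readings = [
        {"temp": 20.0, "status": "ok"},
        {"temp": None, "status": "fault"},
        {"temp": 30.0, "status": "ok"},
    ]
    print(h2o_style_preprocess(readings, ["temp"], ["status"]))
    # [[-1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

Here the missing `temp` reading is replaced by the column mean (25.0) before standardizing, so it maps to 0.0, and `status` becomes two indicator columns ordered `fault`, `ok`.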

First, we look into the deep learning capabilities of H2O. This will provide a backdrop for using H2O to build IoT capabilities.

Deep learning algorithms play an important role in IoT analytics. Data from machines is often sparse, has a temporal element, or both. Even when we trust the data from a specific device, that device may behave differently under different conditions. Hence, it is difficult to capture every scenario during the data preprocessing and training stages of an algorithm. Monitoring sensor data continuously is also cumbersome and expensive. Deep learning algorithms can help mitigate these risks: they learn useful representations directly from the data, allowing developers to concentrate on other problems instead of hand-crafting features for every scenario.