Menu

H2O is an open-source, distributed and scalable machine learning platform written in JAVA. H2O supports many of the statistical & machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more. OpenFaaS®(Functions as a Service) is a framework for building Serverless functions easily with Docker. Read my previous post to learn more about OpenFaaS and DO.

H2O has a module aptly named sparkling water that allows users to combine the machine learning algorithms of H2O with the capabilities of Spark. Integrating these two open-source environments provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience.

H2O Driverless AI is a commercial package for automatic machine learning that automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection, and model deployment. H2O also has a popular open-source module called AutoML that automates the process of training a large selection of candidate models. H2O’s AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit. AutoML makes hyperparameter tuning accessible to everyone.

H2O allows you to convert the models to either a Plain Old Java Object (POJO) or a Model Object or an Optimized (MOJO) that can be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file produced as the build output of these packages. You can read more about deploying h2o models here.

I have created an OpenFaaS template for deploying the exported MOJO file using a base java container and the dependencies defined in the gradle build file. Using the OpenFaaS CLI (How to Install) pull my template as below:

Copy the exported MOJO zip file to the root folder along with build.gradle and settings.gradle. Make appropriate changes to handle.java as per the needs of the model, as explained here. Add http://digitaloceanIP:8080 to watersplash.yml

1

2

3

provider:

name:openfaas

gateway:http://digitaloceanIP:8080

and finally:

1

faas-cli up-fwatersplash.yml

That’s it! Congratulations! Your model is up and running! Access it at http://digitaloceanIP:8080/function/watersplash

Artificial intelligence (AI) and Machine Learning (ML) are having a profound impact on the way medicine is being practiced. AI/ML algorithms and techniques fit imaging applications easily and can help with automation. Radiology is the specialty that has benefitted the most from the AI/ML revolution. Melanoma detection in Dermatology is another obvious winner.

Image credit: pixabay.com

Many of the machine learning algorithms are reasonably well known. The real challenge is to get the infrastructure to crunch massive amounts of data, getting the ideal dataset for a problem, optimizing the model for performance and deploying the model for use. If you are relatively new to ML, Kaggle is a useful resource for you to start.

I will briefly introduce Kaggle for those who have not used it before. Kaggle is a platform for posting datasets that you have collected. They also provide ‘kernels’ or computational resources (typically Jupyter Notebooks) for collaborative analysis. The datasets can be made private or public under a variety of license options. Organizations post competitions and reward teams that solve them. Solutions are typically posted as predictions on a test dataset or share the kernel code

I recently noticed a good competition on Kaggle that the eHealth community may find interesting. Aravind Eye Hospital in India has posted a dataset consisting of fundoscopic images of diabetic retinopathy with varying degrees of severity. The dataset consists of thousands of images collected in rural areas by the technicians of Aravind hospital from the rural areas of India. The challenge is to develop a model that can predict the severity of diabetic retinopathy from the fundoscopic image. Further, the successful solutions will be shared with other Ophthalmologists through the 4th Asia Pacific Tele-Ophthalmology Society (APTOS) Symposium.

Serverless is the new kid on the block with services such as AWS Lambda, Google Cloud Functions or Microsoft Azure Functions. Essentially it lets users deploy a function (Function As A Service or FaaS) on the cloud with very little effort. Requirements such as security, privacy, scaling, and availability are taken care of by the framework itself. As healthcare slowly yet steadily progress towards machine learning and AI, serverless is sure to make a significant impact on Health IT. Here I will explain serverless (and some related technologies) for the semi-technical clinicians and put forward some architectural best practices for using serverless in healthcare with FHIR as the data interchange format.

Serverless on FHIR

Let us say, your analyst creates a neural network model based on a few million patient records that can predict the risk for MI from BP, blood sugar, and exercise. Let us call this model r = f(bp, bs, e). The model is so good that you want to use it on a regular basis on your patients and better still, you want to share it with your colleagues. So you contact your IT team to make this happen.

This is what your IT guys currently do: First, they create a web application that can take bp, bs and e as inputs using a standard interface such as REST and return r. Next, they rent a virtual machine (VM) from a cloud provider (such as DigitalOcean). Then they convert this application into a container (docker) and deploy it in the VM. You now can use this as an application from your browser (chrome) or your EMR (such as OpenMRS or OSCAR) can directly access this function. You can share it with your colleagues and they can access it in their browsers and you are happy. The VM can support up to 3 users at a time.

In a couple of months, your algorithm becomes so popular that at any one time hundreds of users try to access it and your poor VM crashes most of the time or your users have to wait forever. So you call your IT guys again for a solution. They make 100 copies of your container, but your hospital is reluctant to give you the additional funding required.

Your smart resident notices that your application is being used only in the morning hours and in the night all the 100 containers are virtually sleeping. This is not a good use of the funding dollars. You contact your IT guys again, and they set up Kubernetes for orchestrating the containers according to usage. So, what is Serverless? Serverless is a framework that makes all these so easy that you may not even need your IT guys to do this. (Well, maybe that is an exaggeration)

My personal favourite serverless toolset (if you care) is Kubernetes + Knative + riff. I don’t try to explain what the last two are or how to use them. They are so new that they keep changing every day. In essence, your IT team can complete all the above tasks with few commands typed on the command line on the cloud provider of your choice. The application (function rather) can even scale to zero! (You don’t pay anything when nobody uses it and add more containers as users increase, scaling down in the night as in your case).

Best Practices

What are the best practices when you design such useful cloud-based ‘functions’ for healthcare that can be shared by multiple users and organizations? Well, here are my two cents!

First, you need a standard for data exchange. As JSON is the data format for most APIs, FHIR wins hands down here.

Next, APIs need a mechanism to expose their capabilities and properties to the world. For example, r = f(bp, bs, e) needs to tell everyone what it accepts (bp, bs, e) and what it returns (at the bare minimum). FHIR has a resource specifically for this that has been (not so creatively) named as an Endpoint. So, a function endpoint should return a FHIR Endpoint resource with information about itself if there is no payload.

What should the payload be? Payload should be a FHIR Bundle that has all the FHIR Resources that the function needs (bp, bs and e as FHIR Observations in your case). The bundle should also include a FHIR Subscription resource that points to the receiving system (maybe your EMR) for the response ( r ).

So, what next?

Take the phone and call your IT team. Tell them to take Kubernetes + Knative + riff for a spin! I might do the same and if I do, I will share it here. And last but not the least, click on the blue buttons below! 🙂

Health data warehousing is becoming an important requirement for deriving knowledge from the vast amount of health data that healthcare organizations collect. A data warehouse is vital for collaborative and predictive analytics. The first step in designing a data warehouse is to decide on a suitable data model. This is followed by the extract-transform-load (ETL) process that converts source data to the new data model amenable for analytics.

The OHDSI – OMOP Common Data Model is one such data model that allows for the systematic analysis of disparate observational databases and EMRs. The data from diverse systems needs to be extracted, transformed and loaded on to a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools that are already available.

Each data source requires customized ETL tools for this conversion from the source data to CDM. The OHDSI ecosystem has made some tools available for helping the ETL process such as the White Rabbit and the Rabbit In a Hat. However, health data warehousing process is still challenging because of the variability of source databases in terms of structure and implementations.

Hephestus is an open-source python tool for this ETL process organized into modules to allow code reuse between various ETL tools for open-source EMR systems and data sources. Hephestus uses SqlAlchemy for database connection and automapping tables to classes and bonobo for managing ETL. The ultimate aim is to develop a tool that can translate the report from the OHDSI tools into an ETL script with minimal intervention. This is a good python starter project for eHealth geeks.

Anyone anywhere in the world can build their own environment that can store patient-level observational health data, convert their data to OHDSI’s open community data standards (including the OMOP Common Data Model), run open-source analytics using the OHDSI toolkit, and collaborate in OHDSI research studies that advance our shared mission toward reliable evidence generation. Join the journey! here

Disclaimer: Hephestus is just my experiment and is not a part of the official OHDSI toolset.

Visual Analogue Scale (VAS) is a commonly used tool for measuring subjective sensations such as itching. There is evidence showing that visual analogue scales have superior metrical characteristics than discrete scales, thus a wider range of statistical methods can be applied to the measurements.

Couple of months back, I attended a thesis defense in McMaster in which an innovative web based tool called Pain-QuILTTM for visual self-report of pain was presented. The technique of iconography – pictorial material relating to or illustrating a subject – was employed to represent pain using a flash based web-interface. Pain-QuILTTM tracks quality, intensity, location and temporal characteristics of the pain. Quality is represented by different icons, intensity is represented by a visual analogue scale of 1 -10, location by the position of icon on the body image and temporal characteristics by the time stamp. The clinical feasibility of Pain-QuILTTM has been successfully validated and published (1).Pain-QuILTTM is a property of McMaster University and is subject to McMaster University’s terms of use. It can be accessed here.

The iconographic symptom encoding could be applied easily to dermatology as well. Dermatology lesions are primarily visual and dermatological diagnosis to a great extend is based on the type, distribution, intensity and temporal characteristics of the skin lesions. However the representation may be challenging because of the diverse nature of lesions.

Recently I came across fabric.js a javascript library for image manipulation based on HTML5 canvas. Fabric.js was much more versatile and powerful than I expected. I could prototype LesionMapperTM (that is what I want to call it), in less than 24 hours. The type of lesions are symbolized by representative clinical pictures instead of icons, intensity is represented by the opacity/translucency of the image and the location and distribution by the position and size of the lesion respectively, on the body image. The images can be dragged, enlarged or rotated. The icing on the cake is the ability of fabric.js to rasterize the image into a JSON that can be stored easily in a database.

Update: 14- June – 2016

LesionMapper is available as an OpenMRS module and OSCAR eForm. OpenMRS module is opensource and can be downloaded here. The github repository is available here. If you need the OSCAR eForm version, please contact me.