Why has this research by Google caused such buzz? First, Google is at the forefront of research in deep learning (which is a type of ML) and much of what they do simply hasn’t been done before. In this case, they used ML to estimate patients’ risk of negative outcomes from their attributes in the EMR. Well, what’s so novel about that? Isn’t that what healthcare.ai enables? What’s novel here is that Google combined text and structured data to predict patient risk of several negative outcomes without any manual feature engineering. What does that mean exactly?

Feature engineering

Let’s imagine you’re predicting patient risk of 30-day readmission. If you’re training a model, you have to prepare the independent variables (or patient attributes) by tying these columns to a particular grain and preparing them for the algorithm. In healthcare, the grain is often the patient encounter (or visit). This means that all patient attributes are summarized into one row per visit. How does that tie to feature engineering? After choosing a grain, you standardize the columns by (for example) taking an average of a patient’s last five weight measurements or using only their most recent smoking status. This type of work is often done via SQL in a subject-specific data mart that feeds the ML tools (like scikit-learn, healthcare.ai, or TensorFlow).
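To make the idea of a grain concrete, here is a minimal sketch in plain Python of the two summaries mentioned above. It is only an illustration: the record layout, the field names, and the `build_encounter_row` helper are all hypothetical, and in practice this logic usually lives in SQL inside the data mart.

```python
from datetime import date
from statistics import mean

# Hypothetical raw observations: (patient_id, observation_date, kind, value)
observations = [
    ("p1", date(2018, 1, 5), "weight_kg", 80.0),
    ("p1", date(2018, 2, 5), "weight_kg", 82.0),
    ("p1", date(2018, 2, 5), "smoking_status", "current"),
    ("p1", date(2018, 3, 5), "weight_kg", 84.0),
    ("p1", date(2018, 4, 5), "weight_kg", 86.0),
    ("p1", date(2018, 5, 5), "weight_kg", 88.0),
    ("p1", date(2018, 6, 5), "weight_kg", 90.0),
    ("p1", date(2018, 6, 5), "smoking_status", "former"),
]

# Hypothetical encounters: (encounter_id, patient_id, discharge_date)
encounters = [
    ("e1", "p1", date(2018, 2, 10)),
    ("e2", "p1", date(2018, 6, 10)),
]


def build_encounter_row(encounter, observations, n_weights=5):
    """Summarize a patient's history into one row at the encounter grain."""
    encounter_id, patient_id, discharge_date = encounter
    # Only use what was known at discharge, so no future data leaks in.
    history = sorted(
        (o for o in observations
         if o[0] == patient_id and o[1] <= discharge_date),
        key=lambda o: o[1],
    )
    weights = [o[3] for o in history if o[2] == "weight_kg"]
    smoking = [o[3] for o in history if o[2] == "smoking_status"]
    return {
        "encounter_id": encounter_id,
        "patient_id": patient_id,
        # Average of the last five weight measurements (fewer if unavailable).
        "avg_weight_last5": mean(weights[-n_weights:]) if weights else None,
        # Most recent smoking status only.
        "latest_smoking_status": smoking[-1] if smoking else None,
    }


rows = [build_encounter_row(e, observations) for e in encounters]
for row in rows:
    print(row)
```

Each encounter ends up as exactly one row, which is the shape most ML tools expect. Every choice in it (five measurements rather than three, most recent status rather than ever-smoker) is a manual feature engineering decision.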

How does this relate to Google’s work? Essentially, they don’t do any of that column manipulation. Instead, they “take raw EHR data as input, and produce FHIR outputs without manual feature harmonization.” It’s an impressive achievement, especially considering that they also ingest clinician notes and produce respectable model performance. It’s also a great place to focus attention, since feature engineering is often estimated to make up roughly 80% of ML work.

Data details

In this retrospective study, they used 216k hospitalizations to predict inpatient mortality, long length of stay, diagnoses, and 30-day readmissions, which are all of practical interest to health systems. To learn these patterns, they used three architectures: one based on recurrent neural networks (LSTM), one “attention-based time-aware neural network model (TANN),” and one “neural network with boosted time-based decision stumps.” For a given prediction, the risk is based on an “ensemble of predictions from the three underlying model architectures.” In case those details didn’t make it clear, these are researchers, not SQL-based healthcare analysts.
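For readers who haven’t worked with ensembles, the sketch below shows one common way to combine several models: averaging their predicted probabilities for each patient. Treat it as an assumption-laden illustration rather than Google’s method; the combining rule, the `ensemble_risk` helper, the model names, and the numbers are all made up here.

```python
from statistics import mean


def ensemble_risk(model_predictions):
    """Average per-patient risk scores across models.

    model_predictions maps a model name to a list of predicted
    probabilities, one per patient, in the same patient order.
    Simple averaging is an assumption made for this illustration.
    """
    lengths = {len(scores) for scores in model_predictions.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same set of patients")
    # zip(*...) regroups the scores so each tuple holds one patient's
    # predictions from all models.
    return [mean(scores) for scores in zip(*model_predictions.values())]


# Hypothetical predicted readmission risks for two patients.
predictions = {
    "lstm": [0.2, 0.8],
    "tann": [0.4, 0.6],
    "boosted_stumps": [0.3, 0.7],
}
combined = ensemble_risk(predictions)
print(combined)
```

The appeal of an ensemble is that the errors of different architectures partly cancel out, at the cost of training and serving three models instead of one.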

For simplicity, let’s focus on their readmissions predictions at discharge, which achieved an AUROC (or c-statistic) of 0.75 and 0.76 for hospitals A and B, respectively. Here hospital A is the University of California, San Francisco and hospital B is University of Chicago Medicine. To establish a baseline comparison, Google appropriately constructs a logistic regression model specific to these same two hospitals using variables from this paper and finds AUROC scores of 0.70 and 0.68 (for hospitals A and B, respectively), which they beat with their deep learning architecture. For context, in our ML field work we’ve found that most old-guard risk models like LACE and those from the EMR vendors (which aren’t tuned to a specific hospital) come in between 0.60 and 0.70 AUROC. Note that a perfect model has an AUROC of 1.0; on the other hand, 0.5 means no predictive power.
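If AUROC is unfamiliar, one way to read it is as the probability that a randomly chosen readmitted patient gets a higher risk score than a randomly chosen patient who wasn’t readmitted. The sketch below computes it exactly that way, by brute force over every such pair. The `auroc` function and the toy labels and scores are ours, for illustration only; real projects would typically use a library routine such as scikit-learn’s `roc_auc_score`.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for readmitted, 0 for not readmitted.
    scores: predicted risk, higher meaning more likely to be readmitted.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: four discharges, two of which led to a readmission.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))                  # 0.75
print(auroc(labels, [0.1, 0.2, 0.8, 0.9]))    # 1.0, perfect ranking
print(auroc(labels, [0.5, 0.5, 0.5, 0.5]))    # 0.5, no predictive power
```

The pairwise loop is quadratic, so it is only suitable for small examples, but it makes the 0.5 and 1.0 endpoints easy to see.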

Lessons for healthcare?

Interestingly, even though Google Brain is a research organization, they were attempting to provide something that’s broadly useful in healthcare:

[W]e report a generic data processing pipeline that can take raw EHR data as input, and produce FHIR outputs without manual feature harmonization. This makes it relatively easy to deploy our system to a new hospital.

At first blush, it’d be easy for a healthcare CIO to see Google’s headline and think, “Cool, this makes it easier for us to leverage machine learning.” But if they or their data scientists dug deeper, they’d quickly hit an insurmountable wall. Let’s say your health system had an extraordinary data scientist who has:

Properly identified a relevant business problem that can be improved via risk stratification / decision support

Found a clinical champion to give both feedback and get team buy-in for the new ML-based workflow

The practical ML route

We built healthcare.ai with these facts in mind. It allows data scientists and analysts to quickly train and deploy models in R or Python with minimal time investment by leveraging the experience of data scientists across dozens of healthcare model deployments. Overall, it lets you focus on working with clinicians to a) identify what’s driving your outcome of interest and b) establish buy-in for a new data-driven workflow.

Recently the Bon Secours Charity Health System, a member of the Westchester Medical Center Health Network, found that retrospective analyses and point-based risk scores weren’t enough to help lower their readmissions rate. Bon Secours Charity, which is located in the Hudson Valley region of New York State, determined that optimizing readmissions interventions would be best served by a machine learning model that would, on a daily basis, help determine which patients in their general population were likely to be readmitted within 30 days of an inpatient discharge. They engaged Health Catalyst. Taylor Larsen, from the data science team, installed pre-built data pipelines using standardized clinical/predictive data marts and trained the model that enabled this critical piece of decision support.

Why do I bring this up? Because it provides a helpful contrast to the Google paper discussed above. Here are some details for readmission prediction at discharge:

| | Bon Secours Charity HS | Google A (UCSF) | Google B (UCM) |
|---|---|---|---|
| Training rows | 54k hospitalizations | 86k hospitalizations | 109k hospitalizations |
| Algorithm | Random forest | LSTM/TANN/Boosting | LSTM/TANN/Boosting |
| AUROC | 0.80 | 0.75 | 0.76 |
| Setup/train time | 10 hrs | >20k GPU hrs | >20k GPU hrs |
| In production | Yes | No | No |

The underlying raw EHR data is likely fairly similar in the Bon Secours and Google projects—what’s different is Google took the deep learning route whereas Bon Secours used standard clinical data marts and more practical ML. The quick results seen here at Bon Secours are similar to decision support engagements at other health systems, and can be obtained for any system using DOS from Health Catalyst.

Also, it’s important to keep in mind that quickly standing up an accurate model is just one step in operationalizing this type of decision support. Bon Secours showed that partnership, transparency, and careful thought are needed to gain the trust and adoption of the end users who will carry out the interventions.

Healthcare desperately needs Google and its talent, engineering, ML, and user-centric focus, and we’re thrilled that Google is getting involved. The average health system is riddled with antiquated software, manual processes ripe for automation, and typically zero ML-based clinical decision support.

For the sake of both patients and clinicians, we hope that Google focuses on the average health system and works towards practical solutions that reduce medical errors, lower the burden on physicians, and give patients the customer-focused experience they deserve.


This project was started by and receives ongoing support from Health Catalyst.