Despite being touted as next-generation cure-alls that will transform healthcare in unfathomable ways, artificial intelligence and machine learning still pose many concerns with regard to safety and responsible implementation.

BMJ Quality and Safety has published a new study that identifies short-, medium- and long-term issues that machine learning will encounter in the healthcare space, hurdles that could prevent its successful implementation in a wide array of use cases.

With everything at stake, from research and clinical guidance to direct control of critical patient safety equipment (although that is still in the future), these strata of concerns suggest that there are many challenges AI and machine learning applications will need to address as they become more ubiquitous in healthcare.

WHY IT MATTERS
Those applications are often hamstrung by the same problem that affects almost every computing task: the computer does exactly what it is told, which can invite or exacerbate unintended consequences. BMJ surveyed various applications that are currently in use, as well as those on the near horizon and beyond.

Short Term. Machine learning is only as good as the data it gets. A phenomenon known as "distributional shift" can occur, where training data and real-world data differ, leading an algorithm to draw the wrong conclusions. Machine learning also doesn't have the same ability to weigh the costs and consequences of false positives or negatives the way a doctor would: it can't "err on the side of caution" like a human.
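As a rough illustration of how distributional shift might be spotted, the sketch below compares one numeric feature in the training data against the same feature in real-world data, using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions). The study does not prescribe this method; the function names, the sample values and the 0.3 alert threshold are all assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(train, live):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = no overlap)."""
    a, b = sorted(train), sorted(live)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def has_shifted(train, live, threshold=0.3):
    """Flag a feature whose live distribution has drifted from training.

    The 0.3 threshold is an illustrative assumption, not a clinical standard.
    """
    return ks_statistic(train, live) > threshold


# Hypothetical patient ages: the model was trained on one population
# and is now being used on a much older one.
training_ages = [50, 55, 60, 65, 70]
deployed_ages = [80, 85, 90, 95, 100]
print(has_shifted(training_ages, deployed_ages))
```

A check like this only says that the inputs look different; it cannot say whether the model's conclusions are still valid, which remains a question for clinicians.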

This can be especially problematic since machine learning apps usually run as a "black box" whose decision-making isn't open to inspection. If a clinician can only judge a prediction based on a system's final outcome, it may either undermine the human opinion or simply prove worthless.

Finally, machine learning algorithms, especially those in the black box category, need some way to assess their own confidence in their predictions. Without attaching some degree of certainty, the machine learning application lacks a necessary "fail-safe."
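One way to picture such a fail-safe is a rule that acts on a prediction only when the model's stated confidence is high and otherwise hands the case to a person. The sketch below is a minimal example, assuming the model outputs a well-calibrated probability; the function name `triage`, its labels and both thresholds are hypothetical, not taken from the study.

```python
def triage(probability, clear_below=0.05, flag_above=0.80):
    """Turn a model's probability of disease into an action.

    The thresholds are illustrative assumptions. They are deliberately
    asymmetric: it is harder to clear a patient than to flag one, which
    is one way to "err on the side of caution".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= flag_above:
        return "flag"
    if probability <= clear_below:
        return "clear"
    # The model is not confident either way, so a human decides.
    return "refer to clinician"


for p in (0.95, 0.50, 0.02):
    print(p, triage(p))
```

The rule is only as trustworthy as the probability it is given: a model that is overconfident will rarely defer, so the confidence estimate itself needs to be validated.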

Medium Term. As machine learning becomes more commonplace, clinicians and others who interact with it are at risk of becoming complacent and treating all computer-generated assessments as "infallible." Trusting a program becomes even more dangerous over time as the training dataset ages and clashes with the realities of medicine, where clinical practice, available medications and disease characteristics all change over time.

In this way, machine learning can even influence medical research: it can make "self-fulfilling" predictions that may not reflect the best course of action but over time will reinforce its decision-making process.

Long Term. Although AI control of processes or equipment that directly relate to human life (insulin pumps, ventilators, etc.) is a long way off, researchers trying to find applications for these technologies must tread lightly. Machine learning algorithms are trained on fairly narrow datasets and, unlike humans, are unable to take into account the wider context of a patient's needs or treatment outcomes.

They can "game the system" and learn to deliver results that appear successful in the short term but run against longer-term goals. As they learn, there are ethical and safety questions about how much "exploration" a machine learning system can undertake: a continuously learning autonomous system will eventually experiment with pushing the boundaries of treatments in an effort to discover new strategies, potentially harming patients.

All of this invites the very problem that AI and machine learning are supposed to address: increased direct human oversight.

The potential for real improvement is clear, as some machine learning systems are already vastly better than their human expert counterparts at reviewing imaging reports or synthesizing lengthy patient records to develop a more precise care plan. BMJ acknowledges these developments but warns about the myriad unforeseen consequences of trusting machine learning too blindly or too quickly.

Instead, researchers must proceed with a sharp eye towards how a computer handles data and learning versus a human, as well as the ethical and safety implications of the new world that they are helping to forge.

ON THE RECORD
"Developing AI in health through the application of ML is a fertile area of research, but the rapid pace of change, diversity of different techniques and multiplicity of tuning parameters make it difficult to get a clear picture of how accurate these systems might be in clinical practice or how reproducible they are in different clinical contexts," wrote the authors of the report.

"This is compounded by a lack of consensus about how ML studies should report potential bias, for which the authors believe the Standards for Reporting of Diagnostic Accuracy initiative could be a useful starting point," they added. "Researchers need also to consider how ML models, like scientific data sets, can be licensed and distributed to facilitate reproduction of research results in different settings."

Benjamin Harris is a Maine-based freelance writer and former new media producer for HIMSS Media.
Twitter: @BenzoHarris.