Application of machine learning in predicting early diagnosis of rheumatic diseases

Introduction

Machine learning is a technology much older than the most recent and famous successes associated with it. The fundamental principles at the heart of the architecture of the most powerful artificial neural networks known today were already being used in the late 1960s. The performance of this technology relies heavily on the volume and quality of data available to perform the learning. The volume of data of all types has shown very strong growth during the past two decades, including in medicine, and many attempts have been made to use the learning resources represented by these data to train artificial intelligence (AI) to diagnose more or less complex pathologies through machine learning strategies. Among these pathologies, rheumatic diseases occupy a special place, because they represent pathologies for which patients are regularly monitored, and can be described by very different data, such as X-ray images, flow cytometry profiles, and activity scores.

Machine learning and disease diagnosis and classification

The most obvious use of AI in a medical setting is certainly diagnostic assistance. In the context of machine learning, this means that a dataset (a cohort of patients for whom a collection of clinical or biological parameters and the corresponding diagnoses are available) is used to train a program. The program is trained to recognize, within the parameters that characterize each patient in the dataset, a more or less complex combination of variables on which to rely in order to predict the correct diagnosis. In clinical research, such approaches are also used to identify new biological signatures and thus highlight potential biomarkers of a pathology. This approach amounts to solving a classification problem: the trained programs are called “classifiers”, and they are trained to classify patients according to their diagnosis.
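As a rough illustration of the training step described above, the following minimal sketch trains a very simple classifier (a nearest-centroid rule) on a synthetic cohort and checks it on patients held out from training. Everything in it is an assumption made for illustration: the two numeric parameters, the "healthy" and "disease" labels, and the function names are hypothetical, and the data are randomly generated rather than clinical. Classifiers in published work are usually far more elaborate.

```python
import random
from statistics import mean

def make_synthetic_cohort(n_per_class=50, seed=0):
    """Generate made-up patients as (features, diagnosis). Not real clinical data."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n_per_class):
        # Two hypothetical parameters, e.g. a biomarker level and an activity score.
        cohort.append(([rng.gauss(1.0, 0.5), rng.gauss(2.0, 0.5)], "healthy"))
        cohort.append(([rng.gauss(3.0, 0.5), rng.gauss(4.0, 0.5)], "disease"))
    rng.shuffle(cohort)
    return cohort

def train_centroids(cohort):
    """Learn one mean feature vector (centroid) per diagnosis."""
    by_label = {}
    for features, label in cohort:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_label.items()}

def predict(centroids, features):
    """Predict the diagnosis whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, cohort):
    """Fraction of patients whose predicted diagnosis matches the recorded one."""
    hits = sum(predict(centroids, f) == y for f, y in cohort)
    return hits / len(cohort)

cohort = make_synthetic_cohort()
train, test = cohort[:70], cohort[70:]   # hold out 30 patients for evaluation
centroids = train_centroids(train)
test_accuracy = accuracy(centroids, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The held-out split matters: accuracy measured on the patients used for training says little about how the classifier would behave on new patients.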

Using Bibliography BOT (BIBOT), a program written in Python 2.7 that automatically identifies and interprets important words in large numbers of abstracts to support a quick review of the literature, we identified articles about the use of machine learning approaches to predict the diagnosis of rheumatic diseases (Figure 1). The number of articles referring to machine learning for the diagnosis of rheumatic diseases has grown sharply, and since 2011 at least two articles have been published on this topic every year.

Figure 1: Evolution of the number of published articles mentioning approaches of machine learning used for the early diagnosis of rheumatic diseases (results obtained with the BIBOT software)
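The kind of count plotted in Figure 1 can be sketched as follows. This is not BIBOT's implementation, which is not described here; it is a minimal sketch that assumes abstracts are already available as (year, text) pairs and counts those mentioning both a method term and a disease term. The records, term lists, and function names are hypothetical placeholders, not real articles.

```python
import re
from collections import Counter

# Hypothetical records: (publication year, abstract text). Placeholder text only.
records = [
    (2011, "A machine learning classifier for rheumatoid arthritis diagnosis."),
    (2011, "Radiographic follow-up of osteoarthritis without any automated method."),
    (2012, "Support vector machine learning applied to lupus diagnosis."),
    (2012, "Machine learning for early diagnosis of rheumatoid arthritis from ultrasound."),
]

# Assumed search vocabularies; a real review would use much richer lists.
METHOD_TERMS = ("machine learning", "neural network")
DISEASE_TERMS = ("rheumatoid arthritis", "lupus", "rheumatic")

def mentions_any(text, terms):
    """True if the text contains at least one term as a whole phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)

def count_by_year(records):
    """Count abstracts per year that mention both a method and a disease term."""
    counts = Counter()
    for year, abstract in records:
        if mentions_any(abstract, METHOD_TERMS) and mentions_any(abstract, DISEASE_TERMS):
            counts[year] += 1
    return dict(sorted(counts.items()))

articles_per_year = count_by_year(records)
print(articles_per_year)  # {2011: 1, 2012: 2}
```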

The articles also mention the use of machine learning to diagnose rheumatoid arthritis. The approaches developed to diagnose this pathology also rely on image processing, with the detection of synovial regions and their grading. The most common machine learning approach to this kind of image processing is a particular neural network architecture: the convolutional neural network.
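The basic operation behind convolutional neural networks can be sketched in a few lines: a small filter slides over the image and produces a high response wherever the local pattern matches it. The toy image and the hand-written edge filter below are assumptions for illustration; in a real network the filter weights are learned from data, and many filters are stacked in layers.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers call 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 4x5 "image": dark region (0) on the left, bright region (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written vertical-edge filter; in a CNN these weights would be learned.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [3, 3, 0]
# [3, 3, 0]
```

The response is 3 where the filter window covers the dark-to-bright boundary and 0 where the window covers only the uniform bright region, which is how such a filter localizes a structure in an image.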


Limitations of machine learning in disease diagnosis

The use of machine learning in the biomedical context, however, has limitations. These are as follows.

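The performance of a machine learning program is very dependent on the dataset on which it was trained. If the dataset contains too few examples in view of the complexity of the phenomenon studied, there is a risk of over-fitting: the program memorizes the training examples instead of learning a rule that generalizes to new patients.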

If the dataset contains skewed observations, measurement errors, or too many artifacts, learning will also result in an unsatisfactory solution.

There is also a major problem specific to biomedical decision-making. The steps of the AI's reasoning must be presented to the clinician clearly and intelligibly, but these algorithms commonly include “black boxes”: particularly complex information processing steps that make very little sense to a human. In other words, the algorithm can sometimes learn complex rules that give it great precision but whose meaning is very hard to understand, even for a domain expert.
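The first limitation above, over-fitting on a small and noisy dataset, can be illustrated with a minimal sketch. It assumes a made-up one-dimensional measurement whose recorded diagnosis is wrong 30% of the time, and a deliberately naive model that memorizes its 20 training examples (a 1-nearest-neighbour rule). All names and values are hypothetical; none come from a clinical dataset.

```python
import random

def make_noisy_samples(n, rng, label_noise=0.3):
    """Made-up 1D data: label is 1 when x > 0, but a fraction of labels is flipped."""
    samples = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = int(x > 0)
        if rng.random() < label_noise:
            label = 1 - label  # simulate a measurement or recording error
        samples.append((x, label))
    return samples

def predict_1nn(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]

def accuracy(train, samples):
    """Fraction of samples whose predicted label matches the recorded label."""
    return sum(predict_1nn(train, x) == y for x, y in samples) / len(samples)

rng = random.Random(42)
train = make_noisy_samples(20, rng)    # small training set
test = make_noisy_samples(500, rng)    # unseen examples from the same process

train_accuracy = accuracy(train, train)
test_accuracy = accuracy(train, test)
print(f"training accuracy: {train_accuracy:.2f}")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Because every training point is its own nearest neighbour, the model scores perfectly on the data it has memorized, including the mislabeled examples, while its accuracy on unseen examples is lower. That gap between training and held-out performance is the usual symptom of over-fitting.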

Conclusion

As we enter the era of “Big Data”, the use of machine learning is becoming more widespread and the performance achieved by this approach is becoming more and more spectacular. Medicine is not immune to this trend, and many applications of machine learning are used in this field, including diagnosis prediction. The performance of these techniques depends largely on the volume of data available, and the large amounts of information available in rheumatology explain the interest in these technologies within this field.

The use of machine learning currently has limitations due to the "black box" phenomenon and its high sensitivity to the quality of the training dataset. In the near future, we can hope to see new approaches to reduce the dependence of the performance of these programs on the quality of training data, for example by the use of knowledge external to these datasets via natural language processing approaches.

About the authors

Nathan Foulquier is a PhD student in artificial intelligence at University of Western Brittany, France, and member of the Laboratory of Medical Information Processing (LATIM, University of Brest), France.

Alain Saraux is Professor of Rheumatology at the Brest University Medical School in France, head of the Rheumatology Department of the University Hospital of Brest, and a member of the laboratory of immunotherapy and B cells pathologies of Brest.