Machine learning algorithms could predict the natural host of a virus and help prevent the spread of disease to humans, scientists have found.

It is common for deadly and newly emerging viruses to circulate in wild animals and insect communities long before they spread to people. Most of the major pandemics of the last 100 years began in animal hosts, including the Spanish flu, which jumped from birds to humans, and HIV, which came from chimps.

These delays make it difficult to implement preventative measures, such as vaccinating animals or preventing contact between species.

But scientists say that a new machine learning algorithm could accelerate this process.

In a study published in Science on Thursday, an algorithmic model accurately predicted the likely host for a broad spectrum of single-stranded RNA viruses - the viral group which jumps from animals to humans most often.

“To stop an outbreak you need to know where the virus might have come from,” said Dr Daniel Streicker, senior author of the study from the MRC-University of Glasgow Centre for Virus Research.

“This [model] is helpful as it allows us to get to that answer a little faster. It gives us some really early indications of where we should be looking which can immediately inform research.”

The researchers were able to train machine learning algorithms to match the patterns within viral genomes to the animal they came from. When a virus jumps from a natural host to a human, it leaves a fingerprint from the animal or insect which can be traced back.
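The idea of matching genome patterns to a host can be illustrated with a toy example. The sketch below is not the study's method: it assumes, purely for illustration, that the "pattern" is how often each pair of adjacent bases occurs, and it assigns a new sequence to the host whose training sequences have the most similar average pattern. The function names and the sequences are invented for this example.

```python
from collections import Counter
from itertools import product

# All 16 ordered pairs of RNA bases, used as a fixed feature order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_profile(genome):
    """Return the relative frequency of each overlapping base pair."""
    pairs = [genome[i:i + 2] for i in range(len(genome) - 1)]
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    return [counts[d] / total for d in DINUCLEOTIDES]


def train_centroids(labelled_genomes):
    """Average the profiles of all training genomes that share a host label."""
    grouped = {}
    for genome, host in labelled_genomes:
        grouped.setdefault(host, []).append(dinucleotide_profile(genome))
    return {
        host: [sum(column) / len(profiles) for column in zip(*profiles)]
        for host, profiles in grouped.items()
    }


def predict_host(genome, centroids):
    """Pick the host whose average profile is closest (squared distance)."""
    profile = dinucleotide_profile(genome)

    def distance(host):
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[host]))

    return min(centroids, key=distance)


# Invented toy sequences, not real viral genomes (assumption for illustration).
TRAINING = [
    ("AUAUAUAUAUAUAUAUAUAU", "bat"),
    ("AUAUAUAAUAUAUAUUAUAU", "bat"),
    ("GCGCGCGCGCGCGCGCGCGC", "primate"),
    ("GCGCGGCGCGCGCCGCGCGC", "primate"),
]

centroids = train_centroids(TRAINING)
print(predict_host("AUAUAUAUUAUAUAUA", centroids))
print(predict_host("GCGCGCCGCGCGCGCG", centroids))
```

A real model would be trained on hundreds of genomes and far richer features, but the principle is the same: learn what genomes from each host tend to look like, then ask which group a new genome most resembles.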

Over 500 viruses were studied, and in over 70 per cent of cases the model correctly predicted the known natural host of the disease. In 97 per cent of cases, the tool determined whether or not the virus was transmitted by blood-sucking insects, and the type of insect was predicted correctly about 90 per cent of the time.

“The work started 10 years ago, when viruses were experimentally deoptimised and there was some indication that a virus could be matched to the genome of the host,” said Dr Streicker. “But we were pretty surprised that we were able to dig so deep about where a virus comes from.”

The team also applied the models to viruses which have an unknown natural host - such as Zika, Crimean-Congo Hemorrhagic Fever and MERS - and the algorithms confirmed the current ‘best guess’ animal host in each case.

The model also revealed new leads. Ebola is thought to have come from bats, but when the team tested two of the four strains of the disease, they found strong evidence that these strains of the virus could also have originated in primates.

“Two Ebola viruses are not well studied, but since the others came from bats it was presumed they also did,” said Dr Streicker. “The model doesn’t completely discount this, but it provides helpful new evidence.”

The team are developing a web application to allow scientists across the globe to submit their virus sequences and receive predictions for natural hosts and disease transmission routes within minutes.

But their work will not render old-school virus hunters, such as the Predict team currently working in Sierra Leone, obsolete. The current model is unlikely to find the as-yet-undiscovered ‘Disease X’, which the World Health Organization has listed as a pathogen with the potential to spark an epidemic.