AI Helps Scientists Discover Almost 6,000 New Viruses

Researchers have used machine learning algorithms to identify and classify viruses far faster than traditional methods allow. The approach could have important consequences for both human health and industrial applications.

As reported in Nature, the findings were presented on March 15 at a meeting organized by the US Department of Energy's Joint Genome Institute, where Simon Roux, a researcher at the Institute, described his team's work on inoviruses.

Inoviruses are an important family of viruses that infect bacteria, and while they don’t harm us directly, they can still be a health threat. Inoviruses can alter the behavior of their hosts; for example, they can make the cholera bacterium (Vibrio cholerae) more toxic.

In machine learning, algorithms are trained to find patterns in data and learn from them. By training the software to recognize specific patterns of genetic material, the team was eventually able to get it to classify potential inoviruses autonomously.

Roux's training approach had two steps. First, his team gave the algorithm 805 genomic sequences belonging to inoviruses already known to science. Then they fed the software 2,000 sequences belonging to other viruses or to bacteria. Comparing the two sets taught the software to pick out only sequences from the Inoviridae family.
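To make the two-step training idea concrete, here is a minimal sketch in Python. It is not the team's actual software, whose features and model are not described here. It assumes, purely for illustration, that each sequence is summarized by its 3-letter subsequence ("k-mer") frequencies and that a new sequence is assigned to whichever training set it resembles more closely on average. The function names, the choice of k, and the randomly generated toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import random

K = 3  # k-mer length; an arbitrary choice for this sketch
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq):
    """Return the normalized frequency of every possible k-mer in seq."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1
    return [counts[k] / total for k in KMERS]

def centroid(profiles):
    """Average a list of equal-length profiles element by element."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(positives, negatives):
    """Learn one average profile per class (a nearest-centroid model)."""
    return (centroid([kmer_profile(s) for s in positives]),
            centroid([kmer_profile(s) for s in negatives]))

def classify(model, seq):
    """Return True if seq looks more like the positive training set."""
    pos_c, neg_c = model
    p = kmer_profile(seq)
    return distance(p, pos_c) < distance(p, neg_c)

def random_seq(rng, length, weights):
    """Generate a toy DNA string; weights apply to A, C, G, T in order."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))

# Synthetic stand-ins for the two training sets (NOT real genomes):
# the "positive" class is A/T-rich, the "negative" class is G/C-rich.
rng = random.Random(0)
AT_RICH = [0.4, 0.1, 0.1, 0.4]
GC_RICH = [0.1, 0.4, 0.4, 0.1]
positives = [random_seq(rng, 300, AT_RICH) for _ in range(20)]
negatives = [random_seq(rng, 300, GC_RICH) for _ in range(20)]

model = train(positives, negatives)
```

Once trained, calling classify(model, sequence) on an unseen sequence returns True or False, which mirrors in miniature how a trained classifier can be run across large collections of genomic data. Real genomes are far less cleanly separable than these toy strings, so a production tool would need richer features and a stronger model.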

The trained software was then used to analyze massive sets of genomic data. In doing so, it found more than 10,000 inoviruses, which were then grouped into species.

Before Roux started the study, fewer than 100 species of inoviruses had been discovered. With the software, he was able to find nearly 6,000 previously unknown species. Given such variety, Roux now thinks that the Inoviridae family is actually several families.

This was not the only study presented at the meeting that employed machine learning. Nature reports that Deyvid Amgarten, from the University of São Paulo in Brazil, used trained software to identify viruses in zoo compost piles in São Paulo. His goal is to understand what role the viruses play in bacteria and whether they can be used to speed up the breakdown of organic matter.

Amgarten's work used VirFinder, software developed last year by Jie Ren and colleagues. Ren is using it to work out what part viruses might play in diseases that are not “viral”. For example, Ren's team showed that people with cirrhosis, a liver condition, carry a different set of viruses than healthy people.

Understanding viruses is an endeavor of epic proportions, but with machine learning we might find many more pieces of the puzzle.