
Google Has Released an AI Tool That Makes Sense of Your Genome

AI tools could help us turn information gleaned from genetic sequencing into life-saving therapies.

Almost 15 years after scientists first sequenced the human genome, making sense of the enormous amount of data that encodes human life remains a formidable challenge. But it is also precisely the sort of problem that machine learning excels at.

On Monday, Google released a tool called DeepVariant that uses the latest AI techniques to build a more accurate picture of a person’s genome from sequencing data.

DeepVariant helps turn high-throughput sequencing readouts into a picture of a full genome. It automatically identifies small insertion and deletion mutations and single-base-pair mutations in sequencing data.

High-throughput sequencing became widely available in the 2000s and has made genome sequencing more accessible. But the data produced using such systems has offered only a limited, error-prone snapshot of a full genome. It is typically challenging for scientists to distinguish small mutations from random errors generated during the sequencing process, especially in repetitive portions of a genome. These mutations may be directly relevant to diseases such as cancer.

A number of tools exist for interpreting these readouts, including GATK, VarDict, and FreeBayes. However, these programs typically rely on simpler statistical and machine-learning approaches, identifying mutations by attempting to rule out read errors.
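
To make the idea of "ruling out read errors" concrete, here is a minimal sketch of a purely statistical check. It is a toy illustration under stated assumptions, not the method used by GATK, VarDict, FreeBayes, or DeepVariant: it assumes every read at a position has the same independent error rate, and the function names, the 1 percent error rate, and the significance cutoff are all hypothetical.

```python
from math import comb


def error_tail_probability(depth: int, alt_reads: int, error_rate: float) -> float:
    """Probability of seeing at least `alt_reads` mismatching reads out of
    `depth` if every mismatch were an independent sequencing error."""
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def looks_like_variant(depth: int, alt_reads: int,
                       error_rate: float = 0.01,   # assumed per-read error rate
                       alpha: float = 1e-6) -> bool:  # assumed cutoff
    """Flag a position when random errors alone are an unlikely explanation."""
    return error_tail_probability(depth, alt_reads, error_rate) < alpha


if __name__ == "__main__":
    # 15 of 30 reads disagree with the reference: hard to explain as noise.
    print(looks_like_variant(depth=30, alt_reads=15))
    # 1 of 30 reads disagrees: entirely consistent with a read error.
    print(looks_like_variant(depth=30, alt_reads=1))
```

A check this simple breaks down in exactly the places the article describes, such as repetitive regions where errors are neither independent nor uniform, which is part of why more sophisticated models are attractive.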

“One of the challenges is in difficult parts of the genome, where each of the [tools] has strengths and weaknesses,” says Brad Chapman, a research scientist at Harvard’s School of Public Health who tested an early version of DeepVariant. “These difficult regions are increasingly important for clinical sequencing, and it’s important to have multiple methods.”

DeepVariant was developed by researchers from the Google Brain team, a group that focuses on developing and applying AI techniques, and Verily, another Alphabet subsidiary that is focused on the life sciences.

The team collected millions of high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, a public-private effort to promote genomic sequencing tools and techniques. They fed the data to a deep-learning system and painstakingly tweaked the parameters of the model until it learned to interpret sequenced data with a high level of accuracy.

Last year, DeepVariant won first place in the PrecisionFDA Truth Challenge, a contest run by the FDA to promote more accurate genetic sequencing.

“The success of DeepVariant is important because it demonstrates that in genomics, deep learning can be used to automatically train systems that perform better than complicated hand-engineered systems,” says Brendan Frey, CEO of Deep Genomics.

The release of DeepVariant is the latest sign that machine learning may be poised to boost progress in genomics.

Deep Genomics aims to develop drugs by using deep learning to find patterns in genomic and medical data.

Frey says AI will eventually go well beyond helping to sequence genomic data. “The gap that is currently blocking medicine right now is in our inability to accurately map genetic variants to disease mechanisms and to use that knowledge to rapidly identify life-saving therapies,” he says.

Another prominent company in this area is Wuxi Nextcode, which has offices in Shanghai, Reykjavik, and Cambridge, Massachusetts. Wuxi Nextcode has amassed the world’s largest collection of fully sequenced human genomes, and the company is investing heavily in machine-learning methods.

DeepVariant will also be available on the Google Cloud Platform. Google and its competitors are furiously adding machine-learning features to their cloud platforms in an effort to lure anyone who might want to tap into the latest AI techniques (see “Ambient AI Is About to Devour the Software Industry”).

But genomic medicine represents an especially big opportunity, because the scale and complexity of the data are unprecedented. “For the first time in history, our ability to measure our biology, and even to act on it, has far surpassed our ability to understand it,” says Frey. “The only technology we have for interpreting and acting on these vast amounts of data is AI. That’s going to completely change the future of medicine.”


I am the senior editor for AI at MIT Technology Review. I mainly cover machine intelligence, robots, and automation, but I’m interested in most aspects of computing. I grew up in south London, and I wrote my first line of code (a spell-binding infinite loop) on a mighty Sinclair ZX Spectrum. Before joining this publication, I worked as the online editor at New Scientist magazine. If you’d like to get in touch, please send an e-mail to will.knight@technologyreview.com.
