Sequencing of the human genome in 2003 was a monumental achievement. But it left us with more questions than answers: it gave scientists the three-billion-base-pair instruction manual for how a person is created, but not the knowledge of how to read it.

Now a research team led by Brendan Frey at the University of Toronto has created a sophisticated computer tool that uses machine learning — and hardware borrowed from the video-game industry — to peer into parts of the genome that were once “black boxes,” and to rank how likely variants in those regions are to give rise to diseases, including autism.

“We’ve increased by a factor of 10 how much of the genome we can analyze and understand,” says Frey, the Canada Research Chair in Biological Computation and a senior fellow of the Canadian Institute for Advanced Research.

The research, published online Thursday in the journal Science, is “a big deal,” says Jeremy Sanford, a professor at the University of California Santa Cruz who specializes in RNA biology. “This is a good step toward interpreting the less obvious features of the genome.”

To create the computer tool, recently dubbed “SPANR” for “SPlicing-based ANalysis of vaRiants,” the research team first acquired sophisticated graphics cards developed by video-game companies. Scientists have realized that these cards are well suited to deep learning, the type of high-level machine learning Frey’s lab wanted to undertake.

“We’ve taken these video-game cards that were causing teenagers to not do any work, and solved one of the hardest problems in science,” Frey jokes.

Teaching the computer how to read the genome is like teaching a child how to read words, Frey says. The child sees the word “cow” and a picture of a cow. Eventually, the child learns that those three letters in that order correspond with the picture of the animal. As the child learns to read, it recognizes the word “cow” in new contexts.

Frey’s team showed the computer system strings of DNA, and showed it how much protein those strings of DNA produce. By examining tens of thousands of such examples, the machine is eventually able to predict which proteins will be made for a given DNA sequence, including ones that differ between individuals. What the scientists were really interested in was the regulatory code, parts of genes that provide the instructions for stitching proteins together, a process called splicing.

Only 1 per cent of the genome makes proteins, while around 10 per cent is regulatory code. But while scientists know that splicing errors are associated with disease, interpreting those genetic regions has been impossible. The idea behind SPANR was to create a system smart enough to infer what is happening with that regulatory code.

To verify whether the computer system worked, the team showed the computer 100,000 genetic mutations — changes in the text of the DNA — that give rise to disease and 550,000 genetic mutations that are seemingly harmless. Sure enough, the system scored the disease-causing mutations higher than the harmless ones, setting it up to predict mutations in previously unreadable regulatory code.
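One common way to put a number on that kind of check is to ask how often a randomly chosen disease-causing mutation outscores a randomly chosen harmless one. The sketch below is a hypothetical illustration of that idea, not the team's actual evaluation code: the function name `ranking_auc` and the toy scores are assumptions made up for this example and are not data from the study.

```python
from itertools import product


def ranking_auc(disease_scores, benign_scores):
    """Fraction of (disease, benign) pairs in which the disease variant
    scores higher. Ties count as half a win. 1.0 means perfect separation,
    0.5 means no better than chance."""
    if not disease_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for d, b in product(disease_scores, benign_scores):
        if d > b:
            wins += 1.0
        elif d == b:
            wins += 0.5
    return wins / (len(disease_scores) * len(benign_scores))


# Made-up scores for illustration only (NOT from the Science paper).
toy_disease = [0.9, 0.7, 0.6, 0.4]
toy_benign = [0.5, 0.3, 0.2, 0.1, 0.05]
toy_auc = ranking_auc(toy_disease, toy_benign)
print(f"toy ranking AUC: {toy_auc:.2f}")
```

A value near 1.0 would indicate that disease-causing mutations are consistently scored above harmless ones; a value near 0.5 would indicate that the scores carry no useful signal.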

Working with Stephen Scherer, director of the Centre for Applied Genomics at the Hospital for Sick Children and a co-author on the Science paper, the researchers analyzed five genomes from people with autism spectrum disorder and 12 controls. Autism is a puzzle: mutations in several dozen genes have been linked to the disorder, but they account for only around 20 per cent of cases.

When SPANR ranked genetic variants according to which would affect splicing, it found 39 new genes that may have a role in the disorder.

“I was surprised that it works so well, for kind of a first attempt,” said Scherer. “There’s a lot we can learn from it.”

Frey predicts the system will have wide application in clinical settings, and will contribute to the development of personalized medicine, the concept of treating illness based on a patient’s particular molecular makeup.

Copyright owned or licensed by Toronto Star Newspapers Limited. All rights reserved. Republication or distribution of this content is expressly prohibited without the prior written consent of Toronto Star Newspapers Limited and/or its licensors. To order copies of Toronto Star articles, please go to: www.TorontoStarReprints.com