Assembling the Pieces of the Genomic Puzzle

Lower costs and faster sequencing are opening up the potential of whole genome sequencing (WGS) platforms to make routine, rapid use of genomic data a reality in medicine. This goal of truly personalized medicine will require informatics techniques that can make the best use of WGS data. To tackle this problem, OmniTier Inc.’s new bioinformatics platform, CompStor™ Novos, takes a de novo assembly-based variant calling approach, which the company hopes to detail in a publication in 2019. We caught up with Jon Coker, OmniTier’s CTO, to discuss how this approach matches up to rival techniques, and how high-performance computing (HPC) approaches like OmniTier’s could bring personalized medicine a step closer and boost biology in the coming years.

Ruairi Mackenzie (RM): What exactly is a WGS assembly approach?

Jon Coker (JC): DNA assembly technology is like solving a jigsaw puzzle without prior knowledge of the completed picture; the final picture emerges purely by matching the local features of each piece to other pieces. Try this on a real puzzle, and you’ll find that the puzzle can still be solved, but also that it takes a lot longer to do so. Likewise, DNA assembly technology can be more computationally intensive than alternatives. Whole genome sequencing (WGS) simply means that the puzzle involves the entire human genome. Other commonly used techniques like whole exome sequencing analyze only a small fraction of the whole genome, and the puzzle is much smaller. As a result, WGS assembly is generally perceived as representing a double whammy with respect to required computation resources and costs.

RM: Why is variant calling an important problem in personalized medicine?

JC: More than 99% of your DNA is identical to that of all other humans. Variation in the small remainder is what makes you, you - including your need for and reactions to pharmaceuticals, predisposition to disease, and other medically important phenomena. Not surprisingly, it’s a hot area of research to accurately identify those variations, and then to match that personalized information with the most effective personalized treatment plan. WGS assembly addresses that first, identification step.

RM: How does CompStor™ Novos differ from other WGS techniques?

JC: Alignment-based variant calling is widely used today. To revisit the puzzle analogy, alignment technology is like solving a jigsaw puzzle when you do indeed have prior knowledge of the completed picture, the reference genome, to guide you. The puzzle is solved, piece by piece, by matching each puzzle piece to the reference. This method works well most of the time, but perhaps you can see the problem: the particular puzzle being solved (your genome) is not quite the same as the reference. If the difference between a puzzle piece and the corresponding reference location is small enough, the piece can still be placed. But if the piece represents a large or complex variation that resembles nothing in the reference, then the piece cannot be placed, and the alignment method fails to identify that variation. Assembly-based variant calling does not have this fundamental limitation, and can identify this variation, because it does not depend at all on the reference.
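The limitation described here can be sketched with a toy placement routine: each read is slid along the reference and placed at the position with the fewest mismatches, provided that count stays within a tolerance. This is a deliberately simplified stand-in for real aligners, which use indexes and gapped alignment. The function name, the `max_mismatches` parameter, and the sequences are hypothetical.

```python
def place_read(read, reference, max_mismatches=1):
    """Return the reference offset where `read` fits best, or None.

    A read is "placed" only if some position has at most
    `max_mismatches` differing bases (substitutions only, no gaps).
    """
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for x, y in zip(read, window) if x != y)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


reference = "ATGGCGTACGTTAGC"

# A read with one substituted base still finds its place.
print(place_read("GCGTTCGT", reference))

# A read from a novel sequence resembles nothing in the reference
# and cannot be placed, so its variation goes undetected.
print(place_read("TTTTCCCC", reference))
```

The first read differs from the reference by a single base and is placed at offset 3; the second returns `None`, which is the failure mode that a reference-free assembly approach avoids.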

RM: How can HPC more generally help genomics?

JC: Challenging computational problems exist all the way up and down the bioinformatics spectrum. At the basic measurement level, problems similar to the one solved by CompStor Novos are everywhere: in transcriptomics, with RNA analysis; in proteomics, the analysis of proteins; and in other areas. In each of these, complex computation is required to tease out, often from very indirect and inexpensive measurements, what exactly is going on. Summarizing which information from these basic measurements is relevant is difficult even for one subject. Imagine combining such information across thousands, then millions, then billions of subjects to search for patterns. HPC will be not just a help to this area of inquiry but a fundamental requirement, and as a result industry analysts are predicting that many more HPC biology solutions will appear on the marketplace in 2019.

RM: When can we expect to hear about your publication intended for Nature Methods?

JC: We anticipate publication in spring 2019. A preprint of our joint paper with researchers at Mayo Clinic’s Center for Individualized Medicine is available now at www.biorxiv.org and at www.omnitier.com.