Supercomputing and Big Data

The emergence of whole genome sequencing has had a significant impact on both scientific research and clinical medicine, but a few hurdles still prevent widespread use of the technology. Interpreting the raw data produced by genetic testing requires substantial computing power, and few laboratories have analytical equipment capable of that kind of speed. A recent news briefing from Dark Daily highlighted a study in which researchers at the University of Chicago, using publicly available software, successfully incorporated the “Beagle” Cray XE6 supercomputer into their genetic testing.

“Whole genome analysis requires the alignment and comparison of raw sequence data,” said Elizabeth McNally, MD, PhD, professor of medicine and human genetics and director of the cardiovascular genetics clinic at the University of Chicago’s School of Medicine, in the Dark Daily briefing. “[This] results in a computational bottleneck because of the limited ability to analyze multiple genomes simultaneously.”

According to the release, the research team was able to analyze 240 whole genomes simultaneously. For their study, titled “Supercomputing for the parallelization of whole genome analysis,” they used the “Beagle” to process sequencing data from 61 patients -- an analysis that took less than 50 hours and used only 25 percent of the machine’s total capacity. To put this into perspective, the Dark Daily briefing pointed out that a single 2.1 GHz CPU would take “roughly 47 years to analyze the same data.” Because of this, most current laboratory practices analyze only the human exome, the roughly 2 percent of the human genome responsible for protein coding. Although the briefing noted that the exome is considered the source of 85 percent of “disease-causing mutations,” the ideal method for research would include the whole genome -- an approach that isn’t currently feasible for routine use.
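A back-of-envelope calculation makes the scale of that difference concrete. The sketch below simply divides the two run times quoted in the briefing -- roughly 47 years on a single 2.1 GHz CPU versus under 50 hours on the Beagle -- and is an illustration of the reported figures, not a benchmark:

```python
# Back-of-envelope speedup estimate using the figures quoted in the
# Dark Daily briefing: ~50 hours on the Beagle supercomputer versus
# "roughly 47 years" on a single 2.1 GHz CPU for the same 61-genome data set.

HOURS_PER_YEAR = 365.25 * 24

beagle_hours = 50                        # reported Beagle run time (upper bound)
single_cpu_hours = 47 * HOURS_PER_YEAR   # reported single-CPU estimate

speedup = single_cpu_hours / beagle_hours
print(f"Approximate speedup: {speedup:,.0f}x")  # on the order of 8,000x
```

Even at only a quarter of the machine's capacity, that is a speedup on the order of several thousandfold, which is what makes routine whole genome analysis plausible at all.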

“By paying close attention to family members with genes that place them at increased risk, but who do not yet show signs of disease, we can investigate early phases of the disorder,” continued McNally. “In this setting, each patient is a big-data problem.”

New screening technologies continue to improve the accuracy of genetic testing, but the laboratory’s ability to break down and investigate the results of those tests quickly and accurately is every bit as important. By successfully utilizing the “Beagle” supercomputer in the analysis of raw genetic data, the researchers at the University of Chicago have opened the door to improved options for understanding sequenced information. Faster interpretation of whole genome sequencing could lead to more standard genetic tests with faster turnaround times and, subsequently, a more cost-effective laboratory model.