The Challenges of High Performance Computing

Research laboratories are expected to deliver high performance computing (HPC) systematically and reliably, keeping pace with the unprecedented levels of computation, storage, and networking that researchers require to gather, evaluate, and move the voluminous data they have to grapple with. The unrelenting increase in the volume of data generated in modern laboratories poses tremendous challenges for managers and directors, who must facilitate optimal performance while minimizing the power used by computing systems, maximizing the efficiency of their cooling processes, and keeping energy expenditures as low as practicable, all at a time when prices are at historic highs.

Daunting as these challenges are, they must also be addressed in an environment marked by a strong interest in reducing carbon footprints and a much greater sensitivity to the potential impact of energy consumption on climate change. Furthermore, with some HPC systems consuming as much electricity as all the residential and commercial users in a typical small city, there is the real possibility that local utilities may be unable or unwilling to supply the power needs of HPC data operations. Recent reports suggest that planners of new or expanding HPC sites, some at multi-petascale and exascale levels, may have to consider myriad power options to address their energy needs; even the deployment of small-scale nuclear power reactors is reportedly on the table.

Just a decade ago, such power issues would not have made it onto a list of lab managers’ top concerns. Now, however, they are constantly listed among the top HPC challenges. Dell’s HPC Global Director Tim Carroll notes that HPC has acquired heightened prominence with a corresponding increase in demand within the last decade, and acknowledges that his company, known mostly as a leading manufacturer of personal computers, is now heavily involved in HPC systems.

HPC has long been the purview of the national labs, which needed and were focused on large and powerful systems. This meant that as this sector evolved the highest end of the HPC spectrum received the most attention. Carroll notes that this did not necessarily translate into delivering more computational cycles into the hands of a greater number of researchers.

The current status in the HPC sector reflects both good news and bad news scenarios for lab managers. The good news is that because of the efforts of Intel Corporation and other processor manufacturers, the vast majority of the research community can fairly easily get enough computational power to solve their problems. “This is true for about 80 percent of the cases,” says Carroll.

He points out, however, that the computational problem has not been fully solved, only brought under control, and that all the research clusters are now generating data “at a torrential pace, and there is a storage explosion because the researchers don’t want to get rid of any of the data.”

Carroll says that some of the most compelling evidence of this issue could be seen among Dell’s customers who are engaged in genomics, where a single researcher can generate one to two terabytes of data per day. Genomics researchers do not typically have a data generation problem; they have data storage and data movement problems—they have to manage immense data repositories that must be moved from one researcher to another and from one lab to another.
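To put the one-to-two-terabytes-per-day figure in perspective, the following minimal sketch estimates how much raw data a single researcher would accumulate over a year. It assumes, purely for illustration, that data is generated every day and that nothing is deleted; the function name and the 365-day default are not from Carroll.

```python
def yearly_storage_tb(tb_per_day: float, days: int = 365) -> float:
    """Raw data accumulated over `days`, assuming nothing is ever deleted."""
    return tb_per_day * days


# Daily rates quoted in the article: one to two terabytes per researcher.
low = yearly_storage_tb(1.0)
high = yearly_storage_tb(2.0)
print(f"{low:.0f} to {high:.0f} TB per researcher per year")
```

Under these assumptions, one researcher alone accounts for several hundred terabytes a year, before any copies are made to share the data with other researchers or labs.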

In a recent case, the 350-employee-strong Virginia Bioinformatics Institute at Virginia Tech addressed some of its I/O-intensive and large genomic data challenges, as well as its unique software and hardware requirements, by deploying Dell’s PowerEdge C6100 servers with Intel’s Xeon processors along with Intel’s QuickPath Interconnect to enhance its system performance. While accelerating its research processes, the use of clusters from Dell and Intel enabled easy expansion, boosted the energy efficiency of the servers, and minimized the costs associated with power and cooling.

Similarly, the Translational Genomics Research Institute (TGen) opted to build a new HPC cluster to deal with the massive amounts of data generated by its clinical trials for neuroblastoma (one of the most common pediatric cancers). By deploying a cluster of Dell’s PowerEdge blade servers and Intel’s Xeon processors, TGen was able to achieve a twelvefold gain in its processing power for patient data, along with three times the number of cores in the same floor space. The new scalable cluster was capable of supporting 100 percent growth in data volume year after year and created the basis for 800 server cores to be managed by one IT administrator.
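Growth of 100 percent year after year means the data volume doubles annually, which compounds quickly. The sketch below shows the arithmetic; the 100 TB starting volume is a hypothetical value chosen only to make the numbers concrete and is not a TGen figure.

```python
def projected_volume_tb(initial_tb: float, years: int, annual_growth: float = 1.0) -> float:
    """Compound growth: annual_growth=1.0 means 100 percent growth (doubling) per year."""
    return initial_tb * (1.0 + annual_growth) ** years


# Hypothetical starting point of 100 TB, projected over five years.
for year in range(6):
    print(year, projected_volume_tb(100.0, year))
```

After five years of doubling, the hypothetical 100 TB has grown thirty-two-fold, which is why a cluster sized only for today's volume is quickly outgrown.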

HPC is an important enabling technology for genomics/bioinformatics. In 2003, the aggregated cost to map the human genome was estimated at $3 billion, whereas today the bill for a similar undertaking could be as little as $1,000. As costs get lower, however, there is an exponential increase in the amount of data generated and in the demand for HPC capacity: data storage, immense memory capacity, and huge bandwidth connecting data repositories and computer clusters.

“Researchers engaged in weather modeling or related computer-intensive problems with lots of complex networking operations may face completely different sets of challenges,” says Carroll. He adds that these are the elements that make the HPC field challenging, because there are a number of different infrastructure challenges under the umbrella of research HPC. “The bigger the laboratory and the greater the number of researchers to be supported, the greater will be the need for different types of solutions. The challenge for labs is to build the type of infrastructure that will meet the different needs of a number of researchers and in the face of tight budgets ensure that this is done in a way that focuses on supporting the researchers and not necessarily on solving infrastructure problems alone.”

“One of our key capabilities is to quickly get together with lab management teams to assess the budget of the system—this is not just financial; it also includes power, space, cooling, and staffing,” he says. “Once we understand those parameters it becomes easier to design and shape the right system, with the key requirement being whether it is usable for the researcher in addressing the problems at hand.”

He says that once the equipment is agreed to and ordered, one key question is whether it is in place and running science in six weeks or six months. “And once it is there, does it just run so people don’t have to think about it because it is stable and reliable and integrates well with the other aspects of the lab, or is it like a finely tuned race car that is down half the time because of inappropriate architecture for the situation in which it must function?”

Carroll says that one important piece of advice he would give lab managers who are not experienced in procuring and managing these systems is never to tell researchers to request what they need and then take on the task of sourcing the systems without the involvement of the researchers who will be using them. “It will be much better and smoother if the people supporting these systems are a part of the process from the beginning and throughout.”

Reflecting on how the sector has evolved, Carroll says there was a time when there were both HPC systems and commercial data centers. “Because the HPC needs for the supercomputers were so specialized and the power and cooling requirements were so different from traditional needs, they had to be separated.

“What has happened over time and has certainly accelerated over the last two or three years—with the growth of HPC and the shrinking of budgets to support these initiatives—is that traditional systems and HPC are moving to the same data centers. That is step one,” says Carroll.

Eventually, he says, whether it is the next five or ten years, it will be possible to move workflows such that it can be decided at any given time what portion of the system is configured as HPC, what is serving up web capabilities, and which part is performing key functions. “This is the only way to get greater efficiency from the data centers, and it is not about money; it has to do with the fact that a lot of them are out of data storage space,” says Carroll.

So the challenge is to be more efficient with the available space, according to Carroll. Current dynamics in this area are forcing an examination of how the same infrastructure could be used for multiple purposes and become more efficient. He says federal labs do not have budgets for additional buildings, which could take as long as ten years to construct and make operational anyway, while cash-strapped state labs do not seem to have much hope of accessing funding for new construction projects.

Any growth seems to be on the commercial side, especially in biology, genomics, and proteomics. “In fact, what we are seeing there is that rather than building conventional data centers there is a rapid adoption of containers or pods, which are totally self-contained from an environmental perspective, with complete capabilities installed on a concrete pad rather than in a building. It produces unbelievable efficiency from a power consumption perspective, using a far smaller footprint than would an entire building, without losing any functionality on the computer side.”

Interestingly, according to Carroll, the sixty-second-largest supercomputer in the world was deployed by Dell last year at the University of Colorado. “It was in a container, not a data center; [there are] two metal containers sitting at the side of a building in Colorado.”

“These pods will play an important role in raising efficiency. Most data centers are inefficient, with 15 percent to 20 percent of the power going into the building but not making it to the computers. A lot can be done to reduce carbon footprints just by making data centers more efficient.

“Our vision of HPC is that it should be analogous to GPS, a highly advanced technology that is being put to a number of everyday uses. Five years from now we should be able to make it easier for lab managers to deploy these systems for their researchers—we are not there yet, but the goal should always be to make it easier for more people to access this technology,” says Carroll.
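Carroll's earlier figure of 15 to 20 percent of building power never reaching the computers can be turned into a rough efficiency estimate. The sketch below uses power usage effectiveness (PUE), the standard ratio of total facility power to IT equipment power; PUE is not a term Carroll uses, and the 1,000 kW facility load is a hypothetical value for illustration.

```python
def it_power_kw(facility_kw: float, overhead_fraction: float) -> float:
    """Power that actually reaches the computers after building overhead."""
    return facility_kw * (1.0 - overhead_fraction)


def pue(overhead_fraction: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return 1.0 / (1.0 - overhead_fraction)


# Overhead range quoted in the article; facility load is hypothetical.
for overhead in (0.15, 0.20):
    print(f"overhead {overhead:.0%}: "
          f"{it_power_kw(1000.0, overhead):.0f} kW to IT, PUE {pue(overhead):.2f}")
```

On these assumptions, a facility drawing 1,000 kW delivers only 800 to 850 kW to its computers, so every point of overhead recovered is power that can go to research instead.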