Ready for the GPGPU Revolution?

By John Russell

November 16, 2010 | Strolling through the poster section at September’s GPU Technology Conference, one was struck by two things—the sheer diversity of applications being sped up by graphics processors (GPUs) and the large number of life sciences applications represented. A group from UC Davis/U Washington, for example, reported up to 10x improvements running the Rosetta protein-folding algorithm.

Graphics processors used to be just that: specialized devices intended to speed and enhance computer graphics. The first GPUs were hard to program and mostly used to accelerate VGA (eons ago, I know) computer displays. A host of companies (chip and board makers) jostled for position in the early graphics acceleration market. Nvidia and ATI emerged as the big guns, and ATI was acquired by chip-maker AMD in 2006. Just this summer AMD announced it would retire the ATI brand.

Nvidia is a juggernaut in the space. Founded in 1993 and based in Santa Clara, Calif., Nvidia is led by Jen-Hsun Huang, co-founder, president, and CEO. Last year’s sales were about $3.3B, and staff size is around 5,000. This year’s GPU Technology Conference drew roughly 2,000 attendees from 50 countries and was dominated by programmers and scientists, with relatively few marketers.

Several things have happened to propel GPUs forward. The devices themselves, particularly their programmability (e.g., Nvidia’s CUDA architecture), have advanced dramatically. Traditionally, GPUs were designed to do fewer things, but to do them very fast, and to do them in parallel on very many cores. Conventional CPUs—Intel’s x86 architecture dominates—do many things very well, but do them one at a time, or, in the case of multi-core CPUs, only a few at a time. It turns out GPU architecture is ideal for many scientific calculations, particularly those that benefit from parallelized execution.
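To make the contrast concrete, here is a minimal sketch of that model in CUDA C (the kernel and variable names are hypothetical, not drawn from any application mentioned here): a loop a CPU would execute a million times sequentially becomes a kernel run by a million lightweight GPU threads, one per array element.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Each thread scales one array element; with enough threads the whole
// array is processed in parallel rather than in a serial loop.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) x[i] *= a;
}

int main(void) {
    const int n = 1 << 20;                     // one million elements
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);         // host (CPU) copy
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d;                                  // device (GPU) copy
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block, and enough blocks to cover all n elements
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("x[0] = %f\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax tells the runtime how many threads execute the kernel concurrently; building this requires Nvidia’s `nvcc` compiler and a CUDA-capable GPU.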

The other game-changer is the gush of data from modern instruments; it has overwhelmed traditional CPU capacity and forced researchers to attack data-intensive computation with massive clusters. This approach has two key drawbacks. First, performance, though improved, doesn’t scale especially well, and certainly not commensurately with the data growth. Second, and perhaps most importantly, big clusters greedily consume power, cooling, and space, which sends datacenter costs skyrocketing. The cost of the servers themselves is almost inconsequential by comparison.

Enter GPGPU (not that we need another acronym), which stands for General-Purpose computation on Graphics Processing Units. “Once specially designed for computer graphics and difficult to program, today’s GPUs are general-purpose parallel processors with support for accessible programming interfaces and industry-standard languages such as C. Developers who port their applications to GPUs often achieve speedups of orders of magnitude,” according to the GPGPU.org website.

Actually, the broad notion of heterogeneous processing environments seems to be gaining traction: “It is my opinion that future bioinformatics centers, especially large computing centers, will have a mix of technologies in hardware that include standard processors, FPGAs, and GPGPUs, and new jobs will be designed for, implemented on and steered to the appropriate processing environments,” says Harold “Skip” Garner, executive director of the Virginia Bioinformatics Institute at Virginia Tech.

You may recall that FPGAs (field programmable gate arrays) have a somewhat checkered history in life science computation, in large measure because, like GPUs, they too were difficult to program. But at least one young company, Convey Computer, is trying to change that with an innovative architecture that pairs FPGAs as coprocessors with x86 CPUs (see Convey's hybrid-core computing whitepaper).

A major Nvidia strength is its dominant position in the massive consumer market.

“If you look at the HPC market it is full of companies that failed because they were doing a product only for HPC,” says Sumit Gupta, senior product team manager, Tesla GPU computing. “We took a consumer product and redesigned the HPC value. We put in special hardware and had an entire software team build a software ecosystem. Being on the consumer train gives us the economies of scale.”

The company has four main product lines: GeForce, aimed at the PC and gamer world; Quadro, for the professional workstation and enterprise market; Tesla, targeting high performance computing (HPC); and Tegra, for mobile devices. In his opening keynote, CEO Huang outlined Nvidia’s product roadmap through 2013, showing the Tesla (’07), Fermi (’09), Kepler (’11), and Maxwell (’13) chips, with the last surpassing 16 GFLOPS per watt.

Nvidia has been aggressively courting the scientific software community and reported important wins at the conference, not least support for CUDA in MATLAB’s Parallel Computing Toolbox; a multi-GPU-enabled Amber 11 (the #1 molecular dynamics package); and ANSYS (the #1 engineering simulation software). Huang also announced HPC hardware commitments from IBM (BladeCenter), Russian computer maker T-Platforms, and Cray (XE6).

There is clearly at least a modest revolution afoot in GPU-based HPC. True parallel processing offers huge performance gains for many life science tasks where data types are few. Gupta says many in the next-gen sequencer community will use GPUs in their machines. He also argues that many bioinformatics applications skipped porting to multi-core traditional CPUs because the performance gains didn’t justify the investment, but that many are now evaluating GPU support.

As if to underscore GPGPU potential in life science, the second keynote was delivered by Klaus Schulten of the Department of Physics and the Theoretical and Computational Biophysics Group, University of Illinois, Urbana-Champaign.

“When we realized the potential of GPUs we decided early we wanted to make three uses of that potential,” Schulten told the gathering. “The first was we wanted to increase the accuracy of our simulation. The second was we wanted to speed up simulation to make calculating biological systems more convenient, which was possible because of the speed gains…[L]ast we wanted to open doors to new fields to tackle problems that were not possible before because computing took too long.”

This was in 2007, when CUDA was introduced. Schulten then walked through several illustrative case histories, including one using simulation to identify swine flu resistance to the antiviral drug Tamiflu.

“At the high point of the epidemic it was realized that the virus had become resistant,” said Schulten. Tamiflu had been designed to plug a key hole in a viral enzyme needed to perform a critical chemical step (bond cleavage) for the virus to reproduce itself. The simulation revealed “Tamiflu is not binding in one step but in two steps and that the additional step is actually the one where the virus realized it can fend off the drug. Knowing that, pharmacologists can now design drugs for that step.”

Computation enabled by GPUs constitutes “a new computational microscope to view small systems in living cells. We need to see those because those are scales that cannot be seen with other technologies and that are relevant for pharmacological intervention,” he said.

He cited a polio virus surface receptor simulation conducted on the National Center for Supercomputing Applications (NCSA) GPU cluster, which achieved a 25.5x speed-up and a 10x power efficiency gain over conventional computing approaches, and a molecular dynamics simulation of RNA/ribosome translation conducted on the NCSA Lincoln cluster, whose compute time was reduced from two months to two weeks.

All in all, the conference was a powerful reminder of GPU progress, packed with stunning visualization demonstrations and talks. It is interesting to consider that the surreal images made possible by GPU computing may actually produce insight not readily available from our ‘normal’ perspectives. Consider how Riemannian geometry helped Einstein develop his ideas of curved space. It may be that these strange computer-generated visualizations, in some cases, mirror reality better than our human senses.