SC10 Nuggets

The annual supercomputing show (SC10) is always full of “Top-something” news, eye-catching displays, and the latest and greatest in HPC hardware and software. While I like all the big news, I also find that there are other “quieter stories” that didn’t make the mainstream. By quiet, I mean they often do not have a corporate PR firm making sure their message gets out. I like to dig through some of these stories because I think they are interesting and may someday be big news.

The first story is one of my favorite academic booths. The Aggregate.org (University of Kentucky) is always on my list. Back in 2008, Hank Dietz and his Aggregate.org crew started talking about their MIMD On GPU, or MOG, project. (Note: MIMD stands for Multiple Instruction, Multiple Data computing, which is similar to a cluster or multi-core CPU. A GPU is a Single Instruction, Multiple Data (SIMD) architecture.) This year they announced that a very alpha version of the MOG code is available. For those who wonder how MIMD code can execute on a SIMD architecture, the Aggregate.org had their trusty wooden maze with four balls in it. Each of the colored balls has a different path to the end of the maze (MIMD code), yet it is perfectly feasible to get all the balls to the finish by a series of tilts of the table (SIMD execution). With MOG you can run standard codes (C, Fortran, and even MPI and OpenMP) on GPUs. You can find more information on the Aggregate.org SC10 page.
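The maze analogy maps to a standard software trick for running MIMD programs on SIMD hardware: interpret the programs, give each virtual thread its own program counter, and have the SIMD engine broadcast one opcode at a time while a mask selects which threads act on it. Here is a minimal sketch of that idea in plain Python (the loop stands in for the GPU's SIMD lanes; the toy instruction set and programs are my own illustration, not MOG's actual code):

```python
# Two different "MIMD" programs -- different paths through the maze.
progs = [
    [("add", 1), ("add", 2), ("halt",)],
    [("add", 5), ("mul", 3), ("add", 1), ("halt",)],
]

pcs = [0] * len(progs)      # per-thread program counters
regs = [0] * len(progs)     # per-thread accumulators
done = [False] * len(progs)
OPCODES = ["add", "mul", "halt"]

while not all(done):
    for op in OPCODES:                 # SIMD engine broadcasts one opcode (a "tilt")
        for t in range(len(progs)):    # every lane sees it...
            if done[t] or progs[t][pcs[t]][0] != op:
                continue               # ...but masked-off lanes sit this one out
            instr = progs[t][pcs[t]]
            if op == "add":
                regs[t] += instr[1]
            elif op == "mul":
                regs[t] *= instr[1]
            elif op == "halt":
                done[t] = True
            pcs[t] += 1

print(regs)  # -> [3, 16]: each thread reaches its own result from shared broadcasts
```

Every thread makes progress only on the steps that match its next instruction, just as each ball advances only on the tilts that suit its path, yet the engine itself issues a single instruction stream.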

Each year at SC, there is a Student Cluster Competition (SCC). This is the ultimate geek Olympic event, and how I wish I could be on one of those teams. It seems like a fun and interesting way to spend your day (and night) while at SC. This year, teams worked with their advisers and vendor partners to design and build a cutting-edge, commercially available small cluster constrained to 26 amps of current.

Teams were given the task of running real HPC applications with the same competition data sets. Teams worked on the applications during the show. The winning team was from National Tsing Hua University (Hsinchu, Taiwan), which partnered with Acer Incorporated, Tatung Company, and NCHC. They were judged on the highest aggregate score in the HPCC benchmark, throughput and correctness of the four real-world applications, and interviews — all using 26 amps!

The University of Texas at Austin, which partnered with Dell and TACC, achieved the highest HPL value (1.07 TFLOPS) in an HPCC run while staying within the 26 amp current limit. This is the first year for any SCC team to break a teraflop! Actually, three teams joined the TFLOPS Club (the others were Louisiana State University and National Tsing Hua University). Also worth mentioning is the University of Colorado at Boulder, which partnered with Dell, AMD, Mellanox, and FusionIO through the HPC Advisory Council. They received the most votes from SC10 attendees for their all-around coolness. The winning teams are pictured at the SC10 SCC site. Go Geeks!

My final nugget is an SC10 poster that I happened to catch. It was called “An Atomic Tesla: Avoiding the Power Wall.” Essentially, a team of researchers from Harvard University asked a very simple question: “Can GP-GPUs be driven by simple low power processors?” A GP-GPU is typically placed in a power hungry server. The researchers tested the performance of an Nvidia Tesla S1070 in three systems: an Intel Atom 330 CPU, a conventional Xeon L5410 (Harpertown), and an Intel Xeon E5540 (Gainestown). The E5540 used a QuadroPlex S4s card. Their conclusion was that after accounting for the differences in PCIe slots, the Atom “held its own against the server chips.” The Atom had a PCIe 1.0 x16 slot, while the Harpertown system had a PCIe 2.0 x16 slot and the Gainestown system a PCIe 2.0 x8 slot. Power draw for the entire Atom system (including the GeForce 9400M chipset and hard drive) was less than 50 Watts under load.

Now there are a few things to consider. First, this type of arrangement is best suited for problems with dominant parallel portions (code that can be executed on the GP-GPU). If there are heavy sequential portions in the code, then the Atom may lose to the faster Xeon processors. However, if the code is GPU dominant, then using an Atom to “service the GPU” may be a very good low power solution. In particular, Atom processors are available on small form factor motherboards (e.g., mini-ITX) that are both low power and low cost. Combined with a GPU, this could be a high performance, low cost, low power computing node. Of course, AMD has taken this one step further and has crammed it all onto a single processor. I think 2011 is going to be a fun year for low cost HPC.