Thus it's not surprising that the Barcelona Supercomputing Center (BSC), an internationally renowned supercomputing center, turned to the Tegra 3 when it was looking for a lightweight processor to serve as the communications mediator in a new energy-efficient GPU computing design.

The new Spanish supercomputer will be dubbed "Mont-Blanc".

In the new design, the bulk of the computation will be carried out using either OpenCL or NVIDIA's proprietary CUDA API, with kernels (pieces of parallel work) running on unannounced low-power NVIDIA GPUs, which the center says are similar to the GeForce GT 520MX. The GPUs will likely be based on the new Kepler architecture (GeForce 600 series), which proved extremely energy-efficient in AnandTech's benchmarking.

But something will need to ferry data to the GPUs, perform I/O between GPU-enabled computing units, and offload results to storage devices. That something will be the Tegra 3.

Tegra 3 chip package and die image [Image Source: AnandTech]

While a typical Xeon gobbles 50 to 100 watts, the Tegra 3s in Mont-Blanc are expected to consume around 4 watts each when loaded. The finished design is expected to pack anywhere from 2,000 to 4,000 processors, along with corresponding GPUs.
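To put those wattage figures in perspective, here is a rough back-of-the-envelope sketch. It multiplies the article's per-chip figures across the 2,000-4,000 processor range. It assumes, hypothetically, that every chip draws its quoted loaded power at the same time and that a Xeon-based design would use the same number of chips; GPU, memory, interconnect, and cooling power are left out entirely.

```python
# Rough CPU-side power budget for Mont-Blanc, using the article's figures.
# Assumptions (hypothetical): all chips run at their quoted loaded power
# simultaneously, and a Xeon design would use the same number of chips.
# GPU, memory, interconnect and cooling power are deliberately ignored.

TEGRA3_WATTS = 4.0                # ~4 W per loaded Tegra 3 (article figure)
XEON_WATTS_RANGE = (50.0, 100.0)  # typical Xeon draw (article figure)


def cpu_power_kw(count: int, watts_each: float) -> float:
    """Total CPU power in kilowatts for `count` chips at `watts_each` watts."""
    return count * watts_each / 1000.0


for count in (2000, 4000):
    tegra_kw = cpu_power_kw(count, TEGRA3_WATTS)
    xeon_lo_kw, xeon_hi_kw = (cpu_power_kw(count, w) for w in XEON_WATTS_RANGE)
    print(f"{count} chips: Tegra 3 ~{tegra_kw:.0f} kW, "
          f"Xeon ~{xeon_lo_kw:.0f}-{xeon_hi_kw:.0f} kW")
```

Under those assumptions, the CPUs in a 2,000-chip build would draw about 8 kW (16 kW at 4,000 chips), versus roughly 100-400 kW for the same number of Xeons.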

The BSC hopes Mont-Blanc will become the first ARM-powered supercomputer to make the Top500 list of supercomputers, which is currently topped by Japan's 10.5-petaflop "K computer." However, the BSC aims to be at the very top of another, newer list: the Green500, which ranks the world's most power-efficient supercomputers.

That list is currently topped by International Business Machines Corp.'s (IBM) Blue Gene computer at the Thomas J. Watson Research Center, which can do 2 billion calculations per second (2 gigaflops) per watt. Mont-Blanc aims for 7 gigaflops per watt, which would make it more than three times as energy-efficient as any design in the world today.
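The "more than three times" claim follows directly from the two efficiency figures. The sketch below checks the ratio and, purely as a hypothetical illustration, converts each efficiency into the power needed to sustain one petaflop per second. The constant and function names are invented for this example.

```python
# Compare the Green500 efficiency figures quoted in the article.
# GFLOPS/W = billions of floating-point operations per second, per watt.

BLUE_GENE_GFLOPS_PER_W = 2.0   # current Green500 leader (article figure)
MONT_BLANC_GFLOPS_PER_W = 7.0  # Mont-Blanc's stated target (article figure)


def efficiency_ratio(target: float, baseline: float) -> float:
    """How many times more work per watt `target` does than `baseline`."""
    return target / baseline


def watts_needed(flops: float, gflops_per_w: float) -> float:
    """Power (W) needed to sustain `flops` FLOP/s at a given efficiency."""
    return flops / (gflops_per_w * 1e9)


ratio = efficiency_ratio(MONT_BLANC_GFLOPS_PER_W, BLUE_GENE_GFLOPS_PER_W)
print(f"Mont-Blanc target is {ratio:.1f}x the current leader")

# Hypothetical illustration: power to sustain 1 petaflop/s (1e15 FLOP/s).
for name, eff in (("Blue Gene", BLUE_GENE_GFLOPS_PER_W),
                  ("Mont-Blanc", MONT_BLANC_GFLOPS_PER_W)):
    print(f"{name}: ~{watts_needed(1e15, eff) / 1e3:.0f} kW per petaflop/s")
```

The 3.5x ratio is where "more than three times" comes from. At one sustained petaflop per second, that works out to roughly 500 kW versus 143 kW.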

Alex Ramirez, product manager at BSC who's working on the project, told Wired in an interview, "Instead of using very few, but very big performance, processors… we're going to be using a lot of very low-power, but middle performance, processors."

II. Tegra ARMv8-powered Successor Design Aims to Top Top500

But Mr. Ramirez's team won't be finished when it completes the lightweight design sometime in the next couple of years. It is already planning a next-generation successor, which will be built on NVIDIA's Stark series of 64-bit ARMv8 chips (or possibly a later variant).

That 64-bit design aims to become the world's most powerful supercomputer, with a planned processing power of 200 petaflops (roughly twenty times the K computer's capacity). According to plans shared with Wired, it is expected to be completed around 2017.
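A quick check of the "twenty times" figure is below. The second calculation is purely hypothetical, since the article gives no efficiency target for the 64-bit successor: it simply asks how much power 200 petaflops would need if the design only matched Mont-Blanc's 7-gigaflops-per-watt goal.

```python
# Sanity-check the "twenty times the K computer" comparison.
K_COMPUTER_PFLOPS = 10.5   # current Top500 leader (article figure)
SUCCESSOR_PFLOPS = 200.0   # planned 64-bit successor (article figure)

speedup = SUCCESSOR_PFLOPS / K_COMPUTER_PFLOPS
print(f"{speedup:.1f}x the K computer")  # roughly 19x

# Hypothetical: the article gives no efficiency target for the successor.
# If it merely matched Mont-Blanc's 7 GFLOPS/W goal, 200 petaflops would need:
ASSUMED_GFLOPS_PER_W = 7.0
megawatts = SUCCESSOR_PFLOPS * 1e15 / (ASSUMED_GFLOPS_PER_W * 1e9) / 1e6
print(f"~{megawatts:.1f} MW")  # about 28.6 MW
```

The ratio comes out to about 19, so "twenty times" is a fair rounding. The roughly 28.6 MW result suggests the successor would likely need to be considerably more efficient than Mont-Blanc to keep its power draw manageable.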

A new 64-bit design to launch in 2017 will be even beastlier than Mont-Blanc.
[Image Source: Flickr/pirinenc]

ARM Holdings itself has made modest investments in pushing power-efficient ARM servers to IT users. However, its focus remains on its core markets (smartphones, tablets, and embedded devices), as well as its push into the personal computer space. New ARM president Simon Segars said last month, "Supercomputers, for ARM, is not a high volume market. It's not something we spend a lot of time talking about. Ours is a business that is royalty and unit driven, so we're interested in high-volume markets."

For now, he says, ARM supercomputers are an "interesting" diversion, but not a business focus.

The GT640M LE, GT640M, and GT650M are technically considered discrete GPUs.

This supercomputer is based on 'Project Denver', which will be a specially designed Tegra 3 made to handle I/O and feed discrete GPUs, totally eliminating the need for any Intel part. Over 95% of the actual compute work is going to be done on the GPUs; the CPU is just there to feed them.

New silicon isn't cheap to design and produce in small batches compared to off-the-shelf cell phone chips. And how much power do you think could be saved on the CPU side when each CPU consumes 3 watts and each GPU gobbles up 200+ watts doing the real work?
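The commenter's rhetorical question can be answered with a couple of lines of arithmetic. This sketch uses only the commenter's figures (3 W per CPU, 200 W per GPU) and ignores memory, networking, and cooling; the function names are illustrative.

```python
# How much of a node's power budget does the CPU represent?
# Figures are the commenter's: ~3 W per CPU, 200+ W per GPU.

def cpu_share(cpu_w: float, gpu_w: float) -> float:
    """Fraction of combined CPU+GPU power drawn by the CPU."""
    return cpu_w / (cpu_w + gpu_w)


def node_savings(cpu_w: float, gpu_w: float, cpu_cut: float) -> float:
    """Fraction of node power saved by cutting CPU power by `cpu_cut` (0-1)."""
    return cpu_w * cpu_cut / (cpu_w + gpu_w)


share = cpu_share(3.0, 200.0)
halved = node_savings(3.0, 200.0, 0.5)
print(f"CPU share: {share:.1%}; halving CPU power saves {halved:.2%}")
```

On these numbers the CPU accounts for about 1.5% of the node's power, so even a custom chip that halved CPU power would save well under 1% overall.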

The use of low-power GPUs is interesting. It makes me think they may be putting a high priority on available GPU memory (i.e., a high ratio of memory to compute resources). High-end GPUs have orders of magnitude more compute power, but only 2 or 3 times as much memory. Interesting stuff...


Having worked with CUDA code myself, I would guess they're going to go for some sort of custom Kepler GPU solution with a lower clock and similar cache, but potentially shared global memory.

One of the biggest problems with CUDA speed-wise is that, for parallelizable problems, it has compute power to spare and gets work done very fast (so clock speed could be lowered), and it typically has more global memory than you need. The limitations are the SM (streaming multiprocessor) count, which constrains how much work you can do at once, and the speed cost of ferrying results off the GPU over the PCIe bus.
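How the streaming multiprocessor (SM) count caps concurrent work can be sketched with a simple "waves" model: a GPU can only hold a fixed number of thread blocks at once, so a large launch runs in back-to-back batches. This is a simplified illustration, not how NVIDIA's scheduler is documented to behave in detail. It assumes every block takes the same time and that a fixed number of blocks fit per SM; in practice occupancy also depends on each block's register and shared-memory use. The launch size and SM counts are hypothetical.

```python
import math

# Simplified model of why SM count caps how much work runs at once.
# Assumptions (hypothetical): every thread block takes the same time, and
# `blocks_per_sm` resident blocks fit per SM (real occupancy also depends on
# registers and shared memory per block).


def waves(total_blocks: int, sm_count: int, blocks_per_sm: int) -> int:
    """Number of back-to-back 'waves' needed to run every thread block."""
    concurrent = sm_count * blocks_per_sm
    return math.ceil(total_blocks / concurrent)


# Hypothetical launch of 1,024 blocks; 16 is Kepler's per-SMX block limit.
for sms in (1, 2, 8):
    print(f"{sms} SM(s): {waves(1024, sms, 16)} waves")
```

With equal-length blocks, run time scales with the number of waves, so a low-power part with few SMs pays for it directly even if its clock is fine.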

If Mont-Blanc or its successor uses a unified global memory with the CPU, latency might increase slightly on the GPU side (as discrete card GDDR5 is relatively low-latency); however, you would eliminate a costly transfer step.
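The trade-off the commenter describes can be captured in a toy timing model: a discrete GPU pays for copies over the bus, while a shared-memory design skips them but may run its kernels somewhat slower. Every number here is hypothetical, including the 10% slowdown; the 8 GB/s link speed is roughly the theoretical per-direction bandwidth of a PCIe 2.0 x16 slot.

```python
# Toy model of the trade-off described above: a discrete GPU pays for
# PCIe copies, while a unified-memory design skips them but may run kernels
# somewhat slower. All numbers below are hypothetical.


def discrete_time_s(bytes_in: float, bytes_out: float,
                    compute_s: float, link_gb_per_s: float) -> float:
    """Copy inputs over the bus, compute, copy results back (no overlap)."""
    transfer_s = (bytes_in + bytes_out) / (link_gb_per_s * 1e9)
    return transfer_s + compute_s


def unified_time_s(compute_s: float, latency_penalty: float) -> float:
    """No copies, but compute slows by `latency_penalty` (0.10 = 10%)."""
    return compute_s * (1.0 + latency_penalty)


# 500 MB in, 500 MB out, 0.1 s of kernel time, ~8 GB/s link
# (roughly PCIe 2.0 x16 theoretical per-direction bandwidth).
discrete = discrete_time_s(500e6, 500e6, 0.1, 8.0)
unified = unified_time_s(0.1, 0.10)
print(f"discrete: {discrete:.3f} s, unified: {unified:.3f} s")
```

With these made-up numbers the copies cost more than the kernel itself, which is the commenter's point. Real CUDA code can hide part of that cost by overlapping copies with computation using streams, which this model ignores.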