It seems like every time a chip vendor talks about its latest netbook processor, there is a flurry of articles about how such chips could be worked into a server. The impetus for this line of thinking is the power crisis in the datacenter. Processors targeted for netbooks and handheld consumer devices are ultra low-power and usually have a better performance-per-watt metric than your traditional server chippery. Not only that, these power-saving CPUs sell for just a fraction of the price of a typical server processor.

The most recent example of this line of thinking was precipitated by AMD’s unveiling of its upcoming Bobcat microarchitecture last month at Hot Chips. Bobcat is the company’s future core design destined for the netbook and notebook market. It wasn’t long before articles like this one from HotHardware.com showed up, suggesting that the new core design might be a great fit for ultra low-power servers and microblades.

The reasoning is that these power-sipping CPUs are especially efficient at scaled-out computing, where individual core performance matters less than the aggregate performance of the entire system. The goal, of course, is to offer equivalent computational throughput for much less power than a conventional Opteron- or Xeon-based server. On paper that’s true. Bobcat, for example, is advertised to offer a sub one-watt core with about 90 percent of the performance of a mainstream notebook chip. Certainly one would expect Bobcat-based CPUs to offer much better performance-per-watt numbers than their larger Opteron brethren.

In some cases, this creative thinking has gone somewhat further, that is, into actual product roadmaps. Earlier this summer, startup SeaMicro announced it was going to use an Intel 1.6 GHz Atom processor to power a new breed of low-power server. The SM10000 stuffs 512 Atom processors into a single box, while being able to run off-the-shelf applications and operating systems. SeaMicro’s claim is that it can deliver comparable performance to a conventional x86 server, but use just a fourth of the power and space.

Prior to the SeaMicro revelation, Dell announced an “ultra-light” server powered by the VIA Nano processor. The new XS11-VX8 offering will pack 12 servers into a 2U box, and will cost less than $400.

Given that there has been little experience with this type of computing, the application set for these ultra low-power servers is still a bit fuzzy. It appears that Dell and SeaMicro are aiming their offerings at cloud hosting, Web farms and other light-load applications. The practical consideration here is that single-thread performance is not all that good for these under-powered chips, especially compared to a Xeon or Opteron processor. But applications that can be divvied up efficiently across many processors into independent lightweight tasks are perfect for this kind of computing.

On the other hand, where single-thread application performance is the bottleneck, execution times will suffer. Sure, power is expensive, but time is even more so. That makes most compute-bound workloads, including the vast majority of HPC apps, unsuitable for these lowly chips, with the possible exception of embarrassingly parallel codes.
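To make that tradeoff concrete, here is a rough Amdahl's-law sketch. The core counts and the relative core speed are hypothetical numbers chosen for illustration, not figures for any actual Atom, Nano, Xeon or Opteron part, and the function name `run_time` is made up for this example.

```python
def run_time(parallel_fraction, cores, core_speed):
    """Normalized run time of a job under Amdahl's law.

    parallel_fraction: share of the work that splits cleanly across cores.
    cores: number of cores in the box (hypothetical).
    core_speed: per-core speed relative to a reference core (1.0 = reference).
    A result of 1.0 means "as long as one reference core running alone".
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1 or core_speed <= 0:
        raise ValueError("cores must be >= 1 and core_speed must be positive")
    # The serial part runs on one core, so it is hit by the slow core directly.
    serial_time = (1.0 - parallel_fraction) / core_speed
    # The parallel part is spread evenly over every core in the box.
    parallel_time = parallel_fraction / (core_speed * cores)
    return serial_time + parallel_time


if __name__ == "__main__":
    # Assumed configurations: 8 fast cores versus 64 cores at 0.3x the speed.
    for p in (1.0, 0.9):
        fast = run_time(p, cores=8, core_speed=1.0)
        slow = run_time(p, cores=64, core_speed=0.3)
        print(f"parallel fraction {p}: fast box {fast:.3f}, many-slow-core box {slow:.3f}")
```

Under these assumed numbers, the box full of slow cores finishes first when the work is fully parallel, but it falls behind once even a tenth of the job is serial, since that tenth runs at the slow core's pace.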

That might be the end of it if it weren’t for this GPGPU phenomenon. In this case, the CPU is used to drive the GPU, where the most compute-intensive piece of the application is executed. If enough of the app can be offloaded to the graphics accelerator, the CPU need not be all that muscular. Thus a power-sipping CPU might be the perfect companion to the power-hogging GPU.

In practice, though, I don’t think we’re quite there yet. From what I’ve gathered, the profile of many GPU-ported codes is such that they still rely on speedy CPUs for at least a portion of the application. It would be interesting for GPGPU developers to track execution cycles on the two processors, and determine how big a CPU is really required for a given code. It might even give some enterprising vendor an idea about how to build a better balanced GPGPU server.
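A back-of-the-envelope version of that exercise might look like the following. It assumes the CPU and GPU phases do not overlap, that any transfer time is lumped in with one of the two measurements, and that the CPU portion scales linearly with CPU speed. The function names and the timings in the usage note are hypothetical, not measurements from a real code.

```python
def projected_runtime(cpu_seconds, gpu_seconds, cpu_speed_ratio):
    """Estimate wall time after swapping in a CPU of a different speed.

    cpu_seconds, gpu_seconds: measured, non-overlapping time spent on each
    processor with the current CPU (hypothetical profiler output).
    cpu_speed_ratio: new CPU speed relative to the current one (0.5 = half).
    """
    if cpu_speed_ratio <= 0:
        raise ValueError("cpu_speed_ratio must be positive")
    # Only the CPU portion stretches; GPU time is assumed to stay the same.
    return cpu_seconds / cpu_speed_ratio + gpu_seconds


def slowdown(cpu_seconds, gpu_seconds, cpu_speed_ratio):
    """Ratio of the projected runtime to the measured runtime."""
    measured = cpu_seconds + gpu_seconds
    return projected_runtime(cpu_seconds, gpu_seconds, cpu_speed_ratio) / measured
```

For example, a code that spends 10 of every 100 seconds on the CPU would give `slowdown(10.0, 90.0, 0.5)` of about 1.1, so a half-speed CPU costs roughly 10 percent. If the split were 50/50, the same swap would cost 50 percent, which is the case where a low-power CPU stops looking attractive.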

Alongside multicore CPUs and cloud computing, GPGPU is a technology that will continue to shake up the way computing is done for years to come. To the general public, the GPGPU phenomenon is probably the least visible of the three I mentioned, but it may end up having just as much of an impact.

The most noticeable effect from GPU computing will be the way it redefines what we think of as a general-purpose processor. Historically, specialized processors get swallowed by the CPU when their functions are no longer thought of as specialized. We saw this with floating point units and, more recently, with memory controllers. (Although some flexibility gets lost, the integrated model is much more economical in terms of power usage and space.) We’re seeing this same general-purpose capability coming to fruition in GPUs. Today these devices can be used for traditional graphics, advanced visualization, and floating point/vector processing. The rise of general-purpose GPU computing will inexorably push graphics-flavored logic onto the CPU die.

We’re already seeing CPU-GPU designs coming from the two big x86 chip vendors. AMD is blazing the trail with its Fusion APU (Accelerated Processing Unit) processors, the first of which are slated to show up in early 2011. These initial designs will be targeted to the consumer market (desktops and notebooks), where video processing and CPU-centric applications are already well integrated. Intel’s upcoming Sandy Bridge processors aimed at the consumer space will also incorporate a GPU on the same chip. Like AMD’s parts, these processors will be available in early 2011. For both chipmakers, this represents the first time CPUs and GPUs will share the same silicon real estate.

For GPGPU enthusiasts, though, these early heterogeneous designs really represent transition technologies. In most cases, the integrated GPU will be used for traditional graphics and visualization, with the CPU still handling most of the floating point and vector math. In that sense, there will be some redundant functionality on these early chips. The larger payoff will occur when the CPU’s floating point and SIMD logic is merged with the GPU. It’s probably wrong to think of that as an endpoint, since it’s more likely to play out as a gradual evolution over multiple generations of processor architectures.

Before then we should see CPU-GPU designs for server chips. AMD has hinted at such platforms, but hasn’t committed to any specific products or roadmap. For this to make sense economically, the semiconductor process will have to be small enough to get a high-end CPU and GPU on the same die. That probably won’t be practical until chips can be manufactured below the 32nm node. Also, software that can take advantage of heterogeneous designs will have to be in place to support a broad market for these chips in the enterprise, that is, not just for high performance computing. Because of these constraints, I think the earliest we’ll see CPU-GPU server chips will be 2012, and more likely 2013.

So where does this leave CPU-less NVIDIA? Right now, the company sits atop the GPGPU computing market, but has no public plans to integrate its high-end GPUs with a CPU. For the time being, at least, NVIDIA seems content to pursue the GPU computing market with discrete devices, like its Tesla products, connected remotely to x86 processors.

Ironically, though, the greater success NVIDIA has in building a GPGPU business and bringing more applications into the fold, the greater the demand will be for CPU integration. And if both AMD and Intel start offering high-end CPU-GPU products, NVIDIA’s discrete GPU business will suffer.

It’s worth noting that NVIDIA actually does have a CPU-GPU platform in its current Tegra line of processors for mobile devices. The CPU in this case is the ARM processor, a compact little chip that is quite popular for low-power platforms like cell phones. It’s not too far a stretch to think NVIDIA may be designing a chip that marries its CUDA-class GPUs with ARM CPUs. This week, startup Smooth-Stone revealed it will build servers based on ARM processors. If these are able to gain a foothold in datacenters, an NVIDIA Tesla-ARM server chip would look very interesting indeed.