Intel fashions supercomputing phoenix from ashes of Larrabee

Out of the ashes of Intel's Larrabee project rises a supercomputing chip that …

In an announcement at the International Supercomputing Conference, Intel provided further details on the many-core chip that it hinted at earlier in the month. The first product based on the new design will be codenamed Knight's Corner, and will debut at 22nm with around 50 x86 cores on the same die. Developer kits, which include a prototype of the design called Knight's Ferry, have been shipping to select partners for a while now, and will ship more broadly in the second half of the year. When Intel moves to 22nm in 2011, Knight's Corner will make its official debut.

Knight's Corner will be the first product to implement Intel's Many Integrated Core (MIC) architecture, a many-core x86 architecture that draws on ideas from Larrabee, the Terascale program, and the Single-Chip Cloud Computer (SCC) project. Though there aren't any details of the microarchitecture of the cores in the product, it seems fairly clear that these are basically the same in-order, Pentium-class, vector-heavy cores that Larrabee used. Anand says he has heard that the vector hardware was left in the cores, and then there's this otherwise very confused PC World article, where the writer obviously heard something about a 512-bit vector unit. It makes sense that Intel would leave the vector hardware in the cores, since this product is aimed at high-performance computing (HPC).

Because the MIC family is basically a spinoff of Intel's failed discrete GPU project, most of the commentary on Knight's Corner focuses on it as a threat to NVIDIA's Tesla product. But if I ask myself to think of a large, complex, niche, monster of a floating-point beast that MIC could possibly be a threat to, the answer that pops into my head isn't Tesla, but Itanium.

You'll recall that Tesla is also a kind of "many-core," vector-heavy, GPU-derived coprocessor aimed at HPC workloads, so in this sense there's considerable overlap with MIC. And the standard thinking, which Intel is happy to promote, goes that MIC is better than Tesla because it's x86 and Tesla isn't, which means that it will be easier to port code to the new processor. So if both Tesla and Knight's Corner are GPU-derived, many-core, floating-point-centric processors with support for plenty of thread- and data-level parallelism, why am I suggesting that the MIC architecture in general is probably a greater danger to Itanium?

The first part of the answer lies in defining what you mean by "easy to port."

The hard part about porting from a multicore or single-core architecture to a many-core architecture is not the ISA transition; it's the fact that you have to redesign most apps from the ground up. The end result is that going from x86 to MIC is, for many applications, about the same level of challenge as going from x86 to Tesla, because either way you have to start over from the application and algorithm design phase.
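To make "start over from the algorithm design phase" concrete, here is a minimal sketch of what that kind of restructuring looks like for a toy problem, a running sum. It is not MIC or Tesla code, and it runs sequentially; the function names and the `blocks` parameter are hypothetical, chosen only for illustration. The serial version has a loop-carried dependency, so adding cores does nothing for it. The blocked version is a different algorithm whose first and third phases consist of independent pieces of work that could each be handed to a separate core.

```python
from itertools import accumulate


def running_sum_serial(xs):
    """Classic loop: every step depends on the previous one."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def running_sum_blocked(xs, blocks):
    """Same result, restructured so most of the work is independent per block."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    size = max(1, -(-len(xs) // blocks))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]

    # Phase 1: scan each block on its own (independent, one block per core).
    local = [list(accumulate(chunk)) for chunk in chunks]

    # Phase 2: short serial scan over the block totals to get each block's offset.
    offsets = [0] + list(accumulate(scan[-1] for scan in local))[:-1]

    # Phase 3: add each block's offset to its local results (independent again).
    return [value + off for scan, off in zip(local, offsets) for value in scan]
```

The point of the sketch is that the second function is not a recompile of the first. The change happens at the algorithm level, and that work is the same whether the target ISA is x86 or something else.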

Note that none of this is to say that Intel MIC and Tesla are the same. They differ in some very fundamental respects, not least that the MIC cores are better for general-purpose computing and that MIC has a real virtual memory implementation. My only point is that regular x86, MIC, and Tesla represent three different architectures, and to go from x86 to either MIC or Tesla means that you have to start over.

Some may object to the claim that you absolutely must start over if you go from x86 to MIC, because MIC is a collection of x86 cores, which would seem to imply that you could just run some vanilla x86 code on a MIC machine. This is true, of course, but why would you ever want to do that? Why would you shell out for a giant, 50-core MIC chip to run some minimally parallel workload on three or four in-order cores, when Intel will sell you a pair of dual-core Atoms for next to nothing? You either rearchitect your application to use a very large number of cores, or you stick with the much cheaper multicore x86 options.
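A rough way to put numbers on this argument is Amdahl's law, which gives the ideal speedup when only a fraction of a program can run in parallel. The sketch below is a back-of-the-envelope model only: the parallel fractions are made up for illustration, the function name is hypothetical, and the model assumes every core is equally fast, which ignores per-core differences between an in-order MIC core and other x86 cores.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative parallel fractions (assumptions, not measurements).
for p in (0.50, 0.90, 0.99):
    four = amdahl_speedup(p, 4)
    fifty = amdahl_speedup(p, 50)
    print(f"parallel fraction {p:.2f}: 4 cores -> {four:.2f}x, 50 cores -> {fifty:.2f}x")
```

Under this model, a workload that is only half parallel cannot reach even a 2x speedup no matter how many cores it gets, so the extra 46 cores buy almost nothing. Only when nearly all of the work is parallel does a 50-core part pull far ahead of a four-core one.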

In the end, MIC's attractiveness vs. Tesla will have little to do with its being x86, and more to do with its relative performance per watt on the kinds of workloads that HPC customers care about.

The final part of the "why Itanium, and not Tesla" answer is rooted in the fact that Itanium's HPC fortunes are already on the wane. Itanium's showing in the Top 500 supercomputer list peaked in November 2004 at a high of 84 systems. As of June 2010, that number has dropped to a mere five systems. Meanwhile, 64-bit x86 has taken over the list, and three of the top 10 systems feature data-parallel coprocessors derived from commodity gaming platforms (Tesla, Radeon, and Cell). This processor/coprocessor combination of commodity general-purpose and data-parallel hardware looks set to keep growing in popularity, and to keep displacing Itanium in HPC as it does. If MIC can compete with Tesla for the coprocessor spot, then it will be part of that displacement.