We cannot hide that we were quite shocked by the Geforce GTX Titan Z announcement and the price of that card. A $2999 Geforce for the science community and some super rich gamers sounded quite expensive to us. Then we saw that Intel has launched a new Xeon Phi co-processor, a compute card that sits in a PCIe slot in servers. The new addition to the Xeon Phi co-processor line, which comes in 57-, 60- and 61-core versions, is called the 7120A and is based on the 22nm Knights Corner architecture.

For those who have been with us for a few years, Knights Corner is the successor to the failed Larrabee graphics card / server compute card. The latest addition to Knights Corner, the Xeon Phi 7120A, has 61 cores, supports 244 threads and comes with a massive 16GB of memory with a maximum bandwidth of 352 GB/s. It is a 300W card, and the A stands for active cooling. There is also a 7120P version with a passive cooler, a 7120D in a dense form factor, and the 7120X, which doesn't come with a cooler. The peak double-precision performance for 7120 cards is 1208 Gigaflops (1.208 Teraflops), compared to the Tesla K40's 1.43 Teraflops.

All Xeon Phi 7120 variations are clocked at 1.238 GHz with all 61 cores active, and the price varies between $4,129 and $4,235, which is much more than the Geforce GTX Titan Z. The 7120 series even supports turbo that can get the cards to 1.33GHz, a slight speed increase for 61 cores with an impressive 30.5 MB of shared cache (512 KB per core).
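The headline numbers for the 7120 hang together, and a quick back-of-the-envelope check shows how. The sketch below uses the core count, thread count, clock and per-core cache quoted above. The 512-bit vector width and one fused multiply-add per core per cycle are our assumptions for Knights Corner, not figures Intel gave in this announcement, so treat the result as a sanity check rather than an official derivation.

```python
# Figures quoted in the article for the Xeon Phi 7120 series.
CORES = 61
THREADS_PER_CORE = 4        # 244 threads spread over 61 cores
BASE_CLOCK_GHZ = 1.238
L2_PER_CORE_KB = 512

# Assumptions (not stated in the article): a 512-bit vector unit per core
# that can issue one fused multiply-add (2 flops per lane) every cycle.
VECTOR_BITS = 512
DP_BITS = 64
FLOPS_PER_FMA = 2


def peak_dp_gflops(cores, clock_ghz):
    """Theoretical peak double-precision GFLOPS under the assumptions above."""
    dp_lanes = VECTOR_BITS // DP_BITS          # 8 doubles per vector
    return cores * clock_ghz * dp_lanes * FLOPS_PER_FMA


threads = CORES * THREADS_PER_CORE
cache_mb = CORES * L2_PER_CORE_KB / 1024
peak = peak_dp_gflops(CORES, BASE_CLOCK_GHZ)

print(f"threads: {threads}")
print(f"shared cache: {cache_mb} MB")
print(f"peak DP: {peak:.0f} GFLOPS")
```

Under these assumptions the base clock alone accounts for the quoted 1208 Gigaflops, which suggests the peak figure is specified at 1.238 GHz rather than at the 1.33GHz turbo clock.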

Nvidia's Tesla K40 professional compute solution with 12 GB of memory is listed for $4760, and we can imagine that a possible dual-GK110-based Tesla would cost even more but offer a significant performance boost.

This is not that much money for the GPGPU compute market, where you need thousands of dollars to set up a decent compute server.

Although Intel's Larrabee project was scrapped, or shelved to be precise, it appears that Intel has at least used some of the experience gained in Larrabee development. The company finally announced Knights Corner, a Many Integrated Core (MIC) multi-core architecture co-processor that will be a part of Intel's new Xeon Phi line.

According to the first details released by Intel, Knights Corner is a single PCI-Express card that will feature over 50 x86 cores built on the 3D Tri-gate 22nm manufacturing process and at least 8GB of on-board GDDR5 memory. Knights Corner will provide 1TFLOP of double-precision performance. Unfortunately, those are the only details that Intel is ready to share for now, but we are sure we will hear more about Xeon Phi.

Although it borrows some design details from the Larrabee project, Knights Corner focuses solely on High Performance Computing rather than graphics performance. Knights Corner has a tough battle ahead of it considering Nvidia's recently announced Tesla K20 graphics card, capable of providing up to 2TFLOPS of double-precision compute performance, but Intel's x86 architecture and the independent Linux operating system that manages the cores provide a much more attractive platform for developers than Nvidia's CUDA.

Actually, Cray has already announced its Cascade supercomputer that, although it currently runs on Xeon E5, will get updated with Xeon Phi as soon as possible.

Although it has revealed some details regarding the MIC architecture and Knights Corner co-processor, Intel actually just launched the Xeon Phi brand, while Knights Corner should be ready by the end of the year.

Earlier this week, Intel VP Kirk Skaugen released a PowerPoint slide detailing the rich history of Intel’s commitment to HPC innovation, its progression from the Petascale age to the Exascale age of supercomputing, and some hard specifications for its highly parallelized MIC architecture aimed at enterprise markets.

Back in 2007, we wrote that Larrabee was initially designed as a discrete graphics engine and was also capable of computing highly parallel applications while preserving x86 programmability. In May 2010, Bill Kircos, Intel’s Director of Product and Technology Media Relations, announced that the Larrabee project would never materialize as a discrete GPU part and would instead be transitioned into a new architecture leveraging both Larrabee and Intel’s many-core research projects.

During ISC 2010, that architecture became known to the HPC crowd as Intel MIC (Many Integrated Core). In its official announcement, Intel outlined plans to ship a MIC development kit platform, known as Knights Ferry, to select customers. According to Slide 34 of Skaugen’s keynote presentation, Knights Ferry is an x86-based design with 32 cores on a single chip, each with four threads, a 32KB L1 instruction cache, a 32KB L1 data cache, and a 256KB L2 cache. In total, the chip has 8MB of shared L2 cache, which some analysts note is an interesting design point, as many highly parallel applications do not require such a large on-chip cache.

Each processor has a very wide 512-bit vector unit, allowing 16 single-precision floating point operations to be computed in a single instruction, with double-precision floating point operations yielding half that throughput.
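The Knights Ferry figures from the slide can be cross-checked with simple arithmetic. The short sketch below only uses numbers quoted above (32 cores, four threads and 256KB of L2 per core, a 512-bit vector unit); the variable and function names are ours, chosen for illustration.

```python
# Knights Ferry figures as quoted from Slide 34 of the keynote.
KNF_CORES = 32
THREADS_PER_CORE = 4
L2_PER_CORE_KB = 256
VECTOR_BITS = 512


def vector_lanes(element_bits):
    """Number of elements of the given width that fit in one vector register."""
    return VECTOR_BITS // element_bits


total_threads = KNF_CORES * THREADS_PER_CORE
total_l2_mb = KNF_CORES * L2_PER_CORE_KB / 1024
sp_lanes = vector_lanes(32)   # single precision: 32-bit floats
dp_lanes = vector_lanes(64)   # double precision: 64-bit floats

print(f"hardware threads: {total_threads}")
print(f"total L2 cache: {total_l2_mb} MB")
print(f"SP lanes: {sp_lanes}, DP lanes: {dp_lanes}")
```

The 8MB of shared L2 is simply the 32 per-core slices added together, and the halved double-precision throughput follows directly from a 64-bit value taking up twice the room of a 32-bit one in the same 512-bit register.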

Although the Knights Ferry development kit looks very similar to a GPU, we are reminded to mention that it isn't a GPU because it has x86 cores. Besides, Intel would never do such a thing. Nevertheless, the card comes with a dual-slot heatsink, features up to 2GB of GDDR5 memory, and connects to a standard PCI-Express 2.0 motherboard slot. Intel advertises MIC as an “Intel Co-Processor Architecture,” so by nature it can work alongside an Intel Xeon chip without the need to rewrite application code in another language.

HPCwire.com has published a detailed architecture comparison between Intel’s Knights Ferry based on MIC architecture and Nvidia’s Tesla products based on Fermi architecture. As noted by Michael Wolfe, Slide 33 from Skaugen’s keynote presentation depicts the Knights Ferry architecture layout with remarkable similarity to the 2008 SIGGRAPH article describing Larrabee.

Figure 1: Schematic of the Larrabee many-core architecture: The number of CPU cores and the number and type of co-processors and I/O blocks are implementation-dependent, as are the positions of the CPU and non-CPU blocks on the chip.

Given that Knights Ferry is not a commercially available product, it remains unclear whether or not it shares design aspects with Knights Corner, the first MIC product Intel plans to launch. According to official plans, Knights Corner will be manufactured on the 22nm half-pitch process node, will contain over 50 cores, and will be released sometime in 2011. All in all, we expect the next 16 months in the HPC sector to hold many interesting application performance competitions among Intel, AMD and Nvidia. While Intel touts its x86 instruction set as offering maximum compatibility with existing applications, without the need for dual-language programming on processor and co-processor, AMD and Nvidia focus their efforts on maximizing floating point throughput using a heterogeneous combination of CPUs with their Evergreen and Fermi GPU architectures.