Intel's HPC director James Reinders is cleaning out his desk and leaving Intel after 27 years.

Reinders is accepting the firm's offer of an early retirement package, which appears to be popular now that Intel is downsizing.

Reinders joined Intel in 1989 to work on a VLIW (Very Long Instruction Word) processor called iWarp. It was the early days of parallelism rather than faster clock rates and he was on the cutting edge.

Intel's work on parallelism eased back when clock rates surged again with the 486 and Pentium processors. He found himself arguing for concurrency as well as for Intel's compilers, libraries and other software development tools.

Unfortunately for Reinders, Chipzilla’s general-purpose GPU and accelerator project, codenamed Larrabee, never made it to market. Bits of Larrabee ended up in Intel's MIC (Many Integrated Core) concurrent processor, which became Xeon Phi and was released in 2012.

Reinders' departure is a significant loss for Intel, because he was jolly good at explaining the benefits of concurrency to the great unwashed. He apparently has not made any plans for new work, so if you need an HPC expert he is your man.

High Performance Computing FirePro with two Fiji GPUs and passive cooling

AMD has unveiled the FirePro S9300 X2, a new FirePro series High Performance Computing (HPC) graphics card based on two AMD Fiji GPUs, similar to the recently unveiled Radeon Pro Duo desktop dual-GPU graphics card.

With that in mind, the new FirePro S9300 X2 targets HPC applications that need maximum single-precision FP32 compute performance, including deep neural networks, machine learning, geoscience, molecular dynamics, data processing and analysis, and development platforms for Exascale computing. Based on two fully-enabled Fiji GPUs, each clocked at 850MHz, the new FirePro S9300 X2 provides 13.9 TFLOPS of single-precision compute performance. It also comes with 4GB of non-ECC High Bandwidth Memory (HBM) per GPU.
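The 13.9 TFLOPS figure can be sanity-checked with the usual peak-throughput arithmetic. The sketch below is only an illustration: the count of 4096 stream processors per fully enabled Fiji GPU and the two FLOPs per cycle per stream processor (one fused multiply-add) are assumptions not stated in the article, and the function name is made up for this example.

```python
def peak_fp32_tflops(gpus, shaders_per_gpu, clock_ghz, flops_per_cycle=2):
    """Theoretical peak: GPUs x shaders x FLOPs per shader per cycle x clock (GHz)."""
    gflops = gpus * shaders_per_gpu * flops_per_cycle * clock_ghz
    return gflops / 1000.0  # GFLOPS -> TFLOPS

# Assumption: 4096 stream processors per fully enabled Fiji GPU.
# The 850MHz clock and the two-GPU layout come from the article.
s9300_x2 = peak_fp32_tflops(gpus=2, shaders_per_gpu=4096, clock_ghz=0.850)
print(f"FirePro S9300 X2 peak FP32: {s9300_x2:.1f} TFLOPS")
```

Under those assumptions the result rounds to the 13.9 TFLOPS quoted above. This is a theoretical ceiling, not a measured result.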

AMD lowered the GPU clock in order to keep the TDP at 300W, which also allowed it to pair the FirePro S9300 X2 with passive cooling, something that is really important in the HPC market.

According to AMD's own testing, the new FirePro S9300 X2 is up to 3.5x faster than the Tesla K40 and up to 2x faster than Nvidia's flagship Tesla K80. It is also significantly faster than AMD's Hawaii-based FirePro S9170, which will remain the fastest FirePro graphics card in AMD's lineup for double-precision compute.

AMD's new FirePro S9300 X2 will be available in the second quarter of this year with a price set at US $5999.

AMD has shared its glorious five-year GPU and APU roadmap with the world for the first time.

AMD's Junji Hayashi revealed AMD's cunning plan at a special event in Japan, according to WCCFtech.

As we expected, AMD's upcoming ARM K12 and its sister x86 CPU core Zen got a mention. AMD wants to develop and introduce both x86 and ARM powered SoCs to the market in a pin-for-pin compatible platform code named SkyBridge.

According to the latest reports, AMD's upcoming K12 and Zen will support multiple threads. Basically this means that AMD will move away from clustered multi-threading used on Bulldozer-based processors and move to simultaneous multi-threading instead. This is good news on more than one front.

Simultaneous Multi-Threading

Simultaneous Multi-Threading (SMT) will allow AMD to boost utilisation and efficiency. With SMT, underutilized parts of the CPU core can be put to use via a secondary execution thread. The end result is superior resource utilization.

This is in contrast to clustered multi-threading, which shares resources between two different CPU cores instead of doing it inside a single CPU core.

AMD never really commented on how much multi-threading support its upcoming cores would have. Now it seems that AMD's K12 ARM core will support "many threads" instead of just supporting one additional thread, as is the case with Intel's high performance CPUs.

When it comes to GPUs, Hayashi said AMD will be employing a two year cadence to updating its GPU architecture inside APUs.

AMD GPU Roadmap 2015-2020: AMD will introduce Accelerated Processing Units with updated GPU architectures once every two years.

AMD plans to introduce what it described as a High Performance Computing APU or HPC for short. This APU will carry a sizable TDP between 200 and 300 watts.

APU loves HPC when it is stacked

This sort of APU loves HPC. So far powerful APUs were not attempted because the amount of memory bandwidth required to keep such a powerful APU fed was too much.

Stacked HBM (High Bandwidth Memory) will make such designs extremely effective. The second generation of HBM is nine times faster than GDDR5 memory and 128 times faster than DDR3.

Code names for future GPU architectures unfortunately were not revealed. But it is pretty likely that AMD's upcoming GPU architecture to debut on 16nm FinFET will be code named Arctic Islands. We'll know more details in May during the company's scheduled Financial Analyst Day event.

DDR3 has been with us for quite some time and this is not going to change overnight. Usually AMD goes after the next generation memory first and then Intel follows suit, but from what we have gathered so far Intel is taking the initiative and jumping on the DDR4 bandwagon in 2015.

A recently leaked Intel HPC roadmap contains some info on Knights Corner (Larrabee successor, Ed.), a 22nm chip capable of 1.01 TFLOPS that comes with GDDR5 memory on a PCIe card. The successor, codenamed Knights Landing, is a 14nm part that should reach 3+ TFLOPS. It comes in a socket or on a PCIe card and supports PCIe 3.0 as well as AVX 3.1 and DDR4. According to the roadmap, Broadwell, again a 14nm chip, also shifts to 2015, making it the first slip in Intel’s tick-tock roadmap in years.

Intel is getting serious about high-performance computing and has always wanted a piece of this market, but AMD and Nvidia won't leave it without a good fight. It turns out that graphics cards are great for the compute market and Nvidia, with its GK110-based Tesla, is already taking advantage of this fact. AMD is also using its GPUs to do some serious computing.

The same roadmap mentions Skylake in 2015 or possibly later, and this chip should come with AVX 3.2, a 14nm process, DDR4 and PCIe 4.

AMD has launched its latest, and according to AMD, the industry's most powerful server graphics card aimed at high-performance computing (HPC) workloads and graphics intensive applications.

The FirePro S10000 is based on two Tahiti GPUs and features a total of 3584 cores and an 825MHz GPU clock, slightly lower than that of the previously released FirePro S9000, which is based on a single Tahiti GPU. The two Tahiti GPUs provide up to 5.91 TFLOPS of peak single-precision floating-point performance and up to 1.48 TFLOPS of peak double-precision floating-point performance.
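Both peak figures follow from the core count and clock quoted above. The sketch below assumes two FLOPs per core per cycle (one fused multiply-add) and a double-precision rate of one quarter of single precision. The ratio is inferred from the two quoted numbers (1.48 / 5.91) rather than stated by AMD here, and all names are illustrative.

```python
TOTAL_CORES = 3584             # both Tahiti GPUs combined, from the article
CLOCK_GHZ = 0.825              # 825MHz GPU clock, from the article
FLOPS_PER_CORE_PER_CYCLE = 2   # assumption: one fused multiply-add per cycle
DP_TO_SP_RATIO = 0.25          # assumption: inferred from 1.48 / 5.91

def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS from core count, clock (GHz) and FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

sp_tflops = peak_tflops(TOTAL_CORES, CLOCK_GHZ, FLOPS_PER_CORE_PER_CYCLE)
dp_tflops = sp_tflops * DP_TO_SP_RATIO

print(f"Single precision: {sp_tflops:.2f} TFLOPS")
print(f"Double precision: {dp_tflops:.2f} TFLOPS")
```

With these inputs the sketch lands on the 5.91 and 1.48 TFLOPS that AMD quotes, which suggests the published numbers are plain theoretical peaks.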

As was the case with the FirePro S9000, the new FirePro S10000 also features 6GB of GDDR5 memory paired with a 384-bit memory interface, but this time around it has a memory bandwidth of 480GB/s. It uses a PCI-Express 3.0 interface and comes with four DisplayPort outputs and a single DVI output.
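The 480GB/s figure only makes sense as an aggregate across both GPUs. As a rough sketch, assuming each GPU has its own 384-bit interface (an assumption, since the article quotes a single 384-bit interface), the per-pin data rate implied by the quoted bandwidth can be worked backwards. The function name is hypothetical.

```python
def implied_pin_rate_gbps(total_bandwidth_gb_s, bus_width_bits, num_interfaces):
    """Per-pin data rate (Gbit/s) implied by an aggregate bandwidth figure."""
    total_gbit_s = total_bandwidth_gb_s * 8           # GB/s -> Gbit/s
    total_pins = bus_width_bits * num_interfaces      # data pins across all interfaces
    return total_gbit_s / total_pins

# Assumption: two independent 384-bit interfaces, one per Tahiti GPU.
rate = implied_pin_rate_gbps(total_bandwidth_gb_s=480, bus_width_bits=384, num_interfaces=2)
print(f"Implied GDDR5 data rate: {rate:.1f} Gbit/s per pin")
```

Under that assumption each GPU would see half of the aggregate, or 240GB/s.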

Although it provides 1.48 TFLOPS of double-precision and 5.91 TFLOPS of single-precision floating-point performance, the FirePro S10000 has a TDP set at an incredible 375W. The new FirePro S10000 graphics card is cooled by a triple-fan dual-slot cooler.

According to AMD, the new FirePro S10000 can brag of being the most powerful dual-GPU server graphics card ever created, with increased performance-per-Watt. It also comes with DirectGMA support, which removes CPU bandwidth and latency bottlenecks by optimizing communication between both GPUs, and it has full OpenCL support.

The new FirePro S10000 will be available as a retail and OEM part, but the price has not been announced yet.

Although Intel's Larrabee project was scrapped, or shelved to be precise, it appears that Intel has at least used some of the experience gained in Larrabee development. The company has finally announced Knights Corner, a Many Integrated Core (MIC) multi-core computer architecture co-processor that will be a part of Intel's new Xeon Phi line.

According to the first details released by Intel, Knights Corner will be a single PCI-Express card featuring over 50 x86 cores made on a 22nm 3D Tri-gate manufacturing process and at least 8GB of on-board GDDR5 memory. Knights Corner will provide 1 TFLOPS of double-precision performance. Unfortunately, those are the only details that Intel is ready to share for now, but we will surely hear more about Xeon Phi.

Although it borrows some design details from the Larrabee project, Knights Corner is focused solely on High Performance Computing rather than graphics performance. Knights Corner has a tough battle ahead of it considering Nvidia's recently announced Tesla K20 graphics card, which is capable of providing up to 2 TFLOPS of double-precision compute performance, but Intel's x86 architecture and the independent Linux operating system that manages each of those cores provide a much more attractive platform for developers than Nvidia's CUDA.

Actually, Cray has already announced its Cascade supercomputer which, although it currently runs on Xeon E5, will get updated with Xeon Phi as soon as possible.

Although it has revealed some details regarding the MIC architecture and Knights Corner co-processor, Intel actually just launched the Xeon Phi brand, while Knights Corner should be ready by the end of the year.

As we said, the Larrabee project is not dead. We saw a Raytracing demo of what we used to call Larrabee at IDF 2010 and we still have to write about it, but we also have learned that in 2012 the Larrabee story continues.

Knights Corner is the codename for a 22nm HPC part, something that will likely compete with Kepler and Maxwell GPGPU parts.

Intel plans to put many cores on it, more than 50 of them, once it gets to 22nm, where the TDPs will be just right. Knights Corner should at least come to market before Maxwell. The reason is simple: Intel plans to migrate to 22nm as early as the second half of 2011, while TSMC and GloFo won’t be ready until late 2012.

Larrabee as a graphics card is still under consideration, but the software and drivers are the key issue with it. So after all, the Larrabee fairytale does continue to live.

Someone has asked the “L” question. Intel is still not happy to talk about the L-word, but we got a small update about its troubled Larrabee architecture.

Larrabee as a graphics product is dead for now, but in 2012 Intel plans to launch it as a high performance computing part. This is something that covers the highly parallel computing market and goes against Nvidia’s Tesla, but there is definitely nothing new to come in the form of a Larrabee for the graphics market.

With that in mind, Intel is already shipping software development vehicles for developers, and it hopes that more of them will get cozy with Larrabee than CUDA. We do believe that Intel will have a lot of catching up to do, but it also has plenty of resources to go after Nvidia in this market.

Intel still gave us a glimmer of hope as it claims that it’s continuing to investigate the discrete graphics market, but not in the short term. Intel’s VP Dadi Perlmutter made this abundantly clear by stating: “You won’t see Intel getting into discrete graphics.”

Earlier this week, Intel VP Kirk Skaugen released a PowerPoint slide detailing the rich history of Intel’s commitment to HPC innovation, its progression from the Petascale age to the Exascale age of supercomputing, and some hard specifications for its highly parallelized MIC architecture aimed at enterprise markets.

Back in 2007, we wrote that Larrabee was initially designed as a discrete graphics engine and was also capable of computing highly parallel applications while preserving x86 programmability. In May 2010, Bill Kircos, Intel’s Director of Product and Technology Media Relations, announced that the Larrabee project would never materialize as a discrete GPU part and would instead be transitioned into a new architecture leveraging both Larrabee and Intel’s many core research projects.

During ISC 2010, that architecture came to be known to the HPC crowd as Intel MIC (Many Integrated Core). In its official announcement, Intel outlined plans to ship a MIC development kit platform, known as Knights Ferry, to select customers. According to Slide 34 of Skaugen’s keynote presentation, Knights Ferry is an x86-based design with 32 cores on a single chip, each with four threads, a 32KB L1 instruction cache, a 32KB L1 data cache, and a 256KB L2 cache. In total, the chip has 8MB of shared L2 cache, which some analysts note is an interesting design point, as many highly parallel applications do not require such a large on-chip cache.
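The chip-wide totals follow directly from the per-core figures on the slide. Here is a small sketch of that arithmetic, using only the numbers quoted above. The variable names are illustrative.

```python
CORES = 32              # x86 cores on the Knights Ferry chip
THREADS_PER_CORE = 4    # hardware threads per core
L1I_KB_PER_CORE = 32    # L1 instruction cache per core
L1D_KB_PER_CORE = 32    # L1 data cache per core
L2_KB_PER_CORE = 256    # L2 cache slice per core

total_threads = CORES * THREADS_PER_CORE
total_l1_kb = CORES * (L1I_KB_PER_CORE + L1D_KB_PER_CORE)
total_l2_mb = CORES * L2_KB_PER_CORE / 1024   # KB -> MB

print(f"Hardware threads: {total_threads}")
print(f"Total L1 cache:   {total_l1_kb} KB")
print(f"Total L2 cache:   {total_l2_mb:.0f} MB")
```

The 32 per-core L2 slices of 256KB add up to the 8MB of shared L2 quoted in the keynote, and the chip exposes 128 hardware threads.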

Each core has a very wide 512-bit vector unit, allowing 16 single-precision floating point operations to be computed in a single instruction, with double-precision floating point operations running at half that throughput.
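The lane counts are simply the vector width divided by the element width. A minimal sketch, assuming standard 32-bit single-precision and 64-bit double-precision floats. The helper name is made up for this example.

```python
VECTOR_WIDTH_BITS = 512  # width of the Knights Ferry vector unit, from the article

def vector_lanes(vector_bits, element_bits):
    """Number of elements processed by one vector instruction."""
    return vector_bits // element_bits

sp_lanes = vector_lanes(VECTOR_WIDTH_BITS, 32)  # 32-bit single precision
dp_lanes = vector_lanes(VECTOR_WIDTH_BITS, 64)  # 64-bit double precision

print(f"Single-precision lanes: {sp_lanes}")
print(f"Double-precision lanes: {dp_lanes}")
```

This gives 16 single-precision lanes and 8 double-precision lanes per instruction, which is where the "half throughput" for double precision comes from.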

Although the Knights Ferry development kit looks very similar to the outline of a GPU, we are reminded to mention that it isn't a GPU because it has x86 cores. Besides, Intel would never do such a thing. Nevertheless, the card comes with a dual-slot heatsink, features up to 2GB of GDDR5 memory, and connects to a standard PCI-Express 2.0 motherboard slot. Intel advertises MIC as an “Intel Co-Processor Architecture,” so by nature it can become drop-in compatible with an Intel Xeon chip without the need to reprogram application code in another language.

HPCwire.com has published a detailed architecture comparison between Intel’s Knights Ferry based on MIC architecture and Nvidia’s Tesla products based on Fermi architecture. As noted by Michael Wolfe, Slide 33 from Skaugen’s keynote presentation depicts the Knights Ferry architecture layout with remarkable similarity to the 2008 SIGGRAPH article describing Larrabee.

Figure 1: Schematic of the Larrabee many-core architecture: The number of CPU cores and the number and type of co-processors and I/O blocks are implementation-dependent, as are the positions of the CPU and non-CPU blocks on the chip.

Given the fact that Knights Ferry is not a commercially available product, it remains unclear whether or not it has similar design aspects to Knights Corner, the first MIC product Intel plans to launch. According to official plans, it will be manufactured on the 22nm half-pitch process node, will contain over 50 cores, and will be released sometime in 2011. All in all, we expect the next 16 months in the HPC sector to hold many interesting application performance competitions among Intel, AMD and Nvidia. While Intel boasts its x86 instruction set as a provider of maximum compatibility with existing applications without need for dual language programming on processor and co-processor, AMD and Nvidia focus their efforts on maximizing floating point throughput using a heterogeneous combination of CPUs with their Evergreen and Fermi GPU architectures.