Intel Massive Parallel Upgrades Due

PORTLAND, Ore. -- Intel unveiled its Many Integrated Core (MIC) roadmap to expand the high-performance computer market at the International Supercomputing Conference (ISC-14) in Leipzig, Germany, June 22-26. The multi-faceted unveiling revealed details of a new version of its massively parallel processor, the Xeon Phi, as well as a new interconnection fabric based on Intel's silicon photonics advances, and an educational program designed to give every new programmer on the planet the opportunity to learn how to code for parallel processors.

"In just 16 years, we've seen the fastest supercomputer in the world at 3 teraflops migrate down to a single socket," Charles Wuischpard, vice president and general manager of Workstations and High Performance Computing in Intel's Data Center Group, told EE Times in a conference call.

Intel has been talking about a new version of its massively parallel Xeon Phi processor, which currently has 60 cores per chip, and at ISC-14 it unveiled many, though not all, of the details about the new chips, which will be available in the second half of 2015. The current-generation Xeon Phi is a 1-teraflop chip cast in 22 nanometer CMOS and sold on a PCIe board in several versions. The Green500 list pronounced it the most power-efficient parallel processor in the world. The Top500 supercomputer list just named the Xeon Phi-powered Tianhe-2 (Milky Way 2) supercomputer at the National Supercomputing Center in Guangzhou, China, the fastest in the world for the third time running.

"We have a new Xeon Phi processor coming in the future called Knights Landing," said Wuischpard. "The first thing to note is that it will be 3 teraflops in a single package, which will be available in the second half of 2015, with at least as many processors, but based on the Silvermont architecture and connected by a low-latency mesh."

Intel predicts that high-performance computers will grow at a rate of 20% per year as prices drop, inducing more segments to purchase them.
(Source: Intel)

The current MIC processor, Knights Corner, was based on a special microarchitecture created just for it, but the new-generation Knights Landing Xeon Phi will be based on Silvermont, modified for Intel's 14 nanometer process. The Silvermont architecture has also been heavily modified to add key features, such as AVX-512 vector units and four threads per core; the current Silvermont has no AVX-512 support and just one thread per core.
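A rough count of the parallelism that software has to expose on such a chip, assuming (our reading, not an Intel figure) that Wuischpard's "at least as many processors" means at least the current 60 cores, and using the 512-bit width of an AVX-512 register:

```latex
\begin{aligned}
\text{hardware threads} &\ge 60\ \text{cores} \times 4\ \tfrac{\text{threads}}{\text{core}} = 240 \\
\text{AVX-512 lanes, 64-bit floats} &= 512 / 64 = 8 \\
\text{AVX-512 lanes, 32-bit floats} &= 512 / 32 = 16
\end{aligned}
```

Code that runs one scalar thread would be using a small fraction of that capacity, which is why the thread count and the vector width both matter.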

"The next generation of Xeon Phi processors will get more than just a processor speed-up. It will feature more than just a faster processor, but more integration of memory and greater processor power efficiency," said Wuischpard.

Other details provided by Wuischpard included three times the single-thread performance of the existing Xeon Phi and 16 Gbytes of on-package memory, based on a special version of Micron's Hybrid Memory Cube designed in collaboration with Intel, alongside DDR4 system memory. Wuischpard also claims the Silvermont Xeon Phi will take up one-third of the space, be five times more power efficient, and yet be binary compatible with the existing Xeon Phi.

The slide states Xeon binary compatibility, but the article states Xeon Phi binary compatibility. Since AVX-512 is not compatible with LNI, I suspect the slide is correct and the quote (or the quotee) is not. -BitHead77

Sorry it took so long to get an answer to this one, but when you see how complicated it is, you'll understand why. Here are all the details:

The current Intel® Xeon Phi™ coprocessor (Knights Corner) is not binary software compatible with other processors. The unique combination of new features put into Knights Corner at the time, and the software stack for Knights Corner, prevents complete compatibility between Knights Corner and other processors. However, code developed on the current Intel Xeon Phi can generally be ported to Knights Landing and Intel Xeon processors with a recompile. For this reason, we do think that for best results, customers should get started today to modernize their code with the current Intel Xeon Phi coprocessor so as to prepare for the coming advances on Intel® Xeon® and Intel Xeon Phi (Knights Landing).

Knights Landing, however, is software binary-compatible with Intel® Xeon® Processors—specifically the Intel Haswell Instruction Set with the exception of TSX (Intel® Transactional Synchronization Extensions). The same binaries will run on both Knights Landing and Intel Xeon processors. This enables customers to readily leverage their legacy code, simplify their code base, and use the same parallel optimization techniques (cores, threads, vectors) to benefit both Intel Xeon processor and Intel Xeon Phi processors. This will deliver the most performance for the least developer investment.
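Of the three techniques named in the reply (cores, threads, vectors), the "cores, threads" part can be sketched in a few lines. The example below is a minimal illustration in Go, not Intel's toolchain or method; `parallelSum` is a hypothetical name. It splits one loop into contiguous chunks, one per worker, so each core works on its own slice. Vectorization is left to whatever compiler is used and is not shown.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits xs into contiguous chunks, one per worker, and sums each
// chunk on its own goroutine. The Go runtime schedules goroutines onto OS
// threads, which the OS in turn maps onto hardware threads and cores.
func parallelSum(xs []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	partial := make([]float64, workers) // each worker writes only its own slot, so no lock is needed
	chunk := (len(xs) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(xs) {
			break // fewer chunks than workers
		}
		hi := lo + chunk
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			s := 0.0
			// A simple contiguous loop: the shape a vectorizing compiler can
			// map onto SIMD lanes (Go's own compiler may not do so).
			for _, x := range xs[lo:hi] {
				s += x
			}
			partial[w] = s
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0.0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	xs := make([]float64, 1000)
	for i := range xs {
		xs[i] = float64(i + 1)
	}
	fmt.Println(parallelSum(xs, runtime.NumCPU())) // prints 500500
}
```

Sizing the worker count from `runtime.NumCPU()` lets the same source scale from a few cores to a few hundred hardware threads without changes.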

It is quite true that processor manufacturers are moving toward massively parallel processing and multicore processors, but at the same time, applications that support massive parallelism also need to appear. An initiative like Intel's "educational program designed to give every new programmer on the planet the opportunity to learn how to code for parallel processors" is a really good idea and a good step.


There are also at least two separate levels of programming that may or may not need to adapt. The operating systems control the resources provided by the hardware and parcel them out to applications. The current abstractions provided by those operating systems lean heavily on processes and threads within those processes. Applications typically use these abstractions and rely on the OS to map them onto hardware appropriately. The techniques that you list, @Tony, could be very useful in giving the OS more latitude in these assignments, but they still need a synchronization mechanism (like the Ada rendezvous concept). I sympathize with @Colin's comment that more programmer training is needed, but a good model for applications to follow would make that training more effective. The OS and tools guys need to figure that model out.
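For readers who have not met the Ada rendezvous: the caller and the called task block until both reach the meeting point, the called task runs its accept body, and then both continue. Go's unbuffered channels give a similar meeting point, so the sketch below is an analogy in Go, not Ada semantics; `request`, `server` and `call` are hypothetical names.

```go
package main

import "fmt"

// request models an Ada-style entry call: the caller passes its argument plus
// a channel on which it waits for the result of the "accept body".
type request struct {
	arg   int
	reply chan int
}

// server plays the role of a task with one entry. Because calls is unbuffered,
// a send on it completes only when server is ready to receive, so caller and
// server meet (rendezvous) at that point.
func server(calls <-chan request) {
	for req := range calls {
		// "Accept body": runs while the caller is blocked waiting on reply.
		req.reply <- req.arg * req.arg
	}
}

// call blocks until the server has accepted the request and produced a result.
func call(calls chan<- request, x int) int {
	reply := make(chan int)
	calls <- request{arg: x, reply: reply}
	return <-reply
}

func main() {
	calls := make(chan request) // unbuffered: every send is a synchronization point
	go server(calls)
	for i := 1; i <= 3; i++ {
		fmt.Println(call(calls, i)) // prints 1, then 4, then 9
	}
	close(calls)
}
```

The OS still maps the goroutines onto threads and cores; the program only states where the two sides must synchronize, which is the kind of application-level model the comment asks for.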

Well, parallel processing is so easy, there's so many ways to do it :)

Actually, that's the problem: from what I've seen, there is no one approach that is best suited for all problems. And I think it's been pretty well proven that most software developers have a hard time writing bug-free, high-performance code using "traditional" techniques such as threads/locking/semaphores.
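A minimal sketch of the "traditional" shared-memory style, written in Go for illustration (`counter` and `hammer` are hypothetical names): many threads update one variable, and correctness depends on every access remembering to take the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is shared state; every access must hold mu.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++ // read-modify-write: two goroutines could interleave here without the lock
}

func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// hammer starts `workers` goroutines that each increment c `per` times,
// then waits for all of them to finish.
func hammer(c *counter, workers, per int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < per; i++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := &counter{}
	hammer(c, 8, 10000)
	fmt.Println(c.value()) // prints 80000
}
```

Drop the `Lock`/`Unlock` pair in `inc` and the program still compiles and usually looks fine on small tests, but it has a data race that can lose updates; Go's race detector (`go run -race`) exists precisely because such bugs are easy to write and hard to spot. Every increment also contends for one lock, which limits scaling as core counts grow.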

So it's not surprising there's a movement toward functional programming and "shared-nothing" programming (message passing, actors/Erlang-model, etc.). However, depending on how much has to be passed around, that might not be the best approach.
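For contrast, a shared-nothing version of the same kind of update, sketched in Go's actor-like style rather than Erlang (`msg`, `account` and `totalAfterDeposits` are hypothetical names). One goroutine owns the state outright; everyone else can only send it messages, so there is no lock to forget.

```go
package main

import (
	"fmt"
	"sync"
)

// msg is the only way to interact with the account actor.
type msg struct {
	deposit int      // amount to add; ignored when balance is non-nil
	balance chan int // if non-nil, the actor replies with its current balance
}

// account owns bal exclusively and processes one message at a time,
// so no other goroutine ever touches bal directly.
func account(inbox <-chan msg) {
	bal := 0
	for m := range inbox {
		if m.balance != nil {
			m.balance <- bal
			continue
		}
		bal += m.deposit
	}
}

// totalAfterDeposits has `senders` goroutines each send `each` deposits of 1,
// then asks the actor for its balance.
func totalAfterDeposits(senders, each int) int {
	inbox := make(chan msg) // unbuffered: each send completes when the actor receives it
	go account(inbox)
	var wg sync.WaitGroup
	for s := 0; s < senders; s++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < each; i++ {
				inbox <- msg{deposit: 1}
			}
		}()
	}
	wg.Wait() // all deposits received; the query below is handled after them
	reply := make(chan int)
	inbox <- msg{balance: reply}
	b := <-reply
	close(inbox)
	return b
}

func main() {
	fmt.Println(totalAfterDeposits(4, 250)) // prints 1000
}
```

The caveat in the comment shows up directly here: every interaction is a message, so state that is large or touched constantly must be copied or funneled through one goroutine, which can cost more than a well-placed lock.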

Was there any discussion of software support for these beasts? Operating systems have supported multiprocessors for a while now, but it doesn't seem like they have really made the best use of them. Much of their support seems to be about how to throttle down to the minimum number of active cores needed for the application load. I can see this for server farms that want to provide maximum capability while minimizing power usage, but it seems like with this number of cores we may need some fundamental architectural changes in operating systems and/or application software. Is that true, or is it just more of the same?