Nvidia, Micron Debut High-End Processors

SAN JOSE, Calif. — Nvidia and Micron announced new high-end co-processors at the opening of Supercomputing 2013 today. Meanwhile, an online analysis said the next generation of Intel's Xeon Phi, which competes with them both, will be an integrated device.

Supercomputers have become a key testing ground for massively parallel co-processors because they typically are made of clusters of the most powerful chips available. Nvidia's new Tesla K40 is likely to have the biggest impact in this space. The graphics chip vendor dominates the Top 500 systems, where its GPUs are used in 38 of the 53 systems employing accelerators, thanks in part to the maturity of its Cuda programming environment.

Nvidia claims the Tesla K40 GPU provides a 40 percent boost over its existing chip and supports 12 GBytes of GDDR5 memory, twice as much as the prior GPU. The K40 packs 2,880 cores to deliver up to 4.29 teraflops single-precision and 1.43 teraflops double-precision peak floating-point performance. The chip also uses PCI Express Gen 3, doubling the I/O performance of the PCIe Gen 2 links on the previous part.
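A quick back-of-the-envelope check shows the quoted teraflops figures follow from the core count. The 745 MHz boost clock is an assumption not stated above (it is the figure Nvidia published for the K40), and Kepler-class chips provide one double-precision unit per three single-precision cores, each retiring one fused multiply-add (two ops) per cycle:

```python
# Sanity-check the K40 peak-FLOPS claims from the core count.
# Assumes a 745 MHz boost clock (not stated in the article) and one
# fused multiply-add (2 flops) per unit per cycle.

SP_CORES = 2880            # single-precision CUDA cores (from the article)
DP_UNITS = SP_CORES // 3   # Kepler: one DP unit per three SP cores
BOOST_CLOCK_HZ = 745e6     # assumed boost clock
OPS_PER_FMA = 2            # multiply + add counted as two flops

peak_sp_tflops = SP_CORES * BOOST_CLOCK_HZ * OPS_PER_FMA / 1e12
peak_dp_tflops = DP_UNITS * BOOST_CLOCK_HZ * OPS_PER_FMA / 1e12

print(f"peak single precision: {peak_sp_tflops:.2f} TFLOPS")  # 4.29
print(f"peak double precision: {peak_dp_tflops:.2f} TFLOPS")  # 1.43
```

Both results land on the 4.29 and 1.43 teraflops figures Nvidia quotes.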

The K40 is available now and is expected to appear in high-performance servers from Appro, Asustek, Bull, Cray, Dell, Eurotech, Hewlett-Packard, IBM, SGI, Supermicro, and Tyan. Engineers can try out the chip for free on remotely hosted clusters.

Intel's Xeon Phi is quickly gaining ground in high-end systems, finding sockets in 13 Top 500 systems in the latest rankings, including Tianhe-2, the world's fastest supercomputer. The next generation of the chip, called Knights Landing, is expected to use a new custom core to work as a standalone chip rather than a co-processor, according to an analysis published today.

Knights Landing is a 14nm follow-on to Intel's current Knights Corner version of Xeon Phi, expected to ship in late 2014. The chip "will [be] a bootable device, in contrast to Knights Corner which must be attached to an x86 server CPU via the PCI-E slot," said David Kanter, principal of Real World Technologies, in his online post.

"The 14nm node should deliver a substantial increase in density and modest gains in power efficiency," said Kanter. "The instruction set is moving closer to the mainstream x86 CPUs," adopting the AVX3 instructions of Intel's next-generation Core architecture (called Skylake) rather than the current 512-bit vector instructions, he added.

Kanter said he expects Intel will use a new custom core inside Knights Landing. Intel has not yet disclosed "the microarchitecture, core count and fabric" of the chip, he said.

Separately, Micron announced a non-von Neumann architecture it calls the Automata Processor. It aims to compete with co-processors such as GPUs and high-capacity FPGAs on a wide range of high-performance tasks.

Automata uses a new approach to parallel programming that Micron claims has applications in areas such as bioinformatics, video/image analytics, and network security, which use large amounts of complex, unstructured data. The chip uses "a computing fabric comprised of tens of thousands to millions of processing elements interconnected to create a task-specific processing engine," Micron said in a press release.
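Micron's release does not spell out the programming model, but the Automata Processor is built around executing non-deterministic finite automata (NFAs) directly in silicon, with every active state examining each input symbol in parallel. A minimal software sketch of that idea follows; the pattern, state encoding, and function names are invented for illustration:

```python
# Toy software model of automata-style processing: all active NFA states
# consume each input symbol at once (in hardware; serially in this sketch).
# Pattern, states, and function names are invented for illustration.

def compile_pattern(pattern):
    """Build a chain NFA: state i advances to i+1 on symbol pattern[i]."""
    return {(i, ch): i + 1 for i, ch in enumerate(pattern)}

def find_matches(pattern, data):
    """Return the end offset of every occurrence of pattern in data."""
    transitions = compile_pattern(pattern)
    accept = len(pattern)
    active = {0}
    hits = []
    for i, symbol in enumerate(data):
        # Each active state inspects the symbol simultaneously; the start
        # state is re-armed every cycle so a match can begin at any offset.
        active = {transitions[(s, symbol)]
                  for s in active if (s, symbol) in transitions} | {0}
        if accept in active:
            hits.append(i)
    return hits

print(find_matches("ACA", "GATTACA"))  # [6]
```

Because every state transition fires concurrently, the hardware's throughput is independent of how many patterns are armed, which is what makes the approach attractive for the pattern-heavy workloads Micron cites.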

Several academics are working with Micron on Automata. The device "offers a refreshingly new way of solving problems that is very different from all other accelerator technologies," said Srinivas Aluru, professor of computational science and engineering at Georgia Institute of Technology, speaking in the release.

Next year, Micron will release design and simulation tools and a software development kit for Automata. Many startups, including CogniMem Technologies, have launched novel parallel processing architectures, but many failed to gain market traction because the chips were difficult to program.

This could be a hugely important product for Micron. If they can execute smartly, every cellphone and tablet will have at least one Automata Processor for image and speech recognition. Eventually every robot will have dozens.

From our related work we learned that building logic on DRAM processes is really cheap. The AP should cost around $2.

A number of Web commenters have mentioned Venray's CPUs on DRAM. We are doing Big Data analytics for really large datasets. Despite some headlines, this is not what Micron is doing.

The Automata Processor is first-rate innovation. We applaud Micron's efforts.

I always wanted a supercomputer just so I wouldn't have to wait so long to get things done. The problem with today's computers is that they spend too much bandwidth on things the user doesn't care about; for me it is a software problem. I know there are many tasks that genuinely need a supercomputer, but designers should also concentrate on the task at hand rather than on all the background work computers do that isn't useful.

At the SC13 event today, Intel said it will use in-package memory for Knights Landing, the next version of Xeon Phi. It is not saying which technique or how much memory, but the part will support multiple programming models.

Part of the reason they are going into this is memristors. They are going to announce a memristor chip, and memristors are supposed to fit very well with mixing memory and logic. So Micron wants an early start.

As was mentioned, this idea is old. Logic-in-memory architectures, e.g. PEPE, date from the 1970s, as does the idea of putting processors on DRAM chips. The practical issues include the few metal layers, DRAM transistors optimized for yield rather than speed, and the resulting SIMD-style architectures being difficult to program for most applications. I would vote for TSV 3-D packaging as an easier way to go: take one of Micron's forthcoming DRAM stacks and stick it on a processor-array chip.

People have been working on parallel computing for a while...with not that much to show for it, despite the multiple cores in many processing products...any indication why the tsunami you are referring to will happen? Kris

Software will be a key enabler; you are correct. Lots of work has been done on algorithms that take advantage of massive numbers of processors, and now, with multiple serious entrants in this segment, expect significant software advances too. Could be that the next wave of programmable logic coming at us is actually a massively parallel tsunami...