Tera-Scale Computing

Intel Developer Forum opened today with talks about Intel's next leap into Tera-Scale computing. Intel President and CEO Paul Otellini revealed a new research prototype processor that has 80 floating point cores all packed in a single 300mm² die. The prototype chip is capable of achieving a Teraflop of performance.
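As a rough sanity check, a teraflop across 80 cores works out to about 12.5 GFLOPS per core. The per-core rate and clock speed below are illustrative assumptions, not published Intel specifications:

```python
# Back-of-the-envelope check of the teraflop claim.
# Per-core figures are illustrative assumptions, not Intel's numbers.
CORES = 80
FLOPS_PER_CYCLE = 4    # e.g. two FP units, each doing a multiply-accumulate
CLOCK_HZ = 3.2e9       # hypothetical clock speed

per_core = FLOPS_PER_CYCLE * CLOCK_HZ    # 12.8 GFLOPS per core
aggregate = per_core * CORES             # just over 1e12 FLOPS

print(f"per core:  {per_core / 1e9:.1f} GFLOPS")
print(f"aggregate: {aggregate / 1e12:.2f} TFLOPS")
```

Under these assumed figures the 80 cores together land just past the one-teraflop mark.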

According to Intel Senior Fellow and CTO Justin Rattner, the rise of mega data centers delivering services such as photo-realistic games, real-time video sharing and multimedia data mining will challenge the industry to deliver teraFLOPS of performance and terabytes of bandwidth.

Although still in the early stages of research, the prototype chip will allow Intel to study the architecture of the multi-core chip and test interconnect strategies between chips and between cores and memory. "While any commercial application of these technologies is years away, it is an exciting first step in bringing tera-scale performance to PCs and servers," said Rattner.

What's interesting in this prototype design is that each of the cores is connected to its own SRAM memory tile, making it possible for the entire chip to deliver more than a terabyte per second of bandwidth. This is achieved by stacking a memory die fashioned in the same 8x10 block array as the cores. The memory die is then bonded to the processor die through thousands of interconnects. Think of it as a flip-chip bonding process, only this time it is done on top of another chip. So that the entire stacked chip can communicate with the outside world, deep through-hole vias will be fabricated in the memory die, allowing a second flip-chip connection to be made to the chip package.
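The terabyte-per-second figure follows from summing the per-core memory links. The link width and clock below are hypothetical values chosen only to show how the aggregate adds up:

```python
# Rough model of how 80 per-core SRAM links add up to terabyte-per-second
# aggregate bandwidth. Per-link numbers are illustrative assumptions.
CORES = 80               # 8x10 array of cores, one SRAM tile per core
BYTES_PER_CYCLE = 4      # hypothetical link width per core (bytes)
LINK_CLOCK_HZ = 4.0e9    # hypothetical link clock

per_link = BYTES_PER_CYCLE * LINK_CLOCK_HZ    # 16 GB/s per core-memory link
aggregate = per_link * CORES                  # 1.28e12 bytes/s

print(f"per link:  {per_link / 1e9:.0f} GB/s")
print(f"aggregate: {aggregate / 1e12:.2f} TB/s")
```

Each individual link is modest; it is the 80-wide parallelism of the stacked die that pushes the total past a terabyte per second.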

In addition, each of the cores will be equipped with an internal router that moves data between cores for load-balancing purposes. According to Intel, these routers will have built-in intelligence: if a router detects an overworked core, it will stop sending workloads to that core, and if a core within the array fails, it will route work to spare/redundant cores so that the tera-scale chip continues to perform within expectations. Some of these ideas will be investigated with the prototype, and most of them may well end up in a final design.

Taking a broader view of the tera-scale chip, we can see familiar ideas being thrown into the prototype: built-in memory controllers for fast local memory access and chip-to-chip communications. Both of these technologies already exist today in AMD's chips, although they are far less advanced than what is being tested by Intel. What's important to note is that Intel recognizes that in order to move huge amounts of data, whether to memory or to other cores, such an architecture is inevitable. However, it is likely that Intel will only start to embrace it with multi-core chip designs.