Intel Previews Tera-scale Technologies for Upcoming Products

Massive data sets, massive processing, massive bandwidth

We have been talking about tera-scale
technologies since 2006 when it comes to Intel research programs.
The name is perhaps more grandiose than the actual idea: as data sets
increase in size, new computing technologies will need to be created to
handle that amount of data. It is no secret that the CPU as
it exists today simply can't handle the massive amounts of parallel
information that will soon become normal operating procedure. NVIDIA
and AMD will tell you that their GPUs are lined up well to address this
problem, but Intel is thinking even beyond that. At this week's
International Solid-State Circuits Conference (http://www.isscc.org/isscc/index.htm)
in San Francisco, Intel will be presenting research papers on how
to address these concerns, and we wanted to give you an early preview of
some of them.

While we know that tera-scale processors will likely have many
processing cores on them, a pitfall of this design is how the cores
communicate with each other internally on the die. Intel is hoping that
the research they are doing in data sharing among many cores will lead
to a definitive solution. The chip once dubbed the "Single-Chip Cloud
Computer" (we actually wrote about it in
December 2009) utilizes a mesh network and message buffering system
to pass data from core to core without direct connections between ALL of
the cores. This method allows data to transfer between cores as much
as 15x faster than when using main memory.

The problem, of course, is that this method introduces a bit of
latency and requires each "router" to have a non-trivial amount of
storage for data handling. The current iteration of the
packet-switching technology is up and running with 24 2.0 GHz
routers and 48 1.0 GHz IA cores for a total of 2.0 Tb/s of bisection
bandwidth.
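
To put the "no direct connections between ALL of the cores" point in numbers, here is a rough Python sketch that compares the wiring a mesh needs against a fully connected design. The 6x4 grid layout for the 24 routers and the dimension-ordered (XY) routing used to count hops are our assumptions for illustration, and the function names are made up; Intel did not give us these details.

```python
def mesh_links(cols, rows):
    """Bidirectional links in a cols x rows 2D mesh (neighbors only)."""
    return (cols - 1) * rows + cols * (rows - 1)


def full_links(n):
    """Links needed if every router were wired directly to every other one."""
    return n * (n - 1) // 2


def xy_hops(src, dst):
    """Hop count under assumed XY routing: travel along X, then along Y."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


COLS, ROWS = 6, 4          # assumed layout for the 24 routers
ROUTERS = COLS * ROWS      # 24 routers, two cores each for 48 cores

print(mesh_links(COLS, ROWS))    # 38 links in the mesh
print(full_links(ROUTERS))       # 276 links for all-to-all wiring
print(xy_hops((0, 0), (5, 3)))   # 8 hops corner to corner
```

The trade-off described in the text falls out directly: far fewer wires, but a message crossing the chip passes through several routers, and each hop adds latency and buffering.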

The other method for this network-on-a-chip in the image above
uses circuit switching rather than packet-switching routers - the benefits include the removal of
packet delays and improved power efficiency. Making this all work in
hardware is more complex, of course, and so the design is just
starting prototyping deep inside Intel's labs.

Another potential problem for tera-scale processing is not just
internal communications but external communications for multi-chip
systems or even component communication. Imagine trying to have
multiple tera-scale processors on the same motherboard attempting to
pass information at rates similar to the internal architecture. A
method that Intel is testing uses a direct chip-to-chip connection that
does NOT go through the CPU socket and/or motherboard. Why?
Intel says that by using this direct connection, high-speed
communication can be accomplished with an order of magnitude better
power efficiency. Intel seems to think they can reach a terabyte/sec of
bandwidth with just 11 W of power as opposed to the 150 W previously
thought necessary.

Intel has been able to test and verify as much as 470 Gb/s of
chip-to-chip communication using just 0.7 W of power - an incredibly
impressive feat. Not only does this method improve efficiency, but it
also allows power consumption to drop to just 7% of its normal value
while in sleep mode, and the link "wakes" as much as 1000x faster than today's
options.
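
For a feel of what these I/O numbers mean per bit, the short Python sketch below turns the figures quoted above into an efficiency number. The helper name is our own, and this is only back-of-the-envelope arithmetic on the quoted values, not anything taken from Intel's papers.

```python
def mw_per_gbps(power_watts, bandwidth_gbps):
    """Link efficiency in milliwatts per Gb/s (numerically equal to pJ per bit)."""
    return power_watts * 1000.0 / bandwidth_gbps


# Measured demo quoted above: 470 Gb/s at 0.7 W
print(round(mw_per_gbps(0.7, 470), 2))   # 1.49

# Projected power for the same bandwidth: 11 W versus 150 W
print(round(150 / 11, 1))                # 13.6
```

Roughly 1.5 mW for every Gb/s moved is the number to remember from the demo, and the 150 W to 11 W projection works out to about 13.6x, which lines up with the "order of magnitude" claim.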

While interesting, this efficient I/O communication would likely
come at the expense of interoperability and flexibility of system
designs. A system built with this kind of direct chip-to-chip
connection (instead of socket-based connections) might limit options
down the road, much like integrating memory controllers onto CPUs has
done for motherboards.

Intel has an interesting scenario built around optimizing task
processing for many-core processors based on frequency and leakage
variations on a per-core basis. Imagine an 80-core processor: not all
cores are going to be able to run at the same frequency reliably. In
traditional chip design thinking, Intel would lower the performance of
ALL cores to match the worst case scenario in order to keep the product
reliable and up to Intel's standards. What if you let each core run at
its own theoretical maximum and instead managed the threads and tasks
independently?

Intel is looking at "thread hopping" technologies that would put
priority on certain threads and tasks and place them on the cores that
better suit the overall system. The cores that can run the fastest
would be loaded with the highest priority tasks and as those complete
the threads would move from slower cores to the faster ones (shown in
red in our image above). If the system would like to run in a more
power efficient mode, the CPU could map tasks only to those cores that
exhibit the least amount of leakage; there are a lot of directions this
idea could take. Intel claims that a CPU could save anywhere from 6% to 35%
of its energy consumption by mapping work to the best set of cores for
each task.
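
To make the "thread hopping" idea a little more concrete, here is a toy Python sketch of how such a mapping could work. Everything in it is hypothetical: the core frequencies, leakage figures, task names and the assign/hop helpers are ours, and Intel has not described its actual scheduling policy to us.

```python
# Hypothetical per-core characteristics after manufacturing variation
CORES = [
    {"id": 0, "ghz": 2.6, "leak_mw": 40},
    {"id": 1, "ghz": 3.1, "leak_mw": 75},
    {"id": 2, "ghz": 2.9, "leak_mw": 30},
    {"id": 3, "ghz": 2.4, "leak_mw": 55},
]

# (task name, priority); a higher number means more urgent
TASKS = [("render", 3), ("audio", 9), ("index", 1)]


def assign(tasks, cores, mode="performance"):
    """Map the most urgent tasks onto the best cores for the chosen mode."""
    if mode == "performance":
        ranked = sorted(cores, key=lambda c: c["ghz"], reverse=True)
    else:  # "power": prefer the cores that leak the least
        ranked = sorted(cores, key=lambda c: c["leak_mw"])
    urgent_first = sorted(tasks, key=lambda t: t[1], reverse=True)
    return {name: core["id"] for (name, _), core in zip(urgent_first, ranked)}


def hop(tasks, finished, cores, mode="performance"):
    """When a task completes, remap the rest so they move up to better cores."""
    remaining = [t for t in tasks if t[0] != finished]
    return assign(remaining, cores, mode)


print(assign(TASKS, CORES))           # {'audio': 1, 'render': 2, 'index': 0}
print(hop(TASKS, "audio", CORES))     # {'render': 1, 'index': 2}
print(assign(TASKS, CORES, "power"))  # {'audio': 2, 'render': 0, 'index': 3}
```

Once "audio" finishes on the fastest core, "render" hops up from core 2 to core 1. In power mode the same tasks land on the three lowest-leakage cores instead and the leakiest core sits idle.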

The last option Intel discussed with us today was the
idea of having processors adapt to extreme conditions. In what could be
described as a more aggressive form of Turbo Mode, imagine a processor
that is not tuned for the "worst case" as processors are today but instead
assumes the best or nominal operating conditions. Instead of trying
to prevent errors from occurring, the chip would be built to look for
these potential errors and problems, detect and handle them, and then
change the operating parameters as needed.

If, for example, a CPU notices a voltage error, it would drop the
frequency and then "replay" the operation to get the necessary result
without the error. Obviously this needs some very complete monitoring
utilities on-chip, but Intel started this trend with the on-die power
monitoring in the Nehalem architecture last year. It would also require
a much more robust system of data monitoring in order to enable the
"replay" option. Intel does think this method could offer either a 40%
performance improvement or a 21% energy use reduction.
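
The detect-and-replay loop described above can be mimicked in a few lines of Python. This is a toy model under our own assumptions: the error is simulated with a simple frequency threshold, and the flaky_add and run_with_replay names and the frequency steps are invented for illustration. Real hardware would rely on on-die error-detection circuits rather than anything like this.

```python
def flaky_add(a, b, freq_ghz, safe_ghz=3.0):
    """Toy operation returning (value, error_flag).

    An error is flagged whenever the clock exceeds what current
    conditions can support (safe_ghz is a made-up threshold).
    """
    error = freq_ghz > safe_ghz
    value = (a + b) ^ 1 if error else a + b  # low bit corrupted on error
    return value, error


def run_with_replay(a, b, freq_steps):
    """Start at the optimistic clock; on a detected error, step down and replay."""
    for replays, freq in enumerate(freq_steps):
        value, error = flaky_add(a, b, freq)
        if not error:
            return value, freq, replays
    raise RuntimeError("operation failed at every frequency step")


print(run_with_replay(2, 3, [3.6, 3.3, 3.0, 2.7]))   # (5, 3.0, 2)
```

The operation is attempted at the most optimistic clock first and only pays the replay penalty when an error is actually caught, which is the gamble behind running at nominal rather than worst-case settings.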

All of these technologies are likely years away from any real-world
integration, but seeing a preview of what processors might be like in
the future is always intriguing.