AMD has made a huge mistake by continuing to back its Torrenza initiative, which provides socket support for add-on cards. While the idea sounds intriguing, recent advancements from Intel’s Tera-Scale Project have prompted many to reconsider AMD’s goals. Requiring an external socket for parallel compute seems far less desirable when compared to the elegant design of Tera-scale.

I’ve spoken at length with Jerry Bautista, director of the Tera-Scale Project at Intel. He shared with me much of the excitement and enthusiasm that I did not see in Tera-scale when it was first announced. I was one of Tera-scale’s biggest detractors. I thought Intel had really lost its way by advertising a test chip composed of 80 cores sustaining over a teraflop. But I was wrong. Tera-scale is an amazing design. It is so forward-thinking that I believe it will be the future of computing. If not the actual project itself, then the base design the Tera-Scale Project researchers are working on will be the foundation.

Tera-scale began its life by looking at what’s desirable from a software point of view, then working backward to the hardware necessary to make it happen. The project uses a fundamental communication routing system that allows the many cores to speak to each other independently. Bautista calls this a “one to any” communication methodology. The 80 cores are arranged in 10 nodes of 8 cores each. Each core can communicate directly with its immediate neighbors in four directions. There are also two additional lanes of communication, logically considered up and down, which reach out to main memory (which, in Tera-scale, is stacked in 3D) and provide extra-node communication with other cores.
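As an illustration only, here is a minimal Python sketch of how “one to any” message routing across a grid of cores might behave. The XY dimension-order scheme is my own assumption for the example; Intel has not published the actual routing algorithm:

```python
# Toy sketch of "one to any" routing on a 2D core grid.
# XY dimension-order routing is an illustrative assumption,
# not Intel's published Tera-scale protocol.

def xy_route(src, dst):
    """Return the list of grid hops from src to dst as (x, y) tiles,
    stepping along x first, then along y (dimension-order routing)."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                  # step east or west toward dst
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                  # then step north or south
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# A core at (0, 0) reaching a core at (3, 2) takes 5 hops:
print(xy_route((0, 0), (3, 2)))
```

The point of such a scheme is that any core can reach any other core using only nearest-neighbor links, which is what makes the cookie-cutter replication described below possible.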

Tera-scale also provides built-in mechanisms that allow core workloads to be moved around as required. This means a particular core does not have to always compute a particular workload. The workloads can be shifted around for thermal or caching concerns.
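As a toy illustration of that idea (not Intel’s actual mechanism; the core numbering, temperature limit, and workload names are all invented for the example), a thermally driven reassignment might look like this:

```python
# Illustrative sketch of shifting workloads off a hot core, in the
# spirit of Tera-scale's built-in migration mechanisms. All values
# and names here are invented for the example.

def migrate_hot_workloads(core_temps, assignments, limit=85.0):
    """Reassign each workload running on a core hotter than `limit`
    (degrees C) to the coolest core. Returns the updated
    workload -> core mapping without mutating the input."""
    new_assignments = dict(assignments)
    coolest = min(core_temps, key=core_temps.get)
    for workload, core in assignments.items():
        if core_temps[core] > limit:
            new_assignments[workload] = coolest
    return new_assignments

temps = {0: 92.0, 1: 70.0, 2: 64.0}
tasks = {"fft": 0, "render": 1}
print(migrate_hot_workloads(temps, tasks))  # "fft" moves off hot core 0
```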

The point of telling you all of this is that when Bautista’s team began designing Tera-scale, it didn’t really care what processing or compute cores were in each node. The team only cared about getting the cookie-cutter design working: once they completed a node design, a core design, and the communication protocols allowing each core to communicate with other cores and with main memory, they could simply replicate that design many times.

Bautista told us the company could’ve used 40 cores, 160 cores, or 500 cores; it didn’t matter. It settled on 80 cores because it had a transistor budget to work with, and at that count the tradeoff between memory, cache, and processing cores worked out well.

Designs like Tera-scale allow for high-speed extensibility beyond anything imaginable in an external socket design. Lower-power components, faster communications, more shared resources, everything native to a single die: Tera-scale designs make all of this possible. The downside is that the die itself must be created with particular abilities; they cannot be added after the fact without going through one of the system’s main buses, such as PCIe.

Torrenza allows for unlimited extensibility, provided there are hardware makers interested in designing and testing products for an open, non-proprietary socket format, one which works only with AMD motherboards or other Torrenza-supporting platforms from third parties.

The announcement from eSilicon that it will provide “services” on the Torrenza platform does lend some additional merit to AMD’s design. However, given the potential for significantly greater speed, lower expense, and much higher flexibility in a Tera-scale-like design, where individual cores or nodes are built to on-die communication protocols, designs like Torrenza seem significantly less compelling than designs like Tera-scale.

AMD took a step forward when it purchased ATI and began to realize what was possible for heterogeneous computing. However, as we move forward in time and see the success of projects like Intel’s Tera-Scale Project, efforts like AMD’s are beginning to seem wasteful, unproductive, and an improper use of R&D funds.

Reader Comments

drsengir

I don’t mean to be rude, but your entire article is based upon a very flawed premise: that Tera-Scale and Torrenza are designed to compete against each other.

They are drastically different approaches to solving *different* problems, not even the same problems, so comparing them is much akin to comparing apples to oranges. They are indeed both fruits, and as such are high in fructose and Vitamin C, but the similarity ends there.

Therefore saying that Torrenza is a “huge mistake” because Tera-Scale sounds like a good idea just doesn’t make sense. And I would like to see some references for the assessment that “many [have reconsidered] AMD’s goals”, because as far as I have heard Torrenza never received a whole lot of fanfare to begin with.

The Tera-Scale project will more likely bring about a competitor to the Cell processor and supplant the Itanium as the over-reaching, under-performing, Intel flagship server product. If they play their cards right it may do better than the Itanium, and that would be great, but I’m not holding my breath.

The prototype 80 core Tera-Scale processor that you keep referring to is nothing more than 80 DSP-style number crunchers on one die. That’s hardly a novel idea. In fact, it would make for a great peripheral computing device on a Torrenza network, though it won’t be much use for playing Doom.

The communications layer is the interesting part of Tera-Scale, but not much information is given about that, not even any results of efficiency tests on the prototype (the “teraflops” figure they boast is just the aggregate of the theoretical peaks of the cores, not an observed figure).

I’m really confused about your boasting of the “much higher flexibility in the Tera-scale-like design where individual cores or nodes are constructed to on-die…”. Is it just me, or does the permanence of the resulting die configuration make it sound significantly *less* flexible than the Torrenza approach of allowing you to pop peripheral computation devices in and out at will?

And what is with the twist in your paragraph that starts with “Torrenza allows for unlimited extensibility…”? How can you turn that into a bad thing? How is it bad that the standard is open, non-proprietary, and *only* works with AMD motherboards… OR those of anyone else who supports the protocol? Yeah, I mean that really narrows your choices. It’s like HyperTransport, right, and we all know that nobody used that (said in the most sarcastic voice I can muster).

I don’t know how you can declare Tera-Scale successful when they have one prototype comprised of 80 bare FPUs and no hard data regarding any of the inner workings of the platform, and there are no third parties lining up to support this new platform yet. Similarly, I don’t know how you can say that Torrenza is a failure, considering it has some third-party support, works with existing technology, and promises to bring something to the table in a couple of years.

The projects are both interesting, and seeing as one is short-term, the other is long-term, and one is designed to work to extend the capabilities of existing processor paradigms and the other is designed to shift paradigms entirely, I would say that they are not even comparable.

And to think that AMD isn’t working on grid-dies is foolish, considering they announced their research program into that field over a year ago. They haven’t created an 80-core prototype, and they certainly don’t have billions of dollars to put into research, but they are working on it.

RickGeek

drsengir, Tera-scale is a design. In my opinion it is *THE* design which will win out in the end. It’s *THE* solution which allows on-die integration of disparate compute components, processor cores, memory infrastructures, and everything else. Tera-scale provides the necessary framework to support any computing model. It’s not just about the processing cores. In fact, it’s hardly about them at all. It’s about the supportive framework which allows whatever happens to be plugged into it to compute.

Torrenza is a socket extension. It allows anything that’s compatible with the socket to be introduced. It will have increased latency for communicating with any other device and, no matter how small the actual plug-in die might be, it will still require the full size of a socket.

Torrenza seems to be a waystation on the road to Fusion. It is designed to allow socket support for add-on co-processors. And, with all due respect to AMD, I don’t see why that’s not possible now. Why couldn’t someone design a co-processor which sits in an AMD socket, but isn’t an x86 computing engine? All it would have to do is have the external framework necessary to communicate via HyperTransport, respond to memory requests, etc.

Tera-scale’s design implements everything necessary to allow anything to be plugged into it. There can be specialized compute cores, even dozens of them. It doesn’t have to be the 80-identical-core design that the prototype had. In fact, in the future that will be a most undesirable use of such a machine. Many more specialized compute components will be present, making it more of a hybrid between SoC and heterogeneous multi-core computing.

Do I think AMD is working on it? Yes. Do I think they have the money to complete the project? No way. They have $5.5 billion to pay back in 2012. That will cost them $250 million per quarter if it’s averaged out. That’s $250 million right off the top before any other profits are taken.

AMD is near bankruptcy, especially if they cannot get Barcelona to clock any higher. We don’t even have 45nm Penryns from Intel yet, and still Intel’s highest-end 65nm products can at least compete with Barcelona, if not beat it, in every measurable way.

AMD’s problems are their biggest problem. Problems with being late, problems with having a 2.0 GHz processor when we expected a 2.5 GHz model, problems with their IBM-based high-k/metal gate dielectric, which may not even be there at 45nm. Intel already has its Gen1 high-k/metal gate hafnium solution in use. And Gen2 is coming.

Intel is beating AMD on all fronts right now. And they have a huge number of development teams working on Tera-scale right now, I’m told more than 100 separate teams. Tera-scale has already been proven to work. It does work. It is significant. It is real. And it is a better design. It allows anyone to design a core of some kind to the Tera-scale communication protocols and, provided they adhere to those protocols, it will just work; it will be low power and low budget, and it will leverage existing resources for memory and other on-die communication systems. There are simply too many advantages to an on-die design for them to be ignored.

Look at the power budgets we’re seeing with dual and quad CPUs compared to single-core CPUs. The same thermal envelope, yet two or four cores. That should tell you the advantages of power savings, if nothing else does.

drsengir

I agree, the interesting part of TeraScale is the communications layer, but Intel’s website has little-to-no useful information about that, just very basic and uninteresting information. They prefer to talk about their 80 core number crunching prototype.

I agree entirely on the socket issue. I was under the impression that HyperTransport was flexible and extensible, so I was surprised to see the Torrenza initiative at all.

“Plugged in” and “TeraScale” do not go well together, because it’s all hard silicon. Also, software resource management will be a bear, particularly if what you say is true and the fabric will be filled with specialized computing cores. Does that make it a bad idea? Certainly not, I think it’s a great idea, and I think it may actually make Intel’s server product line more desirable (particularly considering the Itanium line is terrible).

Also, not “anything” can be designed into the fabric. Only things that Intel designs. It is a closed architecture, after all.

AMD file for bankruptcy? Unlikely. They’re hurting for cash but far from destitute. They also don’t have a $5.5 bil loan due 2012. They have a $900 mil loan due 2011, $1.5 bil and $400 mil loans due 2012, and a $2.2 bil loan due 2015. The $1.5 bil loan, by the way, was to refinance a previous loan. That’s the way all this fake money works in our society, and “due dates” for loans just mean that they have until that date to get a loan to cover the previous loan. Also, AMD has lowered their capital spending guidance by $700 mil to alleviate some of those issues. It will hurt them in terms of technology development, but it will allow them to push through this tough time. I really don’t think bankruptcy is even remotely an option for them.

Intel came back in a hard way and is definitely putting AMD through a beating, but hey, a couple of years ago people on this site were sounding the death knell for Intel under massive budget cuts and dozens of failed product launches. It’s true that Intel is larger and as such that gave them more momentum to ride the storm, but AMD plays the part of the underdog well, and I imagine that they will come up with a competitor to TeraScale if need be (they can always do what they did with the x86 and copy Intel’s handiwork, but offer it at a lower price).

TeraScale has been “proven to work” in only the most basic sense. It works. Whoop-tee-do, I can slap together 80 processing elements, put some interconnects between them, and have it “work”. They have not proven that their fabric is nearly as efficient or extensible as they claim it will be. The prototype is a good start, but the Merced prototypes shocked and amazed everybody for 6 years until Intel released the Itanium and everyone laughed.

“It does work”. Prove it.
“It is real”. No it’s not, it’s paper and a prototype that according to you doesn’t even begin to touch upon the potential of the technology.
“It allows for anyone to design a core of some kind… [and] it will just work”. Firstly they have to get Intel’s blessing to even get the specs, then they have to design, test, etc the core, then they have to somehow convince Intel to cut new masks (at a cost of many millions of dollars) and rent time in Intel’s fabs to tape out the die. And that is all assuming that Intel will even do custom orders, which is still not set in stone. There is no “plugging in” like with Torrenza’s socket format.

I’m not arguing the benefits of massively parallel heterogeneous integrated processors. I think that they are the way of the future. I have thought that for years. But that future is still a bit far off, and when it does first arrive it will make it to the high-end server and supercomputer niche only. I think Torrenza will act as a good consumer-level extensible processing platform until process technologies and software advancements allow the higher-end stuff to trickle down into consumer space.

And yes, the power budgets for 2/4-core CPUs are comparable to those of previous single-core CPUs, but that’s because those single cores performed faster than any one of the cores on a 2/4-core CPU. Not enough faster, mind you, because thermal dissipation grows superlinearly with clock speed once you push the limits of the process technology. So, yes, I agree with you entirely that massively parallel systems are the future (ironically they are the past as well, but I won’t delve into that), but I don’t think the industry can quickly or easily transition from the low-thread-count x86 paradigm to an ultra-parallel non-x86 paradigm. It will take time. And a lot of it.

RickGeek

drsengir, you are quite correct. There is very little additional public information. We’ll have to wait to see what comes of it.