AMD "Torrenza" Technology: CPU-GPU Shared Cache

Parrot, Jun 2, 2006, 7:26 PM

Quote:

With Torrenza, AMD has designed what it calls an open architecture, based on the next wave of Opteron processors, which allows what AMD calls "Accelerators." Using the add-in accelerators, a system will be capable of performing specialized calculations, similar in fashion to the way we use GPUs today.

I like the idea behind the concept, but it seems to me that this will be years in the making before hardware and software makers get behind it enough for the average consumer to see any benefits. Does anyone know of any "accelerators" out there currently under development?

It sounds great in concept, but in reality I only really see a limited market for these 'accelerators'. Graphics cards have PCIe and can't max out that bandwidth right now anyway. Physics accelerators might be an idea, or perhaps media accelerators?

But in the end it costs more $$$, and most people won't see the value for the cost, IMHO. It'll just let you do things faster, so it sounds more like a specialized thing for render farms, etc. Not sure the average consumer will even care.

Great concept, however, can AMD and OEMs work hard enough to develop the technology in a way which would appeal to the consumer market? Personally, I see this technology making its biggest splash in the server/workstation market, where 'accelerators' can be used for a number of purposes to aid the CPU in specific tasks like 3D modelling or simulation. I reckon this technology will be very scalable in a server environment as it uses HT-3 and HTX. All in all, it's another thing for me to watch in the near future.

Quote:

I like the idea behind the concept, but it seems to me that this will be years in the making before hardware and software makers get behind it enough for the average consumer to see any benefits. Does anyone know of any "accelerators" out there currently under development?

I think it was mentioned, if you read the article/Q&A, that AMD also supports these accelerators. However, they would all probably have to go through the north/south bridge rather than connect directly to the CPU, which is the whole point of it, and which I think you missed. Fanboyism gets you nowhere.

Imagine the bottleneck if all the data had to go through the North/south bridge, instead of directly to the CPU... man, the latencies would hamstring any performance boost.

Quote:

I like the idea behind the concept, but it seems to me that this will be years in the making before hardware and software makers get behind it enough for the average consumer to see any benefits. Does anyone know of any "accelerators" out there currently under development?

There are several companies, usually in the ultra-high-performance server space, that have developed or are developing optimized co-processors to meet a specific business or engineering need. Some examples:

- FPGA co-processors to speed up FPGA compile times
- Math and floating-point co-processors
- Cray is developing a co-processor to work with the Opteron to enable greater supercomputing performance
- DRC is working on one as well
- Other accelerators that Torrenza enables, and that will probably be built, target XML processing, physics calculations, and a gaming co-processor specially designed to boost video card performance

These solutions may also be designed to drop into a HyperTransport expansion (HTX) slot to get immediate access to the CPU and memory subsystem.

The performance boost achievable by some of these specially designed co-processors is in the realm of 10x to 300x, so there is plenty of incentive for companies to make lots of money designing solutions that utilize AMD's coherent HyperTransport (cHT) spec.
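To put those kernel-level figures in perspective, Amdahl's law caps the overall win by the fraction of work that can actually be offloaded. A minimal sketch, with made-up fractions and speedups (nothing here is a measured Torrenza number):

```python
# Amdahl's law: overall speedup when only a fraction of the work
# is offloaded to a co-processor. The fractions and speedups below
# are invented for illustration, not measured Torrenza figures.
def overall_speedup(offloaded_fraction, coproc_speedup):
    serial = 1.0 - offloaded_fraction
    return 1.0 / (serial + offloaded_fraction / coproc_speedup)

# Even a 300x co-processor helps little if only half the work fits on it:
print(overall_speedup(0.5, 300))   # ~1.99
print(overall_speedup(0.95, 300))  # ~18.8
```

So the headline 10x-300x numbers only translate into big system-level wins for workloads that are almost entirely accelerator-friendly.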

What I want to know is whether these co-procs are going to have any memory of their own, or is it all cache? I mean, in any gfx card, for instance, the memory is always the slowest component compared with the GPU. So are these procs going to just have a connection to the CPU, and therefore just be an offshoot of the processor, or are they more of a stand-alone type thing?

I mean, for one reason or another gfx cards don't use all the PCIe bandwidth, so how are they going to make these benefit more from this connection than from PCIe? Sorry if I'm rambling or not making sense, but I'm just curious how much of a boost this would have in the real world.

I bet they will have their own high bandwidth cache, since normal DDR/DDR2 memory cannot provide the sort of bandwidth and low latencies required for these specialised co-processors to work without being bottlenecked. Unless DDR3/DDR4 comes round some time soon, I think they will rely on cache.
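For reference, the raw numbers behind that bandwidth argument are easy to sketch. The figures below are textbook peak-theoretical rates for a 16-bit, 1 GHz double-pumped HyperTransport link and dual-channel DDR2-800, not sustained measurements:

```python
# Back-of-the-envelope peak bandwidth figures (theoretical, not sustained).
def gb_per_s(transfers_per_s, bytes_per_transfer):
    return transfers_per_s * bytes_per_transfer / 1e9

# A 16-bit HyperTransport link at a 1 GHz clock is double-pumped:
# 2 GT/s * 2 bytes = 4 GB/s each way, 8 GB/s aggregate full duplex.
ht_per_direction = gb_per_s(2e9, 2)
print(ht_per_direction * 2)  # 8.0

# Dual-channel DDR2-800 for comparison: 800 MT/s * 8 bytes * 2 channels.
ddr2_dual = gb_per_s(800e6, 8) * 2
print(ddr2_dual)  # 12.8
```

In other words, a single HT link and the host's DDR2 controller are in the same ballpark, which is why a co-processor with a hungry kernel would want its own local cache or memory rather than fetching everything over the link.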

I'm still thinking it will be years till we see the market for this kind of thing develop and go mainstream, even longer for the software to make the necessary adjustments to actually show clear benefits. Everything I'm reading on all these possible applications for Torrenza seems like ideas that are pretty far down the road. "Imagine the possibilities" seems to be a common theme, which to me says still under development. We're just now seeing software begin to optimize for dual-core and we're already going dual dual-core? It seems like a marketing technique to bring this tech out this early. The article Parrot linked to states that they are working with Intel as well, using the Intel Common System Interconnect to avoid the north/south bridge. Although this, along with the rest of the technology we're talking about, seems to me to be a few years away. Don't mean to steal the link, Parrot.

Still, they will need the info for processing, so will they have to go through the CPU to access the system RAM, or what? Also, is the HT link the same speed as on the CPU, i.e. 1 GHz full duplex? Because that would appear to be fast, and I would hope it wouldn't be wasted on memory fetches.

But think of the latencies involved in transferring data from system memory to the co-processor's dedicated memory before the co-processor can work on it. The latencies wouldn't be huge, but it could be a possible area for bottlenecking. Either way, time will be spent on memory fetches, whatever the intended design.
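That trade-off can be sketched as a simple break-even check: offloading only pays when the transfer cost plus the co-processor's run time beats the CPU's run time. Every number below is an illustrative assumption, not a measured figure:

```python
# When does offloading pay off? Only when the compute time saved
# exceeds the cost of shipping data to the co-processor's memory.
# All link speeds, latencies, and job times here are made up.
def offload_wins(data_bytes, link_gb_s, link_latency_s,
                 cpu_time_s, coproc_time_s):
    transfer = link_latency_s + data_bytes / (link_gb_s * 1e9)
    return transfer + coproc_time_s < cpu_time_s

# Shipping 64 MB over an assumed 4 GB/s link with 1 us latency:
# worth it for a 100 ms CPU job the co-processor finishes in 10 ms ...
print(offload_wins(64e6, 4, 1e-6, 0.100, 0.010))  # True
# ... but not for a 20 ms job, where the copy alone eats the savings.
print(offload_wins(64e6, 4, 1e-6, 0.020, 0.010))  # False
```

Which is exactly the point made above: short jobs get hamstrung by the copy, long jobs amortize it.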

And beerandcandy, before you start spouting stuff we already know, here are some facts. Any add-on card for whatever task brings a whole set of limitations and potential latency increases. The early days of gfx cards saw this problem, and the early days of PPUs are no exception. A co-processor dedicated to boosting gfx-card performance would work by improving the communication between it and the CPU, and other potential co-processors, even PPUs. How? Because the co-processor would act as a pathway linking the gfx card and CPU using HyperTransport. This would cut down the latencies encountered in rendering the hardware-accelerated physics seen in games like GRAW. This would work since the whole system is driven by the system drivers, not the OS, meaning everything will be a lot more efficient.

Looks great, except that AMD is basing their strategy on other companies by doing this, and that is always, always, always a risk. If manufacturers don't get stuff ready in time, or if Intel gets support for a similar technology before AMD does, AMD is in serious trouble.

Now, the fact that Cray is behind AMD on this is a very good thing, since they make some of the biggest computers in the world. That gives AMD some confidence, I'd say, and I definitely agree this could be a huge win for AMD if this plays out right.

Quote:

I bet they will have their own high bandwidth cache, since normal DDR/DDR2 memory cannot provide the sort of bandwidth and low latencies required for these specialised co-processors to work without being bottlenecked. Unless DDR3/DDR4 comes round some time soon, I think they will rely on cache.

So... unless cache is applied or extra memory bandwidth is provided, is there a performance hit for adding a co-processor?

Quote:

I bet they will have their own high bandwidth cache, since normal DDR/DDR2 memory cannot provide the sort of bandwidth and low latencies required for these specialised co-processors to work without being bottlenecked. Unless DDR3/DDR4 comes round some time soon, I think they will rely on cache.

Quote:

So... unless cache is applied or extra memory bandwidth is provided, is there a performance hit for adding a co-processor?

Well, the co-processor itself would provide a huge boost in whatever task it's designed to handle. It's just that if the components supporting the co-processor aren't up to scratch, that boost will be reduced. There will still be a performance boost; it will just be limited by the latencies in the system unless cache or extra memory bandwidth is provided.

Just a thought, but could you use this type of socket (add-on card?) with a direct HT connection to give a sort of level-2.5 cache? Not actually do any co-processing, but buffer the processor's L2/main memory for a large performance boost. I will freely admit I know little about this type of technology, but if this were possible it would be a quick, cheap and universally applicable way of reducing memory latency and upping processor performance.

Quote:

Just a thought, but could you use this type of socket (add-on card?) with a direct HT connection to give a sort of level-2.5 cache? Not actually do any co-processing, but buffer the processor's L2/main memory for a large performance boost. I will freely admit I know little about this type of technology, but if this were possible it would be a quick, cheap and universally applicable way of reducing memory latency and upping processor performance.

I think you could, but I don't think it would be that cheap. Cache memory is relatively expensive. It would be a great way of expanding real-estate for a level-3 cache or whatever, but I think it would depend on the development of cheaper cache such as Z-RAM (which I know they are working on).
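The value of such a socket-attached cache can be sketched with the standard average-memory-access-time (AMAT) formula; every hit time and miss rate below is invented purely for illustration:

```python
# Average memory access time (AMAT) with an extra "level 2.5" cache
# sitting between L2 and main memory. All hit times, miss rates, and
# latencies here are invented illustration values.
def amat(l2_hit_ns, l2_miss_rate, l25_hit_ns, l25_miss_rate, mem_ns):
    return l2_hit_ns + l2_miss_rate * (l25_hit_ns + l25_miss_rate * mem_ns)

# Without the extra level: every L2 miss goes straight to DRAM.
no_l25 = amat(3, 0.10, 0, 1.0, 100)      # 3 + 0.10*100 = 13 ns
# With a socket-attached cache that catches 70% of L2 misses at 25 ns:
with_l25 = amat(3, 0.10, 25, 0.30, 100)  # 3 + 0.10*(25 + 0.30*100) = 8.5 ns
print(no_l25, with_l25)
```

The win depends entirely on the extra level being both big enough to catch most L2 misses and much faster than DRAM, which is why the cost of the cache memory matters so much.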

The best use I can think of would be a physics-tweaked CPU... I wonder if AMD and Intel will just make some physics extension à la MMX or SSE? It seems like that would make the most sense. Maybe you could use the extra socket as a fast bus for a giant cache RAM chip, like the old L2 chips back in the Pentium 233 MMX days.

Quote:

The best use I can think of would be a physics-tweaked CPU... I wonder if AMD and Intel will just make some physics extension à la MMX or SSE? It seems like that would make the most sense. Maybe you could use the extra socket as a fast bus for a giant cache RAM chip, like the old L2 chips back in the Pentium 233 MMX days.

A physics-tweaked CPU... that sounds good enough to eat. Or perhaps a graphics-tweaked CPU, to boost rendering times in programs like 3DStudio Max and Maya. Hell, perhaps even an AI-tweaked CPU. The possibilities involving this are pretty impressive. And a cache chip? I wonder how that would improve things...

Quote:

Great concept, however, can AMD and OEMs work hard enough to develop the technology in a way which would appeal to the consumer market? Personally, I see this technology making its biggest splash in the server/workstation market, where 'accelerators' can be used for a number of purposes to aid the CPU in specific tasks like 3D modelling or simulation. I reckon this technology will be very scalable in a server environment as it uses HT-3 and HTX. All in all, it's another thing for me to watch in the near future.

Coprocessors can have various levels of integration:

(1) System level: a card in an HTX slot, or a module that plugs into a CPU socket. So on a dual-CPU motherboard, one socket can hold a CPU and the other a coprocessor. The coprocessor can have its own local memory. DRC already produces such a module for Opteron systems.

(2) Chip level: The CPU die and coprocessor die are in one package that plugs into a MB CPU socket.

(3) Die level: The CPU and coprocessor are on one die.

Possible applications: The coprocessors can be anything from a physics processor to a GPU. At the moment there are coprocessors consisting of an FPGA and local SRAM; these modules can be configured to perform algorithmic acceleration in HPC. Cray, for example, uses them in its XD1 supercomputer.

Advantages: Coprocessors typically provide an order of magnitude or more of performance compared to a CPU, and have low power consumption. 8)
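The offload pattern described in this post (copy input into the co-processor's local memory, run the kernel there, copy results back) can be sketched abstractly. Nothing below is a real Torrenza, HTX, or vendor API; it is a toy model of the pattern only:

```python
# Toy sketch of the host/co-processor offload model described above.
# Hypothetical class and method names; not a real Torrenza or HTX API.
class Coprocessor:
    def __init__(self):
        self.local_mem = {}          # stands in for on-card SRAM/DRAM

    def upload(self, name, data):    # host -> co-processor copy
        self.local_mem[name] = list(data)

    def run(self, kernel, name):     # kernel executes out of local memory
        self.local_mem[name] = [kernel(x) for x in self.local_mem[name]]

    def download(self, name):        # co-processor -> host copy
        return self.local_mem[name]

fpga = Coprocessor()
fpga.upload("samples", [1, 2, 3, 4])
fpga.run(lambda x: x * x, "samples")   # e.g. an FPGA-accelerated kernel
print(fpga.download("samples"))        # [1, 4, 9, 16]
```

The three integration levels above only change where `local_mem` physically lives (on a card, in the package, or on the die); the upload/run/download pattern stays the same.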

One good thing about this whole thing, though, is that it has people talking about AMD without a single Intel fanboy (apart from beerandcandy) starting a flame war. It's got to be a record: a thread with AMD in it where we are managing to have a civilised debate about the possibilities of this.

I think the co-procs are a better idea than memory of any sort. I don't know if it would be that much better than the cache memory the CPU already has access to. Also, I think these procs are going to cost more than most are willing or able to pay if or when they make an appearance in the consumer market.

It wouldn't be better than on-chip or on-die cache, but it would have the advantage of having extra real estate to put a possibly large amount of cache on.

It's so much more than what people are talking about in other posts. They find it quite easy to lose their way and fall into the AMD vs. Intel battle, which is a useless waste of time. This is the stuff everyone has been waiting for, and they don't even know it. I'm loving the idea of more realistic racing sims, and first-person shooters so real you could get indicted by a grand jury if it were recorded and shown to one. We're getting closer.

Quote:

It's so much more than what people are talking about in other posts. They find it quite easy to lose their way and fall into the AMD vs. Intel battle, which is a useless waste of time. This is the stuff everyone has been waiting for, and they don't even know it. I'm loving the idea of more realistic racing sims, and first-person shooters so real you could get indicted by a grand jury if it were recorded and shown to one. We're getting closer.

I am glad to see that you see the potential of this technology. Many people have mentioned having a GPU in one socket. This is indeed possible, but because of the bandwidth requirements, a local cache would be necessary. Indeed, there is a design currently in progress which utilises several caches and predictive techniques to boost graphics performance.
