The 50+ simple two-way in-order Pentium-like (yes, 1995 Pentium!) cores feed the same number of 512-bit-wide SIMD FP units, with the ability to deliver around 1 TFLOPS peak in double precision at around 1 GHz.
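A rough sanity check of that "around 1 TFLOPS" figure, as a back-of-the-envelope sketch. The quote only gives "50+" cores, 512-bit SIMD and "around 1 GHz"; the 60-core count and the assumption that each lane retires one fused multiply-add (counted as 2 FLOPs) per cycle are mine, not from the quote.

```python
def peak_dp_flops(cores, clock_hz, simd_bits=512, flops_per_lane_per_cycle=2):
    """Peak double-precision FLOP/s for a many-core SIMD chip.

    flops_per_lane_per_cycle=2 assumes one fused multiply-add per lane
    per cycle (an assumption, not stated in the quoted text).
    """
    lanes = simd_bits // 64  # 64-bit doubles per SIMD register: 512 / 64 = 8
    return cores * lanes * flops_per_lane_per_cycle * clock_hz


# Lower bound from the quote: 50 cores at 1 GHz.
low = peak_dp_flops(50, 1.0e9)
# Hypothetical 60-core part at the same clock (core count is an assumption).
high = peak_dp_flops(60, 1.0e9)

print(f"50 cores: {low / 1e12:.2f} TFLOPS")   # 0.80
print(f"60 cores: {high / 1e12:.2f} TFLOPS")  # 0.96
```

So "around 1 TFLOPS at around 1 GHz" is plausible only if "50+" means closer to 60 cores, or the clock sits a bit above 1 GHz.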

And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat

AMD's Fusion is a product of years of research. AMD "demo'ed" an all-HyperTransport Radeon about a year after they bought ATI, and they've also been showing off prototype Fusions that don't just have Radeon pipes on-die* but have them usable from the x86 interface side. What "usable" means is still up in the air, but if they've managed to use them as the backend for SIMD instructions (i.e., no more dedicated FPUs, and the x86 instruction scheduler issues as many ops as it can in parallel across, say, 512 Radeon ALUs shared by the entire CPU instead of just 2 per core), this could mean a huge goddamned increase in FP performance without needing a dedicated HAL API like OpenCL.

* On-die Fusion Radeons don't have a Radeon memory controller and natively speak HyperTransport. The upside is that they have direct access to system memory as a native processor and can access stuff directly out of on-die cache: this means you have basically zero wait time to send stuff to the GPU for processing, and you have zero-cost cache coherency.

This is an old slide, but it gives a good vision of AMD's overall goal. We are somewhere between step 2 and step 3, and it's only going to get better! AMD has one of the most creative and innovative visions for the future of consumer computing (as opposed to Intel just shrinking process nodes), and I think it's progressing quite well (just look at the success of their APU sales). I also think it's only going to get better for them as they move along with even more amazing features like what you just described.

/amdfanboyrant

Yeah, what I described is clearly Step 3 or later. Intel also seems to have finally sold a "step 3" type of device in the Phi, depending on what it actually can do.

Intel seems more interested in incorporating the CPU into the GPU, while AMD is incorporating the GPU into the CPU. Totally different mindsets/endgames/results.

They both want branch/loop-happy, highly parallel computation. The Radeon's biggest "problem" (and I'm using the term loosely) is that wavefronts are run in lockstep: both sides of a branch are the same length, even if it requires inserting no-ops, and loops whose lengths are set at runtime (instead of statically at compile time) are just as nasty.
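To put numbers on why lockstep hurts, here's a toy cost model in Python. It is only a sketch: it models divergence as "every lane sits through every path that any lane takes" (the usual masking picture), which is not exactly the no-op padding described above, and it is not how the hardware or compiler is actually implemented. The function names and cycle counts are made up for illustration; the 64-lane wavefront width is GCN's.

```python
WAVEFRONT = 64  # lanes per wavefront on GCN


def branch_cycles(takes_if, if_len, else_len):
    """Cycles for one wavefront to get through an if/else in lockstep.

    takes_if: one bool per lane. A path is executed by the whole
    wavefront if at least one lane needs it; other lanes just idle.
    """
    cycles = 0
    if any(takes_if):
        cycles += if_len
    if not all(takes_if):
        cycles += else_len
    return cycles


def loop_cycles(trip_counts, body_len):
    """Cycles for a loop whose trip count is only known at runtime.

    The wavefront keeps iterating until its slowest lane is done.
    """
    return max(trip_counts) * body_len


uniform = [True] * WAVEFRONT                   # every lane takes the 'if'
diverged = [True] + [False] * (WAVEFRONT - 1)  # one lane disagrees

print(branch_cycles(uniform, 10, 30))    # 10: only one path is run
print(branch_cycles(diverged, 10, 30))   # 40: both paths are run
print(loop_cycles([1] * 63 + [100], 5))  # 500: 63 lanes wait on one
```

One stray lane is enough to make the whole wavefront pay for both sides of the branch, or for the longest loop.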

CPUs, otoh, can't do highly parallel calculations because all the hardware dedicated to dealing with branching, branch prediction, cache prediction, etc. takes up a lot of room, produces a lot of heat, and uses a lot of power. I wonder how much stuff Intel removed to put 50 cores on a card.

So, how does it stand performance-wise? Its double precision FP throughput is the same as the typical AMD Radeon HD7970 card, which costs one quarter of the amount but has much less memory, 3 GB, and no ECC.

No ECC? The 7970 has ECC

No, it doesn't. What GCN did was add ECC to all internal on-die memory (caches, local stores, etc.), but the only cards AMD has that have ECC GDDR5 are FirePro/FireStream cards, and although they're normal GCN chips, they're not referred to as such.

Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released. In his forum he wrote this about the 7970:

actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.