
The HD 5450 was passively cooled while the ASUS HD 4890 has a very large cooler.

Oh! Well, that explains that.

Hmm, the choice of passive vs active cooling is kind of annoying. Passive cooling is quieter, but then if it raises the temperature so that the CPU cooler has to work harder, then in the end it might be louder... Argh.

I think there are a couple of messages here, but none of them are "r600 is a better architecture for Mesa"

Are you sure about that? I remember hearing VLIW was tuned for graphics while SIMD (GCN) is there to make GPGPU tasks easier/faster. Now I don't want to spread misinformation, so please correct me if I'm wrong as I think you know it best.

It's less that VLIW is more tuned for graphics, and more that it is unable to be tuned for anything else.

SIMD is better for GPGPU, but it can be equally good for graphics. The hardware is probably just more expensive to build than you could get away with if you were only targeting graphics through a VLIW architecture (and so it may end up being slower if AMD targets a specific price point).

Also, VLIW requires more complicated compiler optimizations to work correctly, so if anything SIMD should be the "better" option for Mesa.
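
To illustrate the point about compiler complexity, here is a toy sketch (not real shader-compiler code; the function and data shapes are made up for illustration) of the core problem a VLIW compiler faces: it must statically pack independent operations into one bundle, and any dependency forces slots to go unfilled.

```python
# Toy illustration: a VLIW compiler must pack independent ops into
# fixed-width bundles at compile time. Ops are (dest, src1, src2)
# register-name tuples; a dependency on a result produced inside the
# current bundle forces a new bundle, idling the remaining slots.

def pack_bundles(ops, width=4):
    """Greedy scheduler: an op joins the current bundle only if none
    of its sources were written by an op already in that bundle and
    the bundle still has a free slot."""
    bundles = []
    current, written = [], set()
    for dest, a, b in ops:
        if len(current) == width or a in written or b in written:
            bundles.append(current)
            current, written = [], set()
        current.append((dest, a, b))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

# Four independent ops pack into a single full bundle...
indep = [("r0", "x", "y"), ("r1", "z", "w"),
         ("r2", "u", "v"), ("r3", "p", "q")]
# ...while a dependent chain needs one bundle per op,
# leaving 3 of the 4 slots idle each time.
chain = [("r0", "x", "y"), ("r1", "r0", "y"),
         ("r2", "r1", "y"), ("r3", "r2", "y")]
```

A hardware scheduler (as in GCN) resolves these dependencies at run time instead, which is exactly the burden this greedy packing tries to handle statically.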

Right. A VLIW SIMD implementation requires less silicon area than a scalar SIMD implementation for the same graphics performance (since graphics workloads are mostly 3- and 4-vector operations anyways), but it's harder to make optimal use of VLIW hardware on arbitrary compute workloads.

On the other hand, for compute workloads which *do* fit well with VLIW hardware (basically ones which can be readily modified to make use of 4-vectors) the compute performance per unit area (and hence per-dollar) can be very high.
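
A tiny sketch of why 4-vector-friendly work maps so well onto VLIW4-style hardware (the function name and values are illustrative, not any real API): graphics-style RGBA data is naturally 4 wide, so every lane does useful work per instruction.

```python
# Toy illustration: a single 4-wide multiply-add, the kind of
# operation a 4-slot VLIW unit can issue as one fully packed bundle.

def vec4_mad(a, b, c):
    """Elementwise a*b + c over 4-component vectors."""
    return tuple(a[i] * b[i] + c[i] for i in range(4))

# Blending an RGBA pixel keeps all four lanes busy:
pixel = (0.5, 0.25, 0.125, 1.0)
scale = (2.0, 2.0, 2.0, 1.0)
bias  = (0.0, 0.0, 0.0, 0.0)
result = vec4_mad(pixel, scale, bias)   # (1.0, 0.5, 0.25, 1.0)

# A purely scalar compute kernel, by contrast, fills only one of the
# four slots, leaving 3/4 of the unit idle unless the compiler can
# find other independent work to pack alongside it.
```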

Can you explain the difference between scalar and vector SIMD? I know that's a big difference between pre-GCN and GCN, but I haven't found a resource that actually explains what it means, beyond saying that scalar SIMD is more flexible and can make good use of a hardware scheduler. That's also what puzzles me about scalar SIMD: the idea seems bizarre, almost an oxymoron given the "multiple data" part.

I don't know about all the exact terminology, but the major difference between the old VLIW architectures and the new Radeon SI (GCN) architecture is explained somewhat by AnandTech.

Whereas VLIW is all about extracting instruction level parallelism (ILP), a non-VLIW SIMD is primarily about thread level parallelism (TLP).

Because the smallest unit of work is the SIMD and a CU has 4 SIMDs, a CU works on 4 different wavefronts at once. As wavefronts are still 64 operations wide, each cycle a SIMD will complete ¼ of the operations on its respective wavefront, and after 4 cycles the current instruction for the active wavefront is completed.

Cayman by comparison would attempt to execute multiple instructions from the same wavefront in parallel, rather than executing a single instruction from multiple wavefronts. This is where Cayman got bursty – if the instructions were in any way dependent, Cayman would have to let some of its ALUs go idle. GCN on the other hand does not face this issue: because the SIMDs each handle a single instruction from different wavefronts, they are in no way attempting to take advantage of ILP, and their performance will be very consistent.
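
The arithmetic above can be sketched as a toy model (a simplification, not real hardware behaviour; the ¼-per-cycle figure from the text implies 16 vector lanes per SIMD):

```python
# Toy model of the GCN scheme described above: a CU has 4 SIMDs, each
# SIMD owns one 64-work-item wavefront, and each SIMD is 16 lanes
# wide, so one instruction takes 64 // 16 = 4 cycles to cover its
# whole wavefront.

SIMDS = 4          # SIMDs per compute unit
LANES = 16         # vector lanes per SIMD (¼ of a wavefront per cycle)
WAVEFRONT = 64     # work-items per wavefront

def cu_throughput(cycles):
    """Lane-operations completed by one CU each cycle. Because each
    SIMD issues a single instruction from its own wavefront, there is
    no ILP to extract and no dependency stalls: output is flat."""
    return [SIMDS * LANES for _ in range(cycles)]

def instructions_retired(cycles):
    """Each SIMD retires one instruction every WAVEFRONT // LANES
    (= 4) cycles, so the CU retires 4 instructions per 4 cycles."""
    return SIMDS * (cycles // (WAVEFRONT // LANES))
```

So over any 4-cycle window the CU completes 4 × 64 = 256 lane-operations and retires 4 instructions, one per wavefront, regardless of any dependencies between instructions – which is exactly the consistency the quote describes.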

Nice find. Thanks. This seems like it forces the hardware to be far more aware of program state than previous iterations. This would take some of the burden off the compiler writers, but it also appears to be more costly when it comes to silicon efficiency.