Two months ago I wrote about the FirePro S9000 – AMD’s answer to the K10 – and was already looking forward to this K20. In the gaming world it hardly matters which card you buy to get good performance, but in the compute world it does. AMD presented 3.230 TFLOPS with 6GB of memory, and now we are going to see what the K20 can do.

The K20 is very different from its predecessor, the K10. The biggest differences are its double precision capability and the fact that it delivers more performance from a single GPU. To keep power usage low, it is clocked at only 705MHz, compared to 1GHz for the K10. I could not find information on the memory bandwidth.

ECC is turned on by default, which is why the card is presented as having 5GB. There is no information yet on whether this card also has ECC on the local memories and caches.

Conclusion

The most important improvement is found in the double precision performance: from 0.095 TFLOPS of the K10 to a whopping 1.170 TFLOPS. The S9000 cannot compete here.
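To put that jump in perspective, here is a minimal sketch that computes the ratio between the two peak figures quoted above. The variable names are my own, and the numbers are theoretical peaks, not measured results.

```python
# Peak double-precision figures as quoted in this post, in TFLOPS.
k10_dp_tflops = 0.095
k20_dp_tflops = 1.170

# Ratio of theoretical peaks; real applications will see less.
speedup = k20_dp_tflops / k10_dp_tflops

print(f"K20 vs K10 peak double precision: {speedup:.1f}x")
```

On paper that is a bit more than a factor of twelve.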

The memory bandwidth is not known yet.

Tesla K20 is a clear winner in these categories:

GFLOPS/Watt (single precision)

Single precision performance with a single GPU

Double precision performance with a single GPU

The only disadvantage is that the price has increased a lot, which is due to the high price NVIDIA has put on their double precision cores. As silicon is silicon, I am not sure why they do this – though marketing-wise it is a smart move.

Where AMD always had the advantage in double precision, NVIDIA now hits back hard and makes it very difficult for AMD to come up with an answer to this GPU. Note that a dual-GPU version of the K20 is the logical next step, and the 705MHz clock leaves room to overclock it by 40% – if you want to take the risk. This raises the bar for both Intel and AMD to come up with a faster accelerator under 225 Watt for the rest of 2013. I am looking forward to seeing their answers.
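As a rough illustration of what a 40% overclock would mean, the sketch below scales the clock and the peak double precision figure. It assumes that peak throughput scales linearly with the core clock, which is my simplification and not something NVIDIA states; power draw, stability and memory bandwidth are ignored entirely.

```python
# Figures quoted in this post.
base_clock_mhz = 705
base_dp_tflops = 1.170
overclock_fraction = 0.40  # the 40% headroom mentioned above

# ASSUMPTION: peak FLOPS scale linearly with core clock.
scale = 1 + overclock_fraction
overclocked_mhz = base_clock_mhz * scale
overclocked_dp_tflops = base_dp_tflops * scale

print(f"Clock: {base_clock_mhz} MHz -> {overclocked_mhz:.0f} MHz")
print(f"Peak DP: {base_dp_tflops:.3f} TFLOPS -> {overclocked_dp_tflops:.3f} TFLOPS")
```

Under that assumption the clock ends up just below the 1GHz quoted for the K10.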

Best & worst – NVIDIA vs AMD

Even though the cards are very comparable on most specifications, each is a clear winner on one of them and a clear loser on another.

NVIDIA has just released the official specs: the memory bandwidth of the Tesla K20 is 208 GB/s. Next to the Tesla K20 there is also a Tesla K20X, with a memory bandwidth of 250 GB/s, 1.31 TFLOPS DP and 3.95 TFLOPS SP.
Source: http://www.nvidia.com/object/tesla-servers.html
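With the bandwidth known, one simple way to relate it to compute is bytes of memory traffic available per double precision FLOP. The sketch below uses only the figures from this post; the dictionary layout and the helper name `bytes_per_dp_flop` are my own, and the result is a ratio of theoretical peaks, not a benchmark.

```python
# Peak figures as quoted in this post.
cards = {
    "Tesla K20": {"bandwidth_gb_s": 208, "dp_tflops": 1.17},
    "Tesla K20X": {"bandwidth_gb_s": 250, "dp_tflops": 1.31},
}


def bytes_per_dp_flop(bandwidth_gb_s, dp_tflops):
    """Peak memory bandwidth divided by peak double-precision throughput."""
    dp_gflops = dp_tflops * 1000  # TFLOPS -> GFLOPS, so GB/s / GFLOPS = bytes/FLOP
    return bandwidth_gb_s / dp_gflops


ratios = {name: bytes_per_dp_flop(**spec) for name, spec in cards.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} bytes per double-precision FLOP")
```

Both cards land a little under 0.2 bytes per FLOP, so a kernel needs to do a fair amount of arithmetic per byte loaded to get near the peak numbers.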

Angel Genchev

The companies have put in ECC memory for increased safety, so it's not wise to overclock: you lose the safety, which is the reason you spent $2.4K. If you want an unsafe, overclocked GPU, you can just get a Radeon HD7970 GHz Edition for less than $450, which has similar performance but higher power consumption (it is not power-optimized for GPGPU).

streamcomputing

Agreed – I have read about people who did or suggested it, as there is relatively much overclocking potential on this device. Personally, it would be the last thing I would do.
