NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.

The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.

"Fermi was a major step forward in computing," said Bill Dally, chief scientist and senior vice president of research at NVIDIA. "It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency."

The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.

NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:

SMX Streaming Multiprocessor -- The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX's energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.

Dynamic Parallelism -- This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.

Hyper-Q -- This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.

"We designed Kepler with an eye towards three things: performance, efficiency and accessibility," said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. "It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research."

NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world's highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate performance of 4.58 teraflops of peak single-precision floating point and 320 GB per second memory bandwidth.
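The 4.58-teraflop figure is consistent with a simple peak-throughput estimate. The sketch below is illustrative only: the per-GPU core count (1,536) and the 745 MHz core clock are assumptions that do not appear in this announcement, and the function name is hypothetical.

```python
def peak_sp_tflops(num_gpus, cores_per_gpu, clock_ghz, flops_per_core_per_cycle=2):
    """Rough peak single-precision throughput in TFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per cycle.
    """
    gflops = num_gpus * cores_per_gpu * flops_per_core_per_cycle * clock_ghz
    return gflops / 1000.0


# Assumed (not from the announcement): 1536 cores per GK104, 0.745 GHz clock.
k10 = peak_sp_tflops(num_gpus=2, cores_per_gpu=1536, clock_ghz=0.745)
print(f"{k10:.2f} TFLOPS")
```

Under those assumptions the estimate rounds to the quoted aggregate for the two-GPU board.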

The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world's highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.

The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times more double-precision performance than Fermi architecture-based Tesla products, and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

"In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications," said Earl C. Joseph, program vice president of High-Performance Computing at IDC. "Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications."

Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA's GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.

The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.

Hello, Big Kepler. 320 GB/s of memory bandwidth on an HPC part suggests that GK110's memory interface isn't 384-bit, but 512-bit wide. If NVIDIA used a 384-bit interface with today's 6.00 GHz memory, it would only achieve 288 GB/s.
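The arithmetic behind that 288 GB/s figure can be checked with a short sketch. It assumes the usual rule of thumb that peak bandwidth is the bus width in bytes multiplied by the effective per-pin data rate; the function name is hypothetical, and the 512-bit and 256-bit configurations are only illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits.
    data_rate_gbps: effective data rate per pin in Gbit/s
                    (the "6.00 GHz" quoted for GDDR5).
    """
    return bus_width_bits / 8 * data_rate_gbps


print(peak_bandwidth_gb_s(384, 6.0))      # 288.0, the figure in the post
print(peak_bandwidth_gb_s(512, 6.0))      # 384.0
print(peak_bandwidth_gb_s(512, 5.0))      # 320.0
print(2 * peak_bandwidth_gb_s(256, 5.0))  # 320.0, two 256-bit GPUs combined
```

Note that the press release quotes 320 GB/s as the aggregate for the dual-GPU Tesla K10, so the same total can also come from two narrower interfaces rather than one 512-bit bus.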

Well, that's it, folks. You know, NV has some business to attend to. To be honest, during the whole run of Kepler rumors before GK104 launched, I was thinking that NV had other priorities than discrete GPUs (Tesla, Tegra and such), and I was partially right. They have only one chip, the GK104, which we will see in three or maybe even four variants, and that's about all there is to Kepler for gaming. GK106 is nowhere to be seen, I'm starting to doubt that it exists at all, and GK107 seems to be low end.

What? Everyone knows AMD's days are numbered because of the fabled vapor chip!

It's a shame. I think even AMD diehards were 'curious' about GK110 and now it's been revealed as a Tesla part. I'm a little underwhelmed. I'm happy with my current card but I wanted to see a 'Big Daddy' Kepler gaming part that wasn't a ridiculous and exceptional dual gpu.

"Kepler is the world's first GPU designed for the cloud, to be deployed into cloud data centers worldwide. It does this with:
--a virtualized GPU
--no need to connect to a display; it can render and stream instantaneously right out of the chip to a remote location
--super energy efficiency, so it can be deployed on a massive scale

Every command buffer is now virtualized. We can now discern which virtual machine sent us a graphics command. At the end, we can stream the frame buffer to that specific virtual machine. One GPU can be shared with countless users."

Yes indeed, but that's exactly why I feel so slapped in the face by NVIDIA with the GK104. I personally wanted a 660 with decent folding power, not to have to consider a 560. They seem to be essentially moving towards a point where they will sell only specialised cards, gamer or folder, but not both.

This increases the profitability of their high-end compute cards and closes the door on using cheaper NV cards in compute-intensive applications and servers, which is lousy.

Plus, they are the biggest money-milkers in tech I've seen. They squeeze a third more money out per coin spent than any other company (to be fair, in some way a credit to them), but it comes from the customers.

I'm only interested in NV at all for a hybrid PhysX and permanent folding card (a one-off, and probably a 560 now), as my next render card isn't out yet, or for that matter even speculated about yet, and my main rig is fine at this time (FX-8350 next up).

I'm surprised even the Tesla GK104 is locked at 1/24th DP power. That's shameful if you ask me.

That's why they need two of them on their first next-gen compute card, lmfao. Are you buying this? I'm not, and never was, so I don't matter to them, but this doesn't scream performance crown to me. Double its performance (GK110) and what do you get? That's right, the same thing again, but finally on one chip. Epic fail IMHO. Yes, they will be economical, but just bad.

It's not locked; it's how it was designed. They reduced the number of advanced functional units in each SM in favor of simpler ones to drive SP FP performance (and thus gaming) while reducing the power requirements.

GK104 consists of 4 blocks, but only one of the four can do DP FP calcs. From AT:

Anandtech said:

The other change coming from GF114 is the mysterious block #15, the CUDA FP64 block. In order to conserve die space while still offering FP64 capabilities on GF114, NVIDIA only made one of the three CUDA core blocks FP64 capable. In turn that block of CUDA cores could execute FP64 instructions at a rate of ¼ FP32 performance, which gave the SM a total FP64 throughput rate of 1/12th FP32. In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block.

The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math. What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32. With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute a whole warp, but each quarter of the warp is done at full speed as opposed to ½, ¼, or any other fractional speed that previous architectures have operated at. Altogether GK104’s FP64 performance is very low at only 1/24 FP32 (1/6 * ¼), but the mere existence of the CUDA FP64 block is quite interesting because it’s the very first time we’ve seen 1/1 FP32 execution speed. Big Kepler may not end up resembling GK104, but if it does then it may be an extremely potent FP64 processor if it’s built out of CUDA FP64 blocks.
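The ratios in that quote can be reproduced with exact fractions. This is a rough sketch rather than anything from the article: the helper name is hypothetical, and the figure of 192 FP32 CUDA cores per GK104 SMX is an assumption that the quoted text does not state.

```python
from fractions import Fraction


def fp64_to_fp32_ratio(fp64_units, fp32_units, fp64_rate=Fraction(1, 1)):
    """FP64 throughput as a fraction of FP32 throughput.

    fp64_units / fp32_units: share of execution units that can do FP64.
    fp64_rate: speed of an FP64 unit relative to an FP32 unit.
    """
    return Fraction(fp64_units, fp32_units) * fp64_rate


# GF114: one of three CUDA core blocks is FP64 capable, at 1/4 rate.
gf114 = fp64_to_fp32_ratio(1, 3, Fraction(1, 4))

# GK104: 8 dedicated FP64 cores at full rate, next to an assumed
# 192 FP32 cores per SMX (assumption, not stated in the quote).
gk104 = fp64_to_fp32_ratio(8, 192)

# A 32-thread warp on 8 FP64 cores needs 32 / 8 cycles.
cycles_per_warp = 32 // 8

print(gf114, gk104, cycles_per_warp)  # 1/12 1/24 4
```

Under those assumptions the numbers line up with the 1/12, 1/24 and 4-cycle figures Anandtech gives.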

Thanks. I went over and read a little on that page. The world makes sense now.