Over the past six months, we have heard various bits and pieces of information about GT300, nVidia's next-gen part. We decided to stay silent until the information was confirmed by multiple sources, and now we feel confident enough to disclose what is cooking in Santa Clara, India, China and other nV sites around the world.

GT300 isn't the architecture that was envisioned by nVidia's Chief Architect, former Stanford professor Bill Dally, but this architecture will give you a pretty good idea why Bill told Intel to take a hike when the larger chip giant from Santa Clara offered him a job on the Larrabee project.

Thanks to Hardware-Infos, we managed to complete the puzzle of what nVidia plans to bring to market a couple of months from now.

What is GT300?

Even though it shares its first two letters with the GT200 architecture [GeForce Tesla], GT300 is the first truly new architecture since SIMD [Single-Instruction Multiple Data] units first appeared in graphics processors.

The GT300 architecture groups processing cores in sets of 32 - up from 24 in the GT200 architecture. But the real difference between the two is that GT300 parts ways with the SIMD approach that dominates GPU architecture today. GT300 cores rely on MIMD-like functions [Multiple-Instruction Multiple Data] - all the units work in MPMD [Multiple-Program Multiple Data] mode, executing simple and complex shader and computing operations on-the-go. We're not exactly sure whether we should continue to use the terms "shader processor" or "shader core", as these units are now almost on equal terms with the FPUs inside the latest AMD and Intel CPUs.
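
To see why the move away from lockstep SIMD matters, here is a toy cost model in Python. It is only a sketch under our own assumptions: the cycle costs and the single if/else shader are made up for illustration, and nothing here describes how GT300 actually schedules work. It compares a 32-core cluster that must run every taken branch path in lockstep with one where each core runs only its own path.

```python
# Toy cost model: a 32-lane cluster runs a shader with one if/else branch.
# All cycle costs are hypothetical; this is not a model of real GT300 hardware.

LANES = 32        # cores per cluster, as reported for GT300
COST_IF = 10      # hypothetical cycle cost of the "if" path
COST_ELSE = 40    # hypothetical cycle cost of the "else" path


def simd_cycles(takes_if):
    """Lockstep execution: every path taken by at least one lane is run
    by the whole group, one path after the other."""
    cycles = 0
    if any(takes_if):
        cycles += COST_IF
    if not all(takes_if):
        cycles += COST_ELSE
    return cycles


def mimd_cycles(takes_if):
    """Independent execution: each lane runs only its own path, so the
    group is done when its slowest lane is done."""
    return max(COST_IF if t else COST_ELSE for t in takes_if)


def useful_fraction(takes_if, cycles):
    """Share of lane-cycles spent on work the lanes actually needed."""
    useful = sum(COST_IF if t else COST_ELSE for t in takes_if)
    return useful / (len(takes_if) * cycles)


uniform = [True] * LANES                       # every lane takes the same path
diverged = [i % 2 == 0 for i in range(LANES)]  # half the lanes take each path

for name, pattern in (("uniform", uniform), ("diverged", diverged)):
    s, m = simd_cycles(pattern), mimd_cycles(pattern)
    print(f"{name}: SIMD {s} cycles ({useful_fraction(pattern, s):.0%} useful), "
          f"MIMD {m} cycles ({useful_fraction(pattern, m):.0%} useful)")
```

In this toy model the two approaches tie when all lanes agree, and the lockstep group only falls behind once lanes diverge. That is the kind of workload, branch-heavy compute code, where MIMD-like units would be expected to pay off.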

GT300 itself packs 16 groups of 32 cores - yes, we're talking about 512 cores for the high-end part. The core count alone is more than double that of GT200, which has 240 cores. Nobody can predict working clocks before the chip tapes out, but if the clocks remain the same as on GT200, we would have over double the computing power.
If, for instance, nVidia gets a 2 GHz clock for the 512 MIMD cores, we are talking about no less than 3 TFLOPS in single precision. Double precision is highly dependent on how efficient the MIMD-like units will be, but you can count on a 6-15x improvement over GT200.
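
The arithmetic behind these figures can be sketched in a few lines of Python. Note the assumptions: the GT200 numbers are the public GeForce GTX 280 specs (240 cores, 1296 MHz shader clock), and we assume each core is credited with 3 floating-point operations per clock (MAD + MUL), the way nVidia counts peak throughput on GT200. Whether GT300 cores will be counted the same way is not known.

```python
# Back-of-the-envelope peak single-precision throughput.
# Anything marked "assumption" does not come from the rumored GT300 specs.

GT200_CORES = 240          # GeForce GTX 280 (public spec)
GT200_SHADER_GHZ = 1.296   # GeForce GTX 280 shader clock (public spec)
GT300_CORES = 16 * 32      # 16 groups of 32 cores, as reported
FLOPS_PER_CLOCK = 3        # assumption: MAD + MUL per core per clock, as on GT200


def peak_gflops(cores, clock_ghz, flops_per_clock=FLOPS_PER_CLOCK):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per clock."""
    return cores * clock_ghz * flops_per_clock


gt200 = peak_gflops(GT200_CORES, GT200_SHADER_GHZ)
gt300_same_clock = peak_gflops(GT300_CORES, GT200_SHADER_GHZ)
gt300_2ghz = peak_gflops(GT300_CORES, 2.0)

print(f"GT200 (GTX 280):         {gt200:.0f} GFLOPS")
print(f"GT300 at GT200 clocks:   {gt300_same_clock:.0f} GFLOPS "
      f"({gt300_same_clock / gt200:.2f}x)")
print(f"GT300 at 2 GHz:          {gt300_2ghz:.0f} GFLOPS")
print(f"GT300 at 2 GHz, 2/clock: {peak_gflops(GT300_CORES, 2.0, 2):.0f} GFLOPS")
```

With 3 operations per clock, 512 cores at 2 GHz work out to 3072 GFLOPS, which is where "no less than 3 TFLOPS" comes from. If only a single multiply-add (2 operations) were counted per clock, the same chip would land at 2048 GFLOPS, so the headline figure depends heavily on that assumption.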

This is not the only change - cluster organization is no longer static. The Scratch Cache is much more granular and allows for greater interaction between the cores inside the cluster. GPGPU, i.e. GPU computing, applications should really benefit from this architectural choice. When it comes to gaming, the obvious question is - how good can GT300 be? Please do bear in mind that this 32-core cluster will be used in next-generation Tegra, Tesla, GeForce and Quadro parts.

This architectural change should result in a dramatic increase in double-precision performance, and if GT300 packs enough registers, both single-precision and double-precision performance might surprise all the players in the industry. Given the timeline of when nVidia began work on GT300, it looks to us like the GT200 architecture was a test run for the real thing coming in 2009.

Just like a CPU, GT300 gives direct hardware access [HAL] for CUDA 3.0, DirectX 11, OpenGL 3.1 and OpenCL. You can also program the GPU directly, but we're not exactly sure whether developing such a solution would be financially feasible. The point is that you can now do it. It looks like Tim Sweeney's prophecy is slowly but surely coming to life.