Nvidia details GF100 graphics beastie

Minus the price - and the speed

Nvidia has released additional details on its upcoming GF100 graphics processor, and if the GPU performs as well in reality as it does on paper, AMD/ATI's Radeon HD 5000 series may have a worthy competitor.

The GF100 will be Nvidia's first to be based on the company's muscular Fermi architecture, which features such niceties as scores of CUDA (compute unified device architecture) cores and ECC (error-correcting code) support. Fermi will find its way into a variety of products destined for both desktops and HPC rigs. The GF100 will be the first game-centric part.

According to Nvidia, the GF100 is "designed for gaming performance leadership." To help accomplish this goal, the GF100 implements all of Windows 7's DirectX 11 hardware APIs. Nvidia is especially proud of the GF100's support for DirectX 11's tessellation capabilities, which it asserts will allow for more-complex geometry and animation, including enhanced fluid effects and more-realistic hair effects.

In contrast to Nvidia's earlier GT200 architecture, the GF100 takes a more-distributed approach to tessellation. This improved distribution and parallelization results in a 8X improvement in tessellation performance than the GT200, according to the company's internal benchmarks.

Also supported will be DirectX 11's DirectCompute APIs, which developers can use to offload such highly parallelized tasks such as media processing from a system's CPU to the GF100.

Although GF100 technology will find eventually find its way into less-ambitious parts, the full-bore spec released this Sunday includes 512 CUDA cores arrayed in four graphics processing clusters (GPCs), each of which contain four streaming multiprocessors (SMs).

Each of those wee green squares is a processing core - there are 512 of them

Each SM contains 32 CUDA processors, four times more than the company's previous SM designs. Each CUDA processor has both an arithmetic logic unit (ALU) and a floating point unit (FPU). The FPUs are based on the IEEE 754-2008 floating-point standard using the fused multiply-add (FMA) instruction, which Nvidia claims provides improved precision over the older multiply-add (MAD) instruction, minimizing rendering errors in closely overlapping triangles.

Four GPCs each have four SMs communicating with a with a unified raster engine

Each SM also includes four special function units (SFUs), which Nvidia says are used for such functions as sine, cosine, reciprocal, square root, and graphics interpolation. All the SFUs' math mojo, according to Nvidia, is especially helpful for complex procedural shaders.

Each SM has 32 CUDA cores - that's 4X the cores of its previous generation

Also inside those 16 SMs is what Nvidia call its PolyMorph Engine, which includes, among other items, the GF100's tesselators. Placing a tesselator in each SM allows the bandwidth of the tessellation to be greatly increased - which accounts for much of that aforementioned 8X bump over the tesselation performance of the GT200.

Each SM also has its own 64KB of L1 cache, plus the GF100 as a whole has 768KB of fully coherent, read/write L2 cache - a step up from the GT200, where the 256KB L2 was read-only for the texture engine. According to Nvidia, this improved cache architecture will not only help texture coverage, but will also boost the GF100's compute performance.

Word on the street is that the GF100 will be available in late March. Unfortunately, Nvidia has remained silent about how much the part will cost and how much power it will consume - meaning how much of a power-supply and cooling-system upgrade you may be facing. Even the part's clock rate remains under wraps.

For more detail on the GF100, check out HardOCP's excellent "Deep Dive," or download Nvidia's own white papers detailing the GF100 and the Fermi compute architecture. ®