NVIDIA Fermi Features

In today's complex graphics, tessellation offers a way to represent massive amounts of geometry in compact, coarse form and expand it on demand. In the NVIDIA GF100-series GPU, tessellation also enables more complex animation. In terms of model scalability, dynamic Level of Detail (LOD) allows quality to be traded against performance, spending geometric detail only where it improves the picture. Composed of three layers (the original geometry, the tessellated geometry, and a displacement map), the final product carries far more genuine surface and shading detail than a model built with bump-map technology. In plain terms, tessellation gives real peaks and valleys with shadow detail in between, while previous-generation bump mapping only gives the illusion of detail.
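To make the quality/performance trade-off concrete, a dynamic LOD scheme can be as simple as scaling a patch's tessellation factor by its distance from the camera. The sketch below is a minimal illustration of that idea; the falloff constant is an illustrative assumption, though the 1-64 clamp range matches the DX11 maximum tessellation factor.

```cuda
// Minimal sketch of distance-based dynamic LOD: nearby patches get many
// subdivisions per edge, distant ones collapse toward a single triangle.
// The 0.25f falloff constant is an assumed tuning value, not from NVIDIA.
#include <cuda_runtime.h>
#include <math.h>

__host__ __device__ float tessFactor(float3 patchCenter, float3 eyePos)
{
    float dx = patchCenter.x - eyePos.x;
    float dy = patchCenter.y - eyePos.y;
    float dz = patchCenter.z - eyePos.z;
    float dist = sqrtf(dx * dx + dy * dy + dz * dz);

    // More subdivision up close, less far away; clamp to the
    // 1..64 range DX11 allows for a tessellation factor.
    float factor = 64.0f / (1.0f + dist * 0.25f);
    return fminf(fmaxf(factor, 1.0f), 64.0f);
}
```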

Using GPU-based tessellation, a game developer can send a compact geometric representation of an object or character, and the tessellation unit can produce the correct geometric complexity for the specific scene. Consider the "Imp" character illustrated above. On the far left we see the initial quad mesh used to model the general outline of the figure; this representation is quite compact even when compared to typical game assets. The two middle images of the character are created by finely tessellating the description at the left. The result is a very smooth appearance, free of the faceting that results from limited geometry. Unfortunately this character, while smooth, is no more detailed than the coarse mesh. The final image on the right was created by applying a displacement map to the smoothly tessellated third character.
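The displacement step behind that final image is straightforward to express in code: after tessellation has produced a smooth, dense mesh, each new vertex is pushed along its normal by a height sampled from the displacement map. The CUDA kernel below is a minimal sketch of that idea, assuming a flat height-map array and per-vertex normals and UVs; the names (heightAt, displaceVertices) are illustrative, not from any NVIDIA SDK.

```cuda
// Sketch of displacement mapping on a tessellated mesh. One thread per
// vertex; each vertex moves along its normal by a sampled height.
#include <cuda_runtime.h>

__device__ float heightAt(const float* heightMap, int w, int h,
                          float u, float v)
{
    int x = min((int)(u * w), w - 1);
    int y = min((int)(v * h), h - 1);
    return heightMap[y * w + x];   // nearest-neighbour sample, for brevity
}

__global__ void displaceVertices(float3* pos, const float3* normal,
                                 const float2* uv, const float* heightMap,
                                 int w, int h, float scale, int numVerts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVerts) return;

    float d = scale * heightAt(heightMap, w, h, uv[i].x, uv[i].y);
    pos[i].x += normal[i].x * d;   // move the vertex along its normal:
    pos[i].y += normal[i].y * d;   // this creates real peaks and valleys,
    pos[i].z += normal[i].z * d;   // not the illusion bump mapping gives
}
```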

Tessellation in DirectX 11

Hull shaders run DX11 pre-expansion routines and operate explicitly in parallel across all control points. Domain shaders run post-expansion operations on the tessellated domain coordinates (u/v for quad domains, or u/v/w barycentrics for triangle domains) and are implicitly parallel. Fixed-function tessellation is configured by Level of Detail (LOD) factors output by the hull shader, and can also produce triangles and lines if requested. Tessellation hardware is new to NVIDIA GPUs; it was not part of GT200 because of geometry bandwidth bottlenecks caused by sequential rendering/execution semantics.
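Although real DX11 domain shaders are written in HLSL, the data flow is easy to mirror in a CUDA kernel: the fixed-function tessellator emits (u,v) sample points for each patch, and a per-point routine converts each sample into a surface position from the patch's control points. The sketch below assumes a simple bilinear quad patch; it illustrates the stage's role in the pipeline, not NVIDIA's implementation.

```cuda
// Rough CUDA analogue of the domain-shader stage: one thread per
// tessellator-generated (u,v) sample, evaluating a bilinear quad patch.
#include <cuda_runtime.h>

__device__ float3 lerp3(float3 a, float3 b, float t)
{
    return make_float3(a.x + (b.x - a.x) * t,
                       a.y + (b.y - a.y) * t,
                       a.z + (b.z - a.z) * t);
}

__global__ void domainStage(const float3* ctrl,      // 4 control points of one quad patch
                            const float2* uvSamples, // tessellator output
                            float3* outPos, int numSamples)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;     // one thread per sample: implicitly parallel

    float u = uvSamples[i].x, v = uvSamples[i].y;
    float3 bottom = lerp3(ctrl[0], ctrl[1], u);
    float3 top    = lerp3(ctrl[2], ctrl[3], u);
    outPos[i] = lerp3(bottom, top, v);
}
```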

In the GF110 graphics processor, NVIDIA has added new PolyMorph and Raster engines to handle world-space processing (PolyMorph) and screen-space processing (Raster). There are sixteen PolyMorph engines and four Raster engines on the GF110; they depend on an improved L2 cache to keep buffered geometric data produced by the pipeline on-die.

GF100 Compute for Gaming

As developers continue to search for novel ways to improve their graphics engines, the GPU will need to excel at a diverse and growing set of graphics algorithms. Since these algorithms are executed via general compute APIs, a robust compute architecture is fundamental to a GPU's graphical capabilities. In essence, one can think of compute as the new programmable shader. GF110's compute architecture is designed to address a wider range of algorithms and to facilitate more pervasive use of the GPU for solving parallel problems. Many algorithms, such as ray tracing, physics, and AI, cannot exploit shared memory, because their memory locality is only revealed at runtime. GF110's cache architecture was designed with these problems in mind. With up to 48 KB of L1 cache per Streaming Multiprocessor (SM) and a global L2 cache, threads that access the same memory locations at runtime automatically run faster, irrespective of the choice of algorithm.
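The kind of access pattern that paragraph describes looks like the sketch below: each thread follows a data-dependent index, so which threads touch the same memory is unknown until runtime, and nothing can be staged into shared memory ahead of time. On a cached design such as GF100/GF110, coincident accesses simply hit in L1/L2. The array names here are illustrative.

```cuda
// Gather through a runtime-computed index: shared memory cannot be tiled
// in advance, but threads that happen to read the same table entries are
// served by the L1/L2 caches automatically.
#include <cuda_runtime.h>

__global__ void gatherKernel(const int* indices, const float* table,
                             float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // The index is only known here, at runtime; locality emerges from
    // the data, and the cache hierarchy exploits it without programmer effort.
    out[i] = table[indices[i]];
}
```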

NVIDIA's codename NEXUS brings CPU and GPU code development together in Microsoft Visual Studio 2008, with a shared process timeline, and introduces the first hardware-based shader debugger. The GF100 series is also the first GPU to offer full C++ support, the programming language of choice among game developers. Together with new hardware features that provide better debugging support, Nexus lets developers enjoy CPU-class application development on the GPU. The end result is C++ and Visual Studio integration that brings HPC users onto the same development platform. NVIDIA offers several paths to compute functionality on the GF110 GPU, such as CUDA C++ for video games.
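As a small illustration of what "full C++ support" buys on the device side, the sketch below uses a templated class, with a constructor and a member function, directly inside a kernel. It is generic CUDA C++, not code taken from Nexus or any NVIDIA sample.

```cuda
// A C++ template class instantiated and used inside a GPU kernel.
#include <cuda_runtime.h>

template <typename T>
struct Accumulator {
    T sum;
    __device__ Accumulator() : sum(T(0)) {}   // constructor runs per thread
    __device__ void add(T v) { sum += v; }
};

__global__ void sumRows(const float* m, float* rowSums, int cols, int rows)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;

    Accumulator<float> acc;          // C++ object living in a device thread
    for (int c = 0; c < cols; ++c)
        acc.add(m[r * cols + c]);
    rowSums[r] = acc.sum;
}
```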

Image processing, simulation, and hybrid rendering are three primary uses of GPU compute for gaming. With NVIDIA's GF100-series GPU, interactive ray tracing becomes possible for the first time on a standard PC. According to NVIDIA's tests, ray tracing performance on the GF100 is roughly 4x that of the GT200 GPU. AI path finding is a compute-intensive process well suited to GPUs: the GF110 handles AI obstacle avoidance approximately 3x faster than the GT200, which translates into faster collision avoidance and shortest-path searches for higher-performance path finding.
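To give a sense of the per-ray work involved in GPU compute ray tracing, the sketch below intersects one ray per thread against a single sphere. A real tracer adds an acceleration structure and shading; the one-sphere scene and the normalized ray directions are assumptions for brevity.

```cuda
// One thread per ray, testing against one sphere by solving
// |o + t*d - c|^2 = r^2 for the nearest positive t (d normalized).
#include <cuda_runtime.h>
#include <math.h>

__global__ void traceSphere(const float3* orig, const float3* dir,
                            float* tHit, int numRays,
                            float3 center, float radius)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRays) return;

    float3 oc = make_float3(orig[i].x - center.x,
                            orig[i].y - center.y,
                            orig[i].z - center.z);
    float b = oc.x * dir[i].x + oc.y * dir[i].y + oc.z * dir[i].z;
    float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - radius * radius;
    float disc = b * b - c;

    tHit[i] = (disc >= 0.0f) ? (-b - sqrtf(disc)) : -1.0f;  // -1 = miss
}
```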

Very compelling video card here. This looks to be a good replacement for the pair of GTX 460s I was going to get. Maybe I'll get a pair of them... (one for each box)... Ha-Ha! For the performance rendered, it really IS a good price point.

Thanks guys for the fast review.. it's 3 AM but I don't really sleep, so I'm reading reviews..

I didn't think a 152 hp machine could beat a 177 hp one... but yeah, it matches it, and with so much price difference... a totally stunning card compared to NVIDIA's older price/performance ratios... I was going for the GTX 580, but now I have to think about whether $170 extra is worth 6 FPS in heavy games...

So in the end I have to wait for price drops on the GTX 580 again... or I will go for the GTX 570, which is a sweet deal!!

The Bad Company benchmark was probably done at 1920x1080 resolution, but I just want to know for sure. Thanks

Looking like a great card, btw. First time in years that I'm looking to buy an NVIDIA card instead of an ATI. I sent Thermaltake a mail to see whether the awesome ISGC-V320 cooler will fit (believe me, it's a great cooler, but it has to be supported somehow because of its enormous weight).

Take a look at some other reviews, and you'll see that these results are typical. In my recent testing, I'm getting 1-2 FPS higher than this, with a P55-based system. See my MSI Radeon HD 6870 article on this site for examples:

I think you're confusing network latency (lag) with video frame rate (FPS). Lag can make the frames appear choppy, but that's down to a network issue, not the video card. Most people avoid playing first-person shooter games with a ping over 60 ms.

The temperature difference still has me a little baffled. The cooling solution is the same, and the test methodology was identical. Yet, somehow, the GeForce GTX 570 sample I received heats up more than both of the GTX 580 cards I have.

If I hadn't retested three different times, I would still think there was something amiss. It is what it is.

Temperatures can vary a bit between cards, even when they appear the same; it's mostly down to the measuring stick, and a little bit to how good the contact between the cooler and the GPU/RAM is. The temp sensors can be off by a fair amount sometimes.

I use a Gigabyte GTX 580. First I tried it in a 2008 Mac Pro (running Windows 7). For some reason the cooling didn't work properly and the card got too hot and failed (above 100 degrees). After putting the card in an i7 system it works OK. Idle temps are around 48 degrees. Playing BFBC2, the temp goes up to about 82. Running Adobe apps / Premiere Pro MPE, temps are about 55 degrees.

I would like to see the 570's performance against a 470 with the same GPU, shader, and memory clock speeds (and memory timings too). Then, I think, the differences would be almost nothing, only the little that comes from 448 vs. 480 CUDA cores.

I searched for maximum overclocking results, and not surprisingly the fastest 470 results clocked at about 900+ MHz (air, stock cooling), just like the 570 results I found so far. With my lame mind (as I'm not an insider in the hardware industry), I think that ~900 is a limit NVIDIA made (with their 'protection'), not a limit that comes from the 40nm manufacturing process. And this clearly shows that the 570's higher clocks don't come from more refined manufacturing (the way older-gen CPUs got more overclockable when newer revisions came out), just from marketing strategy. It's like designing a Porsche GT3, but first selling the public a version limited to just a bit above the Boxster's (and calling it a Boxster 2). Then, playing like you worked hard (but did nothing), you limit the product less and call it a 911 Turbo. But it's literally the same product.

I will wait for: 1) the prices to start dropping; 2) the next gen.

Sorry, 800+, not 900+ (the GPU overclock). Memory-wise: if the GDDR5 can be clocked as high as 1200 on the 5870, then maybe NVIDIA has some surplus GPU power on the shelf to unleash (as well as a 512-bit wide memory bus; remember the GTX 285). Not too surprising that the memory can be pushed to ~1150. My ex-8800 GTS 512's memory, which is GDDR3, was stable at that clock, and if its 65nm GPU was stable at 750 (750/1950/1150 @ 1.15V), then a 40nm chip from two generations later (55nm, then the current 40nm) running at 600-800 MHz is somewhat funny.