NVIDIA's Fermi GF100 Facts & Opinions

NVIDIA's "Fermi" next generation GF100 GPU is not here yet. Nope, we do not have hardware. But NVIDIA has given us an in-depth look at the specifics behind the architecture as it relates to gaming. NVIDIA certainly remembered us gamers and the fact that we like lots and lots of polygons.

Introduction

HardOCP recently had the opportunity to sit down with NVIDIA face-to-face and discuss its next generation GPU, codenamed "GF100," which is based on the "Fermi" architecture you have likely heard so much about for the last few months. Currently NVIDIA is not sharing GF100-based video card specifics. This is NOT a product launch! This is a look into the GF100 GPU’s inner workings and how that relates to gaming. The "GF" in "GF100" stands for a "Graphics" solution based on the "Fermi" architecture. The "100" denotes that it is the high-end part of the current GPU family.

The GF100 is NVIDIA’s next big investment, and it is yet to be seen if it will pay off for them. The GF100 is more than just a GPU for gaming; we all know that based on the recent information that has been given. However, don’t let this GP-GPU nonsense fool you. NVIDIA made it clear to us...finally...the GF100 is built for gaming.

On this page is the official presentation in its entirety, and on the second page you will find the entire GF100 White Paper posted along with some opinions.

GF100 Architecture Deep Dive

NVIDIA’s focus for GF100 has been unclear to us at HardOCP. Is it a gamers’ GPU? Is it a GP-GPU? Is it both? Is it a GP-GPU disguised to be a gamers’ GPU or vice versa? After our conversations with NVIDIA last week, it seems NVIDIA has built a geometry powerhouse, at least on paper. NVIDIA finally talked about playing games on GF100, and went out of its way to make sure we understood that GF100 started out with gaming in mind and is ending with gaming in mind.

The GF100 should accelerate geometry faster than any other GPU known to date throughout the rendering pipeline, all the way from Triangle Setup to Geometry Shading to Tessellation to Rasterizing. If NVIDIA’s investment in its geometry engine proves correct, the GF100 could be substantially faster than the AMD Radeon HD 5000 series when it comes to things like DX11 Tessellation, one of the Radeon HD 5000 series’ main selling points right now. This is all theoretical of course until we actually test the GF100’s performance in games.

Some GF100 Specifications

GF100 will have 512 CUDA cores, more than double the GeForce GTX 285 GPU’s 240 cores. There are 64 texture units, compared to the GTX 285’s 80, but the Texture Units have been moved inside the Third Generation Streaming Multiprocessors (SMs) for improved efficiency and clock speed. In fact, the Texture Units will run at a higher clock speed than the core GPU clock. There are 48 ROP units, up from 32 on the GTX 285. The GF100 will use a 384-bit GDDR5 memory interface, so depending on the clock speeds it actually operates at, there is potential for high memory bandwidth. These changes seem logical, and encouraging, but without knowing clock speeds, actual shader performance is anyone’s guess.
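To put those unit counts in perspective, here is a rough Python sketch of the arithmetic. The unit counts and bus width come from the figures above; the memory data rate passed in at the end is purely a hypothetical placeholder, since NVIDIA has not shared clock speeds, and the function names are our own.

```python
# Unit counts quoted in the article: (GF100, GTX 285).
UNITS = {
    "CUDA cores": (512, 240),
    "Texture units": (64, 80),
    "ROP units": (48, 32),
}

GF100_BUS_WIDTH_BITS = 384  # from the article


def unit_ratio(gf100: int, gtx285: int) -> float:
    """How many times more (or fewer) units GF100 has than the GTX 285."""
    return gf100 / gtx285


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus width in bytes multiplied by the effective per-pin data rate in Gbit/s.
    """
    return bus_width_bits / 8 * data_rate_gbps_per_pin


if __name__ == "__main__":
    for name, (gf100, gtx285) in UNITS.items():
        print(f"{name}: {gf100} vs {gtx285} ({unit_ratio(gf100, gtx285):.2f}x)")

    # ASSUMPTION: 4.0 Gbit/s per pin is a made-up example value, not a GF100 spec.
    example_rate = 4.0
    print(f"Peak bandwidth at {example_rate} Gbit/s per pin: "
          f"{peak_bandwidth_gb_s(GF100_BUS_WIDTH_BITS, example_rate):.1f} GB/s")
```

The point of the second function is simply that bus width alone tells you nothing final; the bandwidth number scales linearly with whatever memory clock the shipping cards end up using.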

GF100 Processor Architecture

The big news is what NVIDIA has done in the GF100 to remove geometry performance bottlenecks, which should help speed up effects like Tessellation.

In a traditional pipeline setup for GPUs the Geometry Shader, Vertex Shader, Setup/Rasterizer functions would come at the front end of the pipeline. This creates a situation where data will be stored and read from memory on the video card. This is just how things have been done for the longest time, and NVIDIA believes the traditional setup creates a bottleneck in geometry performance.

Not so simply, what NVIDIA has done is to separate the Raster Engine from the front of the pipeline and move it down into the GPCs in four parts, and it has created a new engine it is calling the "PolyMorph Engine," which is integrated into the SMs. First, a little breakdown of the hierarchy: the GF100 is made up of 4 GPCs (Graphics Processing Clusters), each of which breaks down into 4 SMs (Streaming Multiprocessors), each of which breaks down into 32 CUDA cores and 4 Texture Units and some other stuff. So, 32 CUDA cores plus 4 Texture Units plus the PolyMorph Engine make up an SM, and 4 SMs make up a GPC. With this kind of parallelism you can see how the GPU can be sliced and diced to create less expensive parts.

Inside each GPC you will find the actual Raster Engine, so there are basically 4 Raster Engines inside the GF100. Inside each SM (a combination of 32 CUDA cores and 4 Texture Units) you will find the new PolyMorph Engine. The PolyMorph Engine contains the actual Vertex Fetch, Tessellator, Viewport Transform, Attribute Setup and Stream Output functions. Again, all of these functions, including the Rasterizer, used to be in one area on the GPU sitting at the front end of the entire process in the pipeline.
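The hierarchy described above multiplies out neatly, and a small Python sketch makes the totals easy to check. The per-level counts are the ones given in the article; the class and property names are our own, and the cut-down 3-GPC configuration at the end is a hypothetical illustration of "slicing and dicing," not an announced part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Unit counts per level of the GF100 hierarchy, as described in the article."""
    gpcs: int = 4                  # Graphics Processing Clusters per GPU
    sms_per_gpc: int = 4           # Streaming Multiprocessors per GPC
    cuda_cores_per_sm: int = 32
    texture_units_per_sm: int = 4

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cuda_cores_per_sm

    @property
    def texture_units(self) -> int:
        return self.sms * self.texture_units_per_sm

    @property
    def raster_engines(self) -> int:
        # One Raster Engine lives inside each GPC.
        return self.gpcs

    @property
    def polymorph_engines(self) -> int:
        # One PolyMorph Engine lives inside each SM.
        return self.sms


if __name__ == "__main__":
    full = GpuLayout()
    print(full.cuda_cores, full.texture_units,
          full.raster_engines, full.polymorph_engines)

    # HYPOTHETICAL: a part with one GPC disabled, purely for illustration.
    cut_down = GpuLayout(gpcs=3)
    print(cut_down.cuda_cores, cut_down.texture_units)
```

Because every fixed-function geometry block sits inside a GPC or SM, removing a GPC in this model takes its share of Raster and PolyMorph Engines along with the CUDA cores, which is why geometry throughput would scale with the rest of the chip.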

What NVIDIA has done is make the GF100 more parallel, and less serialized, than any GPU to date. NVIDIA claims 8X the geometry performance of GT200. This re-ordering of the graphics pipeline increased the die size by 10% and, from our understanding of the issue, is the reason GF100 is "late." The problem with all this moving about of the pipeline is that now you have a Tessellator in each SM and Triangle Setup in each GPC, so all the Triangles that get set up are set up out of order. All of the triangles now need to get set up and then put back together in the proper order at the end of the pipeline. There are caches to alleviate bottlenecks, and nothing in this process has to touch the local RAM on the video card. NVIDIA explained, while not in detail, that this out-of-order issue left many hurdles much higher than expected.
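To illustrate the general shape of that out-of-order problem, here is a minimal Python sketch of a reorder buffer: work finishes in arbitrary order across parallel units and is released strictly in submission order. This is a generic software technique under our own assumptions (contiguous sequence numbers starting at 0, a hypothetical `restore_order` function); NVIDIA did not detail how GF100 actually does this in hardware.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def restore_order(completed: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Yield (sequence_number, triangle) pairs in submission order.

    `completed` delivers pairs in whatever order the parallel units finish.
    ASSUMPTION: sequence numbers are unique, contiguous, and start at 0.
    """
    pending: List[Tuple[int, str]] = []  # min-heap keyed on sequence number
    next_seq = 0
    for item in completed:
        heapq.heappush(pending, item)
        # Release every triangle that is now next in line.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)
            next_seq += 1


if __name__ == "__main__":
    # Triangles tagged at submission time, finishing out of order.
    finished = [(2, "tri-C"), (0, "tri-A"), (1, "tri-B"), (3, "tri-D")]
    for seq, tri in restore_order(finished):
        print(seq, tri)
```

Note that a triangle finishing early still has to wait in the buffer until everything ahead of it arrives, which hints at why on-chip buffering matters so much once the front end is spread across many units.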

This re-design may pay off though if the slide above about the Unigine Benchmark is to be believed. NVIDIA is claiming much higher performance in this benchmark with Tessellation compared to the Radeon HD 5870, and we all know that benchmark was written specifically on Radeon HD 5000 series hardware. And while it is only a benchmark, the Unigine application is the best we have seen for leveraging DX11 tessellation showing off huge image quality impacts. If GF100 is beating the Radeon HD 5870 that much in a benchmark that was written for the Radeon HD 5870 in the first place, that just spells "awesome" for the kind of geometry performance potentially here.

GF100 Image Quality

NVIDIA also hasn’t forgotten about image quality and has improved CSAA IQ notably. NVIDIA will now offer a 32X CSAA mode, which uses 8x Color Samples and 24x Coverage Samples. Think of this as 8X MSAA on steroids. It should mean that there won’t be a huge drop in performance using 32X CSAA compared to 8X MSAA, and this is certainly something we will test. On paper, 32X CSAA should actually be playable wherever 8X MSAA is playable in-game. There have also been improvements to Transparency AA (Alpha to Coverage). NVIDIA also said that, thanks to performance improvements, there shouldn’t be a large drop in performance going from 4X MSAA to 8X MSAA as we saw with the GT200.

GF100 Compute For Gaming

There is a huge shift going on right now at NVIDIA in wanting to make the GF100 a General Compute Engine that is not just for gaming. These features can certainly help gaming when used to improve the experience by using the GPU to accelerate physical game effects and other APIs. There were also talks and demos shown using Ray Tracing, and while it isn’t 30fps in real-time, NVIDIA has demonstrated hybrid rendering modes utilizing Rasterization with Ray Tracing to allow real-time image manipulation and rendering. NVIDIA has plans to include a couple of neat applications with every GF100 video card that we think you guys are going to have some fun playing with.