NVIDIA GeForce GTX 670 Launch Review - PAGE 2

Like Fermi, Kepler GPUs are composed of different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The GeForce GTX 670 uses the same GK-104 GPU as the GTX 680; the difference is that it has one fewer SMX unit. This means the GTX 670 still has 4 GPCs, but uses 7 next-generation Streaming Multiprocessor (SMX) units instead of 8. It still has the same number of memory controllers, however: 4 in total.

Starting at the top of the GK-104 block, Kepler has a single GigaThread Engine which fetches the specified data from system memory and copies it to the frame buffer. The Engine then creates threads and dispatches them to the GPCs, where they are delivered to the execution units. Following the GigaThread Engine are a total of four Graphics Processing Clusters (GPCs), which is where the majority of operations are performed. This is because each GPC has a dedicated raster engine, as well as resources for shading, texturing and computation.

The memory sub-system of the Kepler architecture has also been redesigned to support higher clock speeds. This overhaul of the memory interface allowed NVIDIA to push the memory frequency up to 6008MHz effective (1502MHz actual). The memory sub-system of the GTX 670 is the same as the GTX 680's, down to the frequencies, so it has a 2GB frame buffer that runs on a 256-bit wide GDDR5 interface, which equates to a total bandwidth rating of 192.2GB/s. Additionally, the GK-104 GPU has 4 memory controllers, along with 512KB of L2 cache and a total of 32 Raster Operation Units (ROPs), eight per memory controller.
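The 192.2GB/s figure follows directly from the bus width and the 6008MHz data rate quoted above. As a quick sanity check, here is a minimal sketch of that arithmetic; the function name is our own and purely illustrative.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_rate_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # 256 bits -> 32 bytes
    transfers_per_second = effective_rate_mhz * 1e6  # MHz -> transfers per second
    return bytes_per_transfer * transfers_per_second / 1e9


# GTX 670 / GTX 680: 256-bit GDDR5 at a 6008MHz effective data rate
gtx670_bandwidth = memory_bandwidth_gb_s(256, 6008)
print(f"{gtx670_bandwidth:.3f} GB/s")
```

This is a peak theoretical number; real-world throughput will be lower.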

Inside each GPC are up to two SMX units, which have been optimized to offer the best performance-per-watt by running the shaders at the same frequency as the GPU clock rather than at double that speed. This approach gives Kepler twice the performance-per-watt of the Fermi architecture while allowing more CUDA cores to be packed into a single SMX unit. Inside each SMX are 192 CUDA cores, which across the GTX 670's 7 SMX units equates to a total of 1344 CUDA cores, nearly triple the number in the GTX 570. Of course, since the CUDA core clock is equal to the GPU clock, the performance per CUDA core is reduced from the previous generation, but the much larger core count made possible by the 1:1 clock design allows the GTX 680 and GTX 670 to more than make up the difference, all while staying within a lower power envelope.
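The core counts are simple multiplication, sketched below for reference. The SMX counts and the 192 cores per SMX come from this page; the GTX 570's 480 CUDA cores is taken from NVIDIA's published specification rather than from this article, so treat it as an outside assumption.

```python
CORES_PER_SMX = 192       # from the article
SMX_COUNT_GTX670 = 7      # from the article
SMX_COUNT_GTX680 = 8      # from the article
GTX570_CUDA_CORES = 480   # assumption: NVIDIA's published GTX 570 spec, not stated on this page


def total_cuda_cores(smx_count: int, cores_per_smx: int = CORES_PER_SMX) -> int:
    """Total CUDA cores for a Kepler GPU with the given number of active SMX units."""
    return smx_count * cores_per_smx


gtx670_cores = total_cuda_cores(SMX_COUNT_GTX670)
gtx680_cores = total_cuda_cores(SMX_COUNT_GTX680)
ratio_vs_gtx570 = gtx670_cores / GTX570_CUDA_CORES

print(f"GTX 670: {gtx670_cores} cores, GTX 680: {gtx680_cores} cores")
print(f"GTX 670 vs GTX 570 core count: {ratio_vs_gtx570:.1f}x")
```

Keep in mind that a raw core-count ratio overstates the real gap, since Fermi's shaders ran at twice the GPU clock.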

Looking at the functions of the execution units, the CUDA cores are designed to perform the pixel, vertex and geometry shading, as well as the physics and compute calculations. The texture units, on the other hand, perform texture filtering, while the load/store units fetch and save data to memory. Meanwhile, Special Function Units (SFUs) handle transcendental and graphics interpolation instructions. Finally, the PolyMorph Engine handles vertex fetch, tessellation, viewport transform, attribute setup, and stream output.

The new Boost Clock feature is one of the biggest changes to the Kepler family. In essence, the Boost Clock works along the same lines as Intel's Turbo Boost, which dynamically adjusts the clock speeds in real time to increase performance. However, Boost Clock is different in the sense that the rated Boost Clock frequency is not necessarily where the GPU clock will cap during gaming. Instead, Boost Clock works at both a hardware and software level to dynamically raise the GPU clock speed, and under most circumstances it will push the GPU clock well above the official Boost Rating. Of course, not all silicon is the same, so each Kepler board will reach its own unique Boost Clock speed.

The typical board power defined for the GTX 670 is only 170W. This means that the Boost Clock will raise the clock speeds as far as this power envelope allows under load. Additionally, GPU Boost operates completely autonomously, so there are no game profiles and no intervention required by the end user, providing an instant performance boost to gamers. The technology also works on a microsecond level, constantly checking the GPU voltage and operating conditions to see if the clocks can go higher or if they need to be throttled down to the base 3D clock. In addition, the GTX 670 has a maximum thermal threshold of up to 98°C.
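To make the idea concrete, here is a toy sketch of a power-limited boost loop. It is not NVIDIA's actual algorithm: the 170W target is from this page, but the 915MHz base clock, the 13MHz step, the 1200MHz ceiling and the linear power models are all illustrative assumptions, and the real GPU Boost also weighs voltage and temperature.

```python
from typing import Callable


def settle_clock(
    power_at: Callable[[float], float],  # hypothetical model: board power (W) at a given clock (MHz)
    base_mhz: float = 915.0,             # assumption: GTX 670 reference base clock
    step_mhz: float = 13.0,              # assumption: size of one boost step
    power_target_w: float = 170.0,       # typical board power quoted in the article
    max_mhz: float = 1200.0,             # assumption: arbitrary ceiling for this sketch
) -> float:
    """Step the clock up from base while the next step still fits the power target."""
    clock = base_mhz
    while clock + step_mhz <= max_mhz and power_at(clock + step_mhz) <= power_target_w:
        clock += step_mhz
    return clock


# Hypothetical workloads: power scales linearly with clock at different rates.
def light_load(clock_mhz: float) -> float:
    return 0.14 * clock_mhz


def medium_load(clock_mhz: float) -> float:
    return 0.17 * clock_mhz


def heavy_load(clock_mhz: float) -> float:
    return 0.19 * clock_mhz


for name, model in [("light", light_load), ("medium", medium_load), ("heavy", heavy_load)]:
    print(f"{name} load settles at {settle_clock(model):.0f} MHz")
```

In this toy model the light workload leaves enough power headroom to climb all the way to the ceiling, while the heavy workload never leaves the base clock, which mirrors the behaviour described above.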

Comments

leochan - I see a nice jump up in performance from my 570, so maybe when I make a new computer eventually I'll get 770's XD Or maybe I'll go with 780's instead....Perhaps by the time I upgrade, the dualGPU version of the 780 will be out and not $1000...I am not planning to make a new rig for some time, since it's so expensive for me (I'd basically upgrade everything, so I just make a new rig instead).

But yeah, $400 is actually a pretty decent price considering its power. It's only going to come down in price too.

I did just notice one error when skimming around the article:

quote Conclusion Page

The Kepler architecture has already exceeded all our exceptions [expectations] here at Neoseeker

Well, since you're in mistake-correcting mode, I will correct yours: exception is a word, so spell check wouldn't flag it.

Well no kidding? XD

Hence why I said "Spell check gone awry". As in, making the wrong correction. The two words share a lot of similarities. Depending on the error made, one or the other word would be the first choice to correct it with.