PNY GTX 960 XLR8 Review

PNY GTX 960 XLR8 Introduction:

The PNY name has been around for some time now as a purveyor of many different products, including NVIDIA-based graphics cards. Both reference and custom designs are in PNY's product stack, and today I will be looking at one of those custom designs for the mid-range market: the GTX 960 XLR8. This non-reference design uses a smaller PCB and a more efficient cooling solution, so you might be surprised to see that it carries an impressive factory overclock of 1304MHz on the core, which translates into a very nice 1367MHz GPU Boost 2.0 clock speed. Those speeds rank among the highest, if not the highest, clock speeds on the market. Pretty impressive for an entry at this price point.

Speaking of price, this factory overclocked and custom cooled card is currently available for $229 from your normal channels. Having looked at a pair of GTX 960 cards on launch day, I feel this offering from PNY will compare favorably. Let's dig in and see if it does.

PNY GTX 960 XLR8 Closer Look:

I had my first look at the Maxwell architecture when I reviewed the GM107-based GTX 750 Ti and GTX 750 early last year. The architecture was built to maximize energy efficiency and still deliver excellent gaming performance. Now, we have the full realization of the architecture, labeled as GM204, that takes this to a new level by delivering twice the performance per watt when compared to early versions of the Kepler architecture, such as that seen in the GTX 660. NVIDIA has used roughly the same architectural arrangement since Fermi debuted in 2010. From Fermi to Kepler and now Maxwell, we see huge improvements.

The Maxwell GM206 architecture is based around two graphics processing clusters (GPCs), each with its own raster engine. Each GPC has a total of four Maxwell Streaming Multiprocessor units, each with a Polymorph engine, 128 CUDA cores, and eight texture units. A pair of 64-bit memory controllers manages 2GB of GDDR5 rated at 7000MHz. Built on a 28nm process, this implementation houses only 2.94 billion transistors compared to 5.2 billion under the lid on the GTX 980. With the design built for efficiency and performance, you can see how less, yet more efficient, hardware relates to the much improved 120 watt power envelope.
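
To put totals on that layout, here is a minimal Python sketch that just multiplies out the per-unit counts quoted above. The constant and function names are my own and purely illustrative; only the counts come from the description.

```python
# Per-unit counts as described above; the names are illustrative only.
GPCS = 2                      # graphics processing clusters
SMM_PER_GPC = 4               # Streaming Multiprocessor units per GPC
CUDA_CORES_PER_SMM = 128
TEXTURE_UNITS_PER_SMM = 8
MEMORY_CONTROLLERS = 2
CONTROLLER_WIDTH_BITS = 64


def gm206_totals():
    """Multiply the per-unit counts out to chip-wide totals."""
    smm = GPCS * SMM_PER_GPC
    return {
        "smm": smm,
        "cuda_cores": smm * CUDA_CORES_PER_SMM,
        "texture_units": smm * TEXTURE_UNITS_PER_SMM,
        "bus_width_bits": MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS,
    }


print(gm206_totals())
```

That works out to eight multiprocessor units, 1024 CUDA cores, 64 texture units, and a 128-bit memory bus.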

The reference GTX 960 runs a 1126MHz base core clock with a boost clock of 1178MHz. However, the PNY GTX 960 XLR8 comes to us with a healthy overclock right out of the box. Both the reference card and PNY's overclocked version share a base memory speed of 1750MHz on the 2GB of GDDR5, or an effective rate of 7000MHz running through a 128-bit bus. While the narrow bus may initially seem to be cause for concern, it proved a non-issue with earlier Maxwell-based cards, which delivered higher effective memory bandwidth through the narrower bus. NVIDIA has some additional tech up its sleeve in the form of improved memory compression techniques that reduce memory bandwidth needs. With the new third-generation lossless Delta Color compression algorithms, data written to and read from the GDDR5 memory can be compressed at up to an 8:1 ratio, depending on the size of the pixel block being written. This results in Maxwell needing 25% fewer bytes of data than a comparable Kepler core. Put another way, a Kepler core would need a memory data rate of 9.3Gbps to match the throughput of Maxwell's memory architecture.
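
The arithmetic behind those figures is easy to check. The following is a rough Python sketch, not anything from NVIDIA or PNY: the function names are hypothetical, and it assumes the 25% savings applies uniformly, which real workloads will not do.

```python
def bandwidth_gb_per_s(effective_rate_mhz, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return effective_rate_mhz * 1e6 * (bus_width_bits / 8) / 1e9


def equivalent_data_rate_gbps(data_rate_gbps, byte_savings):
    """Data rate a design without the savings would need for the same useful work."""
    return data_rate_gbps / (1.0 - byte_savings)


def overclock_percent(factory_mhz, reference_mhz):
    """Factory clock expressed as a percentage above the reference clock."""
    return (factory_mhz / reference_mhz - 1.0) * 100.0


print(bandwidth_gb_per_s(7000, 128))                   # raw peak bandwidth
print(round(equivalent_data_rate_gbps(7.0, 0.25), 2))  # Kepler-equivalent rate
print(round(overclock_percent(1304, 1126), 1))         # base clock uplift
print(round(overclock_percent(1367, 1178), 1))         # boost clock uplift
```

The raw figure is 112GB/s, the 25% byte savings is where the roughly 9.3Gbps equivalent comes from, and PNY's clocks land close to 16% above reference on both base and boost.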

When the GTX 980 launched, NVIDIA expanded the feature set of its Maxwell-based video cards with several technologies that carry forward to the cards released since: DX12 support, MFAA, DSR, and VXGI. I'll let NVIDIA do the talking here so the details do not get lost in translation. Enjoy!

DirectX 12: "Spanning devices ranging from PCs to consoles, tablets, and smartphones, Microsoft’s upcoming DirectX 12 API has been designed to have CPU efficiency significantly greater than earlier DirectX versions. One of the keys to accomplishing this is providing more explicit control over hardware, giving game developers more control of GPU and CPU functions. While the NVIDIA driver very efficiently manages resource allocation and synchronization under DX11, under DX12 it is the game developer’s responsibility to manage the GPU. Because the developer has an intimate understanding of their application’s behavior and needs, DX12 has the potential to be much more efficient than DX11 at the cost of some effort on the part of the developer. DX12 contains a number of improvements that can be used to improve the API’s CPU efficiency; we’ve announced that all Fermi, Kepler, and Maxwell GPUs will fully support the DX12 API."

"In addition, the DX12 release of DirectX will introduce a number of new features for graphics rendering. Microsoft has disclosed some of these features at GDC and during NVIDIA’s Editor’s conference. Conservative Raster, discussed earlier in the GI section of this paper, is one such DX graphics feature. Another is Raster Ordered Views (ROVs), which gives developers control over the ordering of pixel shader operations. GM2xx supports both Conservative Raster and ROVs. The new graphics features included in DX12 will be accessible from either DX11 or DX12 so developers will be free to use these new features with either the DX11 or DX12 APIs on GPUs that implement the features in hardware."

Multi Frame Sampled AA or MFAA: "Game developers and GPU vendors are increasingly implementing more advanced forms of anti-aliasing (AA) to enhance image quality. GM2xx GPUs have a number of new features for much more flexible sampling, enabling further advancements in AA quality and efficiency. Today’s GPUs ship with fixed sample patterns for AA that are stored in ROM. When gamers select 2x or 4xMSAA for example, the pre-stored sample patterns are used. While many current games implement deferred, post-processed AA techniques such as FXAA or SMAA, there are still many others that continue to use traditional hardware-based multi-sample AA (MSAA). GM2xx GPUs support multi-pixel programmable sampling for rasterization, providing opportunities for more flexible and novel AA techniques to be implemented in the context of both deferred and conventional forward rendering."

"With programmable sample positions, the ROMs that were used to store the standard sample positions are replaced with RAMs. The RAMs may be programmed with the standard patterns, but the driver or application may also load the RAMs with custom positions which are free to vary from frame to frame or even within a frame. In a 16x16 grid per pixel, we have 256 different locations to choose from for each sample. We’ve also extended this programmable sample location support to span multiple pixels, so for example in 4x MSAA rendering, all 16 samples within a 2x2 pixel footprint can be located arbitrarily. This sample randomization can greatly reduce the quantization artifacts (like stair-stepping) that occur with more traditional forms of AA. These freely specified sampling positions may be used in the development of effective new algorithms."

"NVIDIA engineers have recently developed new AA algorithms that vary, in interleaved fashion, the sample patterns used per pixel either spatially in a single frame (where, for example, each successive pixel uses one of four different 4xAA sample patterns) or interleaved across multiple frames in time. Multi-Frame Sampled AA (MFAA) is a new AA technique that alternates AA sample patterns both temporally and spatially to produce the best image quality while still offering a performance advantage compared to traditional MSAA. The final result can deliver image quality approaching that of 8xAA at roughly the cost of 4xAA, or 4xAA quality at roughly the cost of 2xAA. Below we have a few images that show 4XMSAA implemented in BF4 and on the right the same scene with MFAA enabled."

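To make the idea of alternating sample patterns a little more concrete, here is a toy Python sketch. It is not NVIDIA's MFAA algorithm: the sample positions, the test edge, and the plain two-frame average are all assumptions of mine. It only shows that, for a static scene, two frames that each use half of a 4x pattern average out to the same coverage as one full 4x frame.

```python
# Sample positions sit on a 16x16 sub-pixel grid, so each sample has
# 16 * 16 = 256 possible locations. This pattern is illustrative only.
GRID = 16
PATTERN_4X = [(2, 6), (6, 14), (10, 2), (14, 10)]
PATTERN_EVEN = PATTERN_4X[0::2]  # two samples used on even frames
PATTERN_ODD = PATTERN_4X[1::2]   # the other two, used on odd frames


def under_edge(x, y):
    """Hypothetical test geometry: everything below the line y = 0.4x + 0.3."""
    return y < 0.4 * x + 0.3


def coverage(px, py, pattern):
    """Fraction of the pattern's samples in pixel (px, py) that the geometry covers."""
    hits = sum(
        under_edge(px + (sx + 0.5) / GRID, py + (sy + 0.5) / GRID)
        for sx, sy in pattern
    )
    return hits / len(pattern)


def two_frame_blend(px, py):
    """Average a 2-sample even frame with a 2-sample odd frame."""
    return 0.5 * (coverage(px, py, PATTERN_EVEN) + coverage(px, py, PATTERN_ODD))


for px in range(4):
    print(px, coverage(px, 1, PATTERN_4X), two_frame_blend(px, 1))
```

Each frame pays for only two samples per pixel, which is where the "4xAA quality at roughly the cost of 2xAA" claim comes from. Once the scene moves, a simple average like this one would ghost, so a real implementation has to do considerably more work than this sketch does.
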
DSR or Dynamic Super Resolution: "Thanks to rapidly falling LCD prices, the popularity of 4K displays has surged this year. But not all gamers want to spend the money on a new monitor. For the eye candy purists who don’t want to splurge but want to approximate the crisper visuals of a 4K panel, many have turned to the process of “downsampling.” This is where the GPU renders the game at a resolution higher than the screen can display, and then scales the image down to the native resolution on output to the user’s display. Downsampling requires users to set up custom displays with the graphics driver control panel, and adjust various low-level display parameters to appear properly. So it’s not necessarily the friendliest way to improve image quality. Also, while downsampling can provide a significant improvement in image quality, artifacts are sometimes observed on textures and when certain post-processing effects are applied."

"To address the usability and quality issues, NVIDIA has developed a method called Dynamic Super Resolution. In principle, Dynamic Super Resolution works like traditional downsampling, but it has a simple on/off user control, and it uses a 13-tap Gaussian filter during the conversion to display resolution. The high-quality filter reduces or eliminates the aliasing artifacts experienced with the simple downsampling, which relies on a simpler box filter. Note that people often confuse downsampling (and now Dynamic Super Resolution) with the traditional Supersampling method of anti-aliasing. All three techniques render at higher resolutions internally, and then filter down to lower resolution for output. The difference is that downsampling and Dynamic Super Resolution actually have the game render at the higher resolution, so the game believes it’s running on a higher resolution display, and the GPU then filters and samples down. The process should work with most games well, aside from some issues with visibility of onscreen game controls being displayed on lower resolution monitors."

"With supersampling, the game still renders at a particular resolution, say 1920x1080, and the GPU upsamples that resolution without the game’s knowledge, and then filters back down. This can cause issues with newer games that use post-processing effects, or are expecting the full rendering process to be at a given resolution set by the game itself. Dynamic Super Resolution can be found in the control panel of our Release 343 driver, as well as GeForce Experience, where we provide Optimal Playable Settings (OPS) for Dynamic Super Resolution for today’s hottest games. While it’s compatible with all GeForce GPUs, the best performance can be seen when using a GeForce GTX 980."

"Going forward we could potentially use Maxwell’s more advanced sampling control features, like programmable sample positions and interleaved sampling, to further improve Dynamic Super Resolution for owners of GM2xx GPUs." In the image below you can get a feel for what this technology can do for image quality when you compare the grass on the right and left sides of the image.

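The difference between a box filter and a Gaussian filter during the scale-down step can be sketched in a few lines of Python. NVIDIA does not describe the tap layout in the text above, so the one-dimensional kernel, the sigma value, and the edge clamping here are assumptions made purely for illustration.

```python
import math


def gaussian_kernel(taps=13, sigma=2.0):
    """Normalized 1D Gaussian weights; taps and sigma are assumed values."""
    half = taps // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def downsample_box(row, factor):
    """Average each group of `factor` neighbouring samples."""
    return [sum(row[i:i + factor]) / factor
            for i in range(0, len(row) - factor + 1, factor)]


def downsample_gaussian(row, factor, kernel):
    """Weight a wider neighbourhood around each output sample, clamping at the edges."""
    half = len(kernel) // 2
    out = []
    for start in range(0, len(row) - factor + 1, factor):
        center = start + factor // 2
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(center + k - half, 0), len(row) - 1)
            acc += w * row[idx]
        out.append(acc)
    return out


# A hard edge rendered at twice the display width, then filtered down 2:1.
high_res = [0.0] * 8 + [1.0] * 8
print(downsample_box(high_res, 2))
print([round(v, 3) for v in downsample_gaussian(high_res, 2, gaussian_kernel())])
```

The box filter only ever looks at the samples that fall inside one output pixel, while the Gaussian pulls in a wider, weighted neighbourhood and produces a smoother transition across the edge.
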
Last but not least is Voxel Global Illumination or VXGI: "Correct modelling of lighting is the most computationally difficult problem in graphics, and with Maxwell, our objective was to make an enormous leap forward in the capability of the GPU to perform near-photo-real lighting calculations in real time. In the real world, all objects are lit by a combination of direct light (photons that travel directly from a light source to illuminate an object) and indirect light (photons that travel from the light source, hit one object and bounce off of it and then hit a second object, thus indirectly illuminating that object). “Global illumination” (GI) is a term for lighting systems that model this effect. Without indirect lighting, scenes can look harsh and artificial. However, while light received directly is fairly simple to compute, indirect lighting computations are highly complex and computationally heavy. Because it’s a computationally expensive lighting technique (particularly in highly detailed scenes), GI has been primarily used to render complex CG scenes in movies using offline GPU rendering farms."

"While some forms of GI have been used in many of today’s most popular games, their implementations have relied on pre-computed lighting. These “prebaked” techniques are used for performance reasons; however, they require additional artwork, as the desired lighting effects must be computed beforehand. Because prebaked lighting is not dynamic, it’s often difficult or impossible to update the indirect light sources when in-game changes occur; say for instance an additional light source is added or something in the scene moves or is destroyed. Prebaked indirect lighting models the static objects of the scene, but doesn’t properly apply to the animated characters or moving objects. In 2011, NVIDIA engineers developed and demonstrated an innovative new approach to computing a fast, approximate form of global illumination dynamically in real time on the GPU. This new GI technology uses a voxel grid to store scene and lighting information, and a novel voxel cone tracing process to gather indirect lighting from the voxel grid. NVIDIA’s Cyril Crassin describes the technique in his paper on the topic and a video from GTC 2012 is available here. Epic’s ‘Elemental’ Unreal Engine 4 tech demo from 2012 used a similar technique."

"Since that time, NVIDIA has been working on the next generation of this technology—VXGI—that combines new software algorithms and special hardware acceleration in the Maxwell architecture. To understand how voxel global illumination works, it is helpful to first understand voxels. The term “voxel” is related to “pixel.” Whereas a pixel represents a 2D point in space, a voxel represents a small cube (a volume) of 3D space. To perform global illumination, we need to understand the light emitting from all of the objects in the scene, not just the direct lights. To accomplish this, we dice the entire 3D space of the scene in all three dimensions, into small cubes called voxels. “Voxelization” is the process of determining the content of the scene at every voxel, analogous to “rasterization” which is the process of determining the value of a scene at a given 2D coordinate."

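For a feel of what dicing a scene into voxels means in practice, here is a minimal Python sketch. It bins loose sample points into a regular grid, which is a simplification of my own: the function name, its parameters, and the use of points instead of triangles are all assumptions, and the hardware-accelerated voxelization in VXGI is far more involved.

```python
import math


def voxelize(points, origin, voxel_size, dims):
    """Return the set of (i, j, k) voxel indices that contain at least one point."""
    occupied = set()
    for point in points:
        # Which cube along each axis does this point fall into?
        idx = tuple(math.floor((c - o) / voxel_size) for c, o in zip(point, origin))
        # Ignore anything that lands outside the grid.
        if all(0 <= i < n for i, n in zip(idx, dims)):
            occupied.add(idx)
    return occupied


scene_points = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.3), (1.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
print(sorted(voxelize(scene_points, (0.0, 0.0, 0.0), 1.0, (4, 4, 4))))
```

The first two points share a voxel, the third lands in its neighbour, and the last falls outside the 4x4x4 grid and is dropped. Lighting information is then stored per occupied voxel instead of per point.
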
PNY's packaging for the GTX 960 XLR8 features the NVIDIA black and lime green color scheme, with an image of the card on the left-hand side, the model name to the right, and some feature highlights: the GTX 960 XLR8 is a factory overclocked video card, uses a dual fan cooling solution, and has a lifetime warranty. The back side has an opening in the middle of the panel that shows the dual fan cooling solution of the card. To the far right are the NVIDIA-specific features and a chart that highlights the base and GPU Boost 2.0 clock speeds to show just how far above the reference clocks the GTX 960 XLR8 operates.

Inside is a matte black box with the PNY logo on it. The card is packaged in a soft foam enclosure that is more than adequate to protect it from damage. Buried under the foam, a cardboard shell separates the accessory bundle from the card. The bundle includes a mini-HDMI to HDMI dongle, a DVI to VGA adapter, a dual 4-pin Molex to 6-pin PCIe power adapter, a driver disc, and a quick install guide. The power adapter is a nice touch: most newer power supplies come with at least one 6-pin PCIe power connection, but the adapter helps the user who is upgrading from an earlier card and may just be stepping into a more robust card.

Having seen what the GTX 960 is capable of, let's see what PNY brings to the table with the GTX 960 XLR8, its factory overclocked version of this mid-range staple.