NVIDIA will offer a 3GB version of the GTX 1060, and there's more to the story than the obvious fact that it has half the frame buffer of the 6GB version available now. It appears that this is an entirely different product, with 128 fewer CUDA cores (1152) than the 6GB version's 1280.

Boost clocks are the same at 1.7 GHz, and the 3GB version will still operate with a 120W TDP and require a 6-pin power connector. So why not simply name this product differently? It's always possible that this will be an OEM version of the GTX 1060, but in any case expect slightly lower performance than the existing version even if you don't run at high enough resolutions to require the larger 6GB frame buffer.

When The Tech Report first conducted their review of the RX 470 they saw benchmark behaviour very different from any other GPU in that family but could not figure out what it was and resolve it before the mob arrived with pitchforks and torches demanding they publish or die.

As it turns out there was indeed something rotten in the benchmark: incredibly high DPC latency on the test machine. Investigation determined the culprit to be the beta BIOS on their ASRock Z170 Extreme7+, specifically the BIOS which allowed you to overclock locked Intel CPUs. They have just released their new findings along with a look at LatencyMon and DPC in general. Take a look at the new benchmarks and information about DPC, but also absorb the consequences of demanding articles arrive picoseconds after the NDA expires; if there is a delay in publishing there might just be a damn good reason why.

"We retested our RX 470 to account for this issue, and we also updated our review with DirectX 12 benchmarks for Rise of the Tomb Raider and Hitman, plus full OpenGL and Vulkan benchmarks for Doom."

“With a low-profile PCB and pre-fitted, fully-sealed liquid cooler, the Hydro GFX GTX 1080 is simple and easy to install. Just fit the card into a PCI-E 3.0 x16 slot, mount the radiator and enjoy low maintenance liquid cooling for the lifetime of the card.”

Naturally, with an integrated closed-loop liquid cooler this GTX 1080 won't be relegated to stock speeds out of the box, though Corsair leaves this up to the user. The card offers three performance modes which allow users to choose between lower noise and higher performance. Silent Mode leaves the GTX 1080 at stock settings (1733 MHz Boost), Gaming Mode increases the Boost clock to 1822 MHz, and OC Mode increases this slightly to 1847 MHz (while increasing memory speed in this mode as well).

Ryan posted details about the Radeon RX 470 and 460 graphics cards at the end of last month, and both are now available. Now the largest of the board partners, ASUS, has added both of these new GPUs to their Republic of Gamers STRIX series.

The STRIX Gaming RX 470 (Image: ASUS)

ASUS announced the Radeon RX 470 STRIX Gaming cards last week, and today the more affordable RX 460 GPU variant has been announced. The RX 470 is certainly a capable gaming option as it's a slightly cut-down version of the RX 480 GPU, and with the two versions of the STRIX Gaming cards offering varying levels of overclocking, they can come even closer to the performance of a stock RX 480.

The STRIX Gaming RX 460 (Image: ASUS)

The new STRIX Gaming RX 460 is significantly slower, with just 896 stream processors (to the 2048 of the RX 470) and a 128-bit memory interface (compared to 256-bit). Part of the appeal of the reference RX 460, aside from low cost, is low power draw, as the <75W power draw allows for slot-powered board designs. This STRIX Gaming version adds a 6-pin power connector, however, which should provide additional headroom for further overclocking.

Specifications:

| | STRIX-RX470-O4G-GAMING | STRIX-RX470-4G-GAMING | STRIX-RX460-O4G-GAMING |
|---|---|---|---|
| GPU | AMD Radeon RX 470 | AMD Radeon RX 470 | AMD Radeon RX 460 |
| Stream Processors | 2048 | 2048 | 896 |
| Memory | 4GB GDDR5 | 4GB GDDR5 | 4GB GDDR5 |
| Memory Clock | 6600 MHz | 6600 MHz | 7000 MHz |
| Memory Interface | 256-bit | 256-bit | 128-bit |
| Core Clock | 1270 MHz (OC Mode), 1250 MHz (Gaming Mode) | 1226 MHz (OC Mode), 1206 MHz (Gaming Mode) | 1256 MHz (OC Mode), 1236 MHz (Gaming Mode) |
| Video Output | DVI-D x2, HDMI 2.0, DisplayPort | DVI-D x2, HDMI 2.0, DisplayPort | DVI-D, HDMI 2.0, DisplayPort |
| Power Connection | 6-pin | 6-pin | 6-pin |
| Dimensions | 9.5" x 5.1" x 1.6" | 9.5" x 5.1" x 1.6" | 7.6" x 4.7" x 1.4" |

The STRIX Gaming RX 470 OC 4GB is priced at $199, matching the (theoretical) retail of the 4GB RX 480, and the STRIX Gaming RX 470 is just behind at $189. The considerably lower-end STRIX Gaming RX 460 is $139. A check of Amazon/Newegg shows listings for these cards, but no in-stock units as of early this afternoon.

Alongside the release of the Radeon RX 460 and RX 470 graphics cards, AMD has released the Radeon Software Crimson Edition 16.8.1 drivers. Beyond adding support for these new products, it also adds a Crossfire profile for F1 2016 and fixes a few issues, like Firefox and Overwatch crashing under certain circumstances. It also allows users of the RX 480 to overclock their memory higher than they previously could.

AMD is continuing their trend of steadily releasing graphics drivers, and rapidly fixing important issues as they arise. Also, they have been verbose in their release notes, outlining fixes and known problems as they occur. Users can often track the bugs that affect them as they are added to the Known Issues, then graduated to Fixed Issues. While this often goes unrecognized, it's frustrating as a user to experience a bug and not know whether the company even knows about it, or they are just refusing to acknowledge it.

Useful release notes, like those AMD has been publishing, are very helpful in that regard.

Raw Data is an early access game for the HTC Vive, one which requires space to move and which allows the Vive to show off its tracking ability. [H]ard|OCP wanted to see how the GPUs found in most high end systems would perform in this VR game and so grabbed several AMD and NVIDIA cards to test out. Benchmarking VR games is not an easy task; instead of raw performance you need to focus on the dropped frames and unstable fps which result in nausea and a less engrossing VR experience. To that end [H] has played the game numerous times on a variety of GPUs, changing settings throughout, to determine the sweet spot for the GPU you are running. VR offers a new gaming experience, and new tests need to be developed to demonstrate performance to those interested in jumping into the new market. Check out the full review to see what you think of their methodology as well as the raw performance of the cards.

"Both AMD and NVIDIA have had a lot to say about "VR" for a while now. VR is far from mainstream, but we are now seeing some games that are tremendously compelling to play, putting you in middle of the action. Raw Data is one of those, and it is extremely GPU intensive. How do the newest GPUs stack up in Raw Data?"

"Unlike the two previous AMD GPUs released under the Polaris branding recently, RX 460 is very much a mainstream part that's aimed at buyers who are taking their first real steps into PC gaming. RX 460 uses a distinct, smaller die and is to be priced from £99. As usual, let's fire up the comparison specification table and dissect the latest offering from AMD."

Following the official launch of AMD's Radeon RX 470 GPU, Sapphire has unleashed its own custom graphics card with the Nitro+ RX 470 in 4GB and 8GB factory overclocked versions. Surprisingly, the new cards are up for purchase now at various retailers at $210 for the 4GB model and $240 for the 8GB model (more on that in a bit).

The new Nitro+ RX 470 uses the same board and cooler design as the previously announced Nitro+ RX 480, which is a good thing both for Sapphire (less R&D cost) and for consumers, as they get a rather beefy cooler that should allow them to push the RX 470 clocks quite a bit. The card uses the same Dual X cooler with two 95mm quick connect fans, three nickel plated copper heatpipes, and an aluminum fin stack. The card features the same black fan shroud and black and silver colored backplate. Out of the box this cooler should keep the RX 470 GPU running cooler and quieter than the RX 480, but it should also enable users to get higher clocks out of the smaller GPU (fewer cores mean less heat and more overclocking headroom, assuming you get a good chip from the silicon lottery).

Sapphire is using Black Diamond 4 chokes and a 4+1 power phase design that is driven by a single 8-pin PCI-E power connector (and up to 75W from the motherboard slot). This mirrors the design of its RX 480 sibling.

Display outputs include a single DVI, two HDMI 2.0b, and two DisplayPort 1.4 ports.

The chart below outlines the comparison between the Nitro+ RX 470 cards, RX 470 reference specifications, and the RX 480.

| | Nitro+ RX 470 4GB | Nitro+ RX 470 8GB | RX 470 Reference | RX 480 |
|---|---|---|---|---|
| Stream Processors | 2048 | 2048 | 2048 | 2304 |
| Compute Units | 32 | 32 | 32 | 36 |
| TMUs | 128 | 128 | 128 | 144 |
| ROPs | 32 | 32 | 32 | 32 |
| GPU Clock (Base) | 1143 MHz | 1121 MHz | 926 MHz | 1120 MHz |
| GPU Clock (Boost) | 1260 MHz | 1260 MHz | 1206 MHz | 1266 MHz |
| Memory | 4GB GDDR5 @ 7 GHz | 8GB GDDR5 @ 8 GHz | 4 or 8 GB GDDR5 @ 6.6 GHz | 4 or 8 GB GDDR5 @ up to 8 GHz |
| Memory Bus | 256-bit | 256-bit | 256-bit | 256-bit |
| Memory Bandwidth | 224 GB/s | 256 GB/s | 211 GB/s | 256 GB/s |
| TDP | <225W | <225W | 120W | 150W |
| GPU | Polaris 10 | Polaris 10 | Polaris 10 | Polaris 10 |
| Price | $210 | $240 | $180+ | $200+ ($240+ for 8GB) |

The RX 470 GPU is only slightly cut down from the RX 480 in that it features four fewer CUs, though the processor maintains the same number of ROP units and the same 256-bit memory bus. Reference clocks are 926 MHz base and 1206 MHz boost. Memory can be up to 8GB of GDDR5 with reference memory clocks of 6.6 GHz (effective). Sapphire has overclocked both the GPU and memory with the Nitro+ series. The Nitro+ RX 470 with 4GB of GDDR5 is clocked at 1143 MHz base, 1260 MHz boost, and 7 GHz memory, while the 8GB version has a lower base clock of 1121 MHz but a higher memory clock of 8 GHz.

The 8GB model having a lower base overclock is a bit strange to me, but at least they are rated at the same boost clock. These specifications are very close to the RX 480 actually and with a bit of user overclocking beyond the factory overclock you could get even closer to the performance of it.
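To put a rough number on how close the factory-overclocked card gets, one common back-of-the-envelope estimate is theoretical peak FP32 throughput: two operations (one fused multiply-add) per stream processor per clock. The sketch below applies that to the boost clocks listed in the table. The function name is my own, and real game performance depends on far more than this one figure, so treat it as a rough guide only.

```python
def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak: 2 FLOPs (one FMA) per stream processor per clock."""
    return 2 * stream_processors * clock_mhz * 1e6 / 1e12


# (stream processors, boost clock in MHz), taken from the comparison table.
cards = {
    "Nitro+ RX 470": (2048, 1260),
    "RX 470 reference": (2048, 1206),
    "RX 480 reference": (2304, 1266),
}

rx480 = peak_fp32_tflops(*cards["RX 480 reference"])
for name, (sps, mhz) in cards.items():
    tflops = peak_fp32_tflops(sps, mhz)
    print(f"{name}: {tflops:.2f} TFLOPS ({tflops / rx480:.0%} of RX 480)")
```

By this measure the remaining gap to the RX 480 comes mostly from the four disabled compute units rather than from clock speed, since the boost clocks are only 6 MHz apart.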

The problem with this RX 470 that gets so close to the RX 480 though is that the price is also very close to reference RX 480s! The Sapphire Nitro+ RX 470 4GB is priced at $209.99 while the Nitro+ RX 470 8GB is $239.99.

These prices put the card well into RX 480 territory though not quite up to the MSRPs of factory overclocked RX 480s (e.g. Sapphire's own Nitro+ RX 480 is $219 and $269 for 4GB and 8GB respectively). The company has a nice looking (and hopefully performing) RX 470, but it is going to be tough to choose this card over an RX 480 that has more shaders and TMUs. One advantage though is that this is a card that will just work without having to manually overclock (though where is the fun in that? heh) and it is actually available right now unlike the slew of RX 480 cards that have been launched but are consistently out of stock everywhere! If you simply can't wait for an RX 480, this might not be a bad option.

EDIT: Of course the 8GB model goes out of stock at Newegg as I write this and Amazon's prices are higher than MSRP! hah.

A Beautiful Graphics Card

As a surprise to nearly everyone, on July 21st NVIDIA announced the existence of the new Titan X graphics card, which is based on the brand new GP102 Pascal GPU. Though it shares a name, for some unexplained reason, with the Maxwell-based Titan X graphics card launched in March of 2015, this card is a significant performance upgrade. Using the largest consumer-facing Pascal GPU to date (with only the GP100 used in the Tesla P100 exceeding it), the new Titan X is going to be a very expensive, and very fast gaming card.

As has been the case since the introduction of the Titan brand, NVIDIA claims that this card is for gamers that want the very best in graphics hardware as well as for developers who need an ultra-powerful GPGPU device. GP102 does not integrate improved FP64 / double precision compute cores, so we are basically looking at an upgraded and improved GP104 Pascal chip. That’s nothing to sneeze at, of course, and you can see in the specifications below that we expect (and can now show you) Titan X (Pascal) is a gaming monster.

| | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano | R9 390X |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT | Hawaii XT |
| GPU Cores | 3584 | 2560 | 2816 | 3072 | 2048 | 4096 | 3584 | 4096 | 2816 |
| Rated Clock | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz |
| Texture Units | 224 | 160 | 176 | 192 | 128 | 256 | 224 | 256 | 176 |
| ROP Units | 96 | 64 | 96 | 96 | 64 | 64 | 64 | 64 | 64 |
| Memory | 12GB | 8GB | 6GB | 12GB | 4GB | 4GB | 4GB | 4GB | 8GB |
| Memory Clock | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz | 6000 MHz |
| Memory Interface | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 320 GB/s |
| TDP | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts | 275 watts |
| Peak Compute | 11.0 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS |
| Transistor Count | 11.0B | 7.2B | 8.0B | 8.0B | 5.2B | 8.9B | 8.9B | 8.9B | 6.2B |
| Process Tech | 16nm | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $1,200 | $599 | $649 | $999 | $499 | $649 | $549 | $499 | $329 |

GP102 features 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% higher than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.

Titan X (Pascal) does not utilize the full GP102 GPU; the recently announced Pascal P6000 does, however, which gives it a CUDA core count of 3,840 (256 more than Titan X).

A full GP102 GPU

The complete GPU effectively loses 7% of its compute capability with the new Titan X, although that is likely to help increase available clock headroom and yield.
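For anyone who wants to check the percentages quoted above, here is a quick sketch that derives them from the core counts and rated compute figures in the specification table. The variable and function names are mine; the inputs are the article's own numbers.

```python
TITAN_X_CORES = 3584      # Titan X (Pascal), cut-down GP102
GTX_1080_CORES = 2560     # GTX 1080, GP104
FULL_GP102_CORES = 3840   # full GP102, as in the P6000

TITAN_X_TFLOPS = 11.0     # rated single precision compute
GTX_1080_TFLOPS = 8.2


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


core_advantage = percent_more(TITAN_X_CORES, GTX_1080_CORES)
compute_advantage = percent_more(TITAN_X_TFLOPS, GTX_1080_TFLOPS)
disabled_share = (FULL_GP102_CORES - TITAN_X_CORES) / FULL_GP102_CORES * 100

print(f"CUDA core advantage over GTX 1080: {core_advantage:.0f}%")
print(f"Rated compute advantage over GTX 1080: {compute_advantage:.0f}%")
print(f"Share of full GP102 disabled on Titan X: {disabled_share:.1f}%")
```

The compute advantage comes out smaller than the core count advantage because the Titan X runs at lower clocks than the GTX 1080.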

The new Titan X will feature 12GB of GDDR5X memory, not HBM as the GP100 chip has, so this is clearly a unique chip with a new memory interface. NVIDIA claims it has 480 GB/s of bandwidth on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080.
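The 480 GB/s figure follows directly from the bus width and the per-pin data rate: bytes moved per transfer multiplied by transfers per second. A minimal sketch of that arithmetic is below, using values from the specification table; the helper name is hypothetical.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) x (per-pin data rate in Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, effective data rate in Gbps) from the table above.
configs = {
    "Titan X (Pascal)": (384, 10.0),
    "GTX 1080": (256, 10.0),
    "GTX 980 Ti": (384, 7.0),
    "GTX 980": (256, 7.0),
}

for name, (bits, gbps) in configs.items():
    print(f"{name}: {memory_bandwidth_gbs(bits, gbps):.0f} GB/s")
```

The Titan X gets its 50% bandwidth advantage over the GTX 1080 purely from the wider bus, since both run their GDDR5X at the same speed.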

Realworldtech with Compelling Evidence

Yesterday David Kanter of Realworldtech posted a pretty fascinating article and video that explored the two latest NVIDIA architectures and how they have branched away from the traditional immediate mode rasterization units. He has revealed through testing that with Maxwell and Pascal NVIDIA has gone to a tiling method with rasterization. This is a somewhat significant departure for the company considering they have utilized the same basic immediate mode rasterization model since the 90s.

The Videologic Apocalypse 3Dx based on the PowerVR PCX2.

(photo courtesy of Wikipedia)

Tiling is an interesting subject and we can harken back to the PowerVR days to see where it was first implemented. There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth. These first TBDRs (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory as compared to other offerings of the day (namely Voodoo Graphics). There were some significant drawbacks to the technology. Essentially a lot of work had to be done by the CPU and driver in scene setup and geometry sorting. On fast CPU systems the PowerVR boards could provide very good performance, but they suffered on lower end CPUs as compared to the competition. This is a very simple explanation of what is going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations. Traditional immediate mode rasterizers would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.

Tiling is also present in a lot of mobile parts. Imagination’s PowerVR graphics technologies have been implemented by companies such as Intel, Apple, and Mediatek. Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency. Perhaps most interestingly we can remember back to the Gigapixel days with the GP-1 chip that implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point. 3dfx bought Gigapixel for some $150 million at the time. That company then went on to file bankruptcy a year later and their IP was acquired by NVIDIA.

Screenshot of the program used to uncover the tiling behavior of the rasterizer.

It now appears as though NVIDIA has evolved their raster units to embrace tiling. This is not a full TBDR implementation, but rather an immediate mode tiler that will still break up the scene in tiles but does not implement deferred rendering. This change should improve bandwidth efficiency when it comes to rasterization, but it does not affect the rest of the graphics pipeline by forcing it to be deferred (tessellation, geometry setup and shaders, etc. are not impacted). NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation and what advantages we can expect. We can look at the evidence we have and speculate where those advantages exist.
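Since NVIDIA has not documented its implementation, the following is only a generic sketch of the binning step that any tiled rasterizer performs, not a description of what Maxwell or Pascal actually do. Each triangle is assigned to the screen tiles that its bounding box overlaps, and the tiles can then be rasterized one at a time so that their pixels stay in on-chip storage. The tile size, the function name, and the bounding-box overlap test are all illustrative assumptions.

```python
from collections import defaultdict

TILE = 16  # hypothetical tile edge in pixels


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Appending in submission order keeps per-tile draw order intact.
                bins[(tx, ty)].append(index)
    return bins


# Two triangles on a 64x64 render target, given as three (x, y) vertices each.
triangles = [
    ((2, 2), (10, 3), (5, 12)),
    ((20, 20), (40, 22), (30, 45)),
]
bins = bin_triangles(triangles, 64, 64)
for tile_coord in sorted(bins):
    print(tile_coord, bins[tile_coord])
```

Because triangles are appended to each bin in submission order, the sketch stays closer to an immediate mode tiler than to a deferred renderer: nothing is sorted or culled ahead of shading, the work is only regrouped by screen region.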

The video where David Kanter explains his findings

Bandwidth and Power

Tilers have typically taken the tiled regions and buffered them on the chip. This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back. This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years. We have had GDDR-5 since 2007/2008. The speeds have increased over time, but the basic technology is still much the same. We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come. Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card. GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.

By utilizing a tiler NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that both Maxwell and Pascal architectures are based on GDDR-5 and GDDR-5X technologies, it makes sense to save as much bandwidth as possible where they can. This is probably one of the many reasons that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256 KB respectively). Every little bit helps when we are looking at hard, real world bandwidth limits for a modern GPU.

The area of power efficiency has also come up in discussion when going to a tiler. Tilers have traditionally been more power efficient as well due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory. The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!” Sadly, this is not exactly true. The tiler is more power efficient, but it is a small part of the power savings on a GPU.

The second fastest Pascal based card...

A modern GPU is very complex. There are some 7.2 billion transistors on the latest Pascal GP104 that powers the GTX 1080. The vast majority of those transistors are implemented in the shader units of the chip. While the raster units are very important, they are but a fraction of that transistor budget. The rest is taken up by power regulation, PCI-E controllers, and memory controllers. In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units. This does not mean that they are not important though. Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone. It is going over every single part of the car and shaving ounces here and there, and in the end achieving significant savings by addressing every single piece of a complex product.

This does appear to be the long and short of it. This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency. It is not the whole story, but it is an important part. I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need. There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect. The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium? I believe that particular tiler was an immediate mode renderer as well, since it did not have as many of the driver and overhead issues that PowerVR exhibited back in the day. Obviously it would not be a copy/paste of the technology that was developed back in the 90s, but it would be interesting to see if it was the basis for this current implementation.