Rumor: NVIDIA GeForce 800-Series Is 28nm in Oct/Nov.

Many of our readers were hoping to drop one (or more) Maxwell-based GPUs in their system for use with their 4K monitors, 3D, or whatever else they need performance for. That has not happened, nor do we even know, for sure, when it will. The latest rumors claim that the NVIDIA GeForce GTX 870 and 880 desktop GPUs will arrive in October or November. More interesting, they are expected to be based on GM204 on the current 28nm process.

The recent GPU roadmap, as of GTC 2014

NVIDIA has not commented on the delay, at least that I know of, but we can tell something is up from their significantly different roadmap. We can also make a fairly confident guess by paying attention to the industry as a whole. TSMC has been struggling to keep up with 28nm production, having increased wait times by six extra weeks in May, according to Digitimes, and whatever 20nm capacity they had was reportedly gobbled up by Apple until just recently. At around the same time, NVIDIA inserted Pascal between Maxwell and Volta with 3D memory, NVLink, and some unified memory architecture (which I don't believe they have yet elaborated on).

And, if this rumor is true, Maxwell was pushed from 20nm to a wholly 28nm architecture. Maxwell, not Pascal, was originally supposed to be the host of unified virtual memory. If I had to make a safe guess, I would assume that NVIDIA needed to redesign their chip for 28nm and, especially with the extra delays at TSMC, cannot get the volume they need until autumn.

Lastly, going by the launch of the 750 Ti, Maxwell will basically be a cleaned-up Kepler architecture. Its compute units were shifted into power-of-two partitions, reducing die area for scheduling logic (and so forth). NVIDIA has been known to stash a few features into each generation, sometimes revealing them well after retail availability, so that is not to say that Maxwell will be just "a more efficient Kepler".

I'd say 20% max. Knowing what we know about both existing Maxwell products and 28nm, the chip is going to be big, hot, and power hungry if they go for the performance aspects of the architecture. Maxwell is going to offer either a 30% performance increase or 30% less power consumption. Most likely they're going to mix it up just to keep in the 250W envelope.

40-50% was what we figured would be the case on 20nm, and it appears that it'd be close. Shame Maxwell has become a 5 year wait for a dud.

AMD loves Nvidia's pricing, it keeps the AMD products going out the door, but AMD, and Nvidia, both need some competition in the discrete GPU market, and maybe a third player can be financed to take their Mobile SOC GPU product, and make a low end, at first, discrete GPU, maybe a low end discrete GPU with hardware ray tracing. Considering Nvidia's merging of desktop and mobile GPU microarchitectures it should be just a matter of scaling up the number of execution units in the GPU, and if Nvidia has gone this route, then it should be possible for some of the mobile SOC GPU makers to do the same scaling of their mobile SOC products and produce some discrete GPU competition.

Some real innovation in the GPU market, could come from AMD, should AMD decide to take their Console gaming APUs and put them on a discrete PCI form factor, a gaming APU on a PCI platform running its own gaming OS, and having the GPU resources of a high end discrete GPU. The new systems under development with stacked on Die/module RAM, and 1024 bit busses etched out on silicon substrates, connecting the on Module memory, APUs/SOCs, and memory, lots of very fast wide memory! Low latency by having CPU and GPU share the same DIE/Module, and separated by only a few millimeters, is what allowed the Gaming APUs to even be able to game adequately, that and lots of GDDR5 memory, and wide data busses, on die RAM, and intelligent memory controllers.

With memory stacking, large on die RAMs connected by internal on die busses will offer L4 caches the size of earlier systems' complete memory, allowing most of the gaming OS kernel's code, and the gaming engine's most time/latency dependent functions, to reside in on die RAM, all without a single encode to, or decode from, any PCIe, or other protocol, except for some bursts of frame buffer access to and from GDDR5 frame buffer memory. Discrete gaming APUs could be added, plug and play style, to the available PCI slots, and would be able to run all the game's CPU/GPU code, without the latency issues inherent in gaming with the motherboard CPU's narrow bus, and slow memory. AMD did not do too badly with the gaming APUs, considering the price constraints the OEMs put AMD under to win the contracts in the first place. AMD could take discrete gaming, high end discrete APU gaming, and make a console on a card, that performs like a high end discrete GPU, only better.

"AMD loves Nvidia's pricing, it keeps the AMD products going out the door,"

/me hardware numbers from steam (which is a good ref)

Nvidia's market share has remained pretty steady around 51-52%, and AMD had 34% back in Jan 2013; as of June 2014 they were down to 30.4%. The main recent buyers were miners, which probably helped that number drop a bit. As good as the AMD GPU is, it's a hot running chip and pretty power hungry. If the 750 Ti is anything to go by, Maxwell could be an issue for AMD next generation. The 750 Ti competes pretty decently with the 260X and 265 at half the TDP, even less than half in the case of the 265. AMD is surviving partly off console licensing; Nvidia did get a small piece with the PhysX license to Sony/MS. Don't think this generation of consoles will last very long, as they are using already-released, slower PC hardware, versus the last ones, which were top-end chips at launch.

Intel can not charge a premium in the mobile market, the mobile market does not need x86, and Intel can not play dirty in the mobile tablet/phone market, like it does in the PC/laptop market, and Intel's pricing in the PC/laptop market is going to have to come down, once there are Power/Power8 CPUs licensed, and manufactured by many companies. ARM based netbooks are also going to compete more with Intel's netbook SOCs, and Nvidia will have some Denver core based netbook design wins with Chrome/Android products in the near future, as will AMD, once AMD brings a custom microarchitecture based ARMv8 ISA APU to market. Apple's A7 replacement could very well power a laptop, if the A7's performance is any indicator, with execution resources more like Intel's Core i series SKUs than ARM reference designs. It will not be ARM Holdings' reference designs that move ARM into the low end laptop market, but the custom microarchitecture designs that are built to execute the ARMv8 ISA, and Apple, Nvidia, AMD, and others will be making/continue to make custom microarchitecture designs that have the ability to run the ARMv8 instruction set architecture.

Nvidia does not have much of an advantage over AMD in the GPU market, Nvidia has more people willing to pay more for their product, but AMD competes. It's not that simple, when you know Intel's market history, and the court interventions that have been enforced on Intel in the past, because of Intel's unfair x86 market practices towards AMD, and OEMs. OEMs are not very likely to adopt any Intel products across their entire device product lines, ever again, and why would they, when OEMs can call their own shots with an ARM license. Nvidia is just more upscale in its pricing, and AMD has always been more value for the dollar focused. Hopefully Nvidia will be producing some Power8 products in the future that can compete with Intel's x86 high performance SKUs. Nvidia will be integrating its GPUs as accelerators for IBM's Power8 server/HPC systems, and AMD will also be able to license Power/Power8 through the OpenPower foundation and IBM! Google is also looking at Power8 for its server farms, and Power8 is not PowerPC. The more affordable x86 will be offered by AMD; only the upscale, or those that need all the power, will buy Intel, that is until Power8 gets into the supply chain on PCs/laptops.

If AMD can get a 12 core x86 CPU product to market, in both the PC and laptop SKUs, it will get my business, as I need all the CPU cores that I can get for ray tracing/rendering. I only buy last year's Intel Core i7 based laptops, on sale, and I never buy the latest and pay the sucker's premium, but if AMD gets a laptop SKU with 12 cores out at a reasonable price, I am in, and I mean 12 full cores, no shared execution pipelines, or shared instruction decoders. I am also very open to AMD getting a custom ARMv8 ISA based APU with many CPU cores (16, or more) for rendering/ray tracing. The PowerVR Wizard, by Imagination, may just break the need for more CPU cores for ray tracing, with the Wizard GPU's hardware ray tracing abilities. I can only hope that AMD and Nvidia will get hardware ray tracing into their GPU/APU products. Intel has yet to announce any 6 core laptop SKUs AFAIK.

Here's to hoping that Apple will also take a Power8 license, and make the next generation Mac Pro an All Apple CPU workstation, with either AMD's or Nvidia's GPUs. Knowing Apple's need for complete control, it is a possibility that they could take their mobile SOC licensed GPU IP, and pimp it out into a discrete GPU with some simple scaling of execution units, they have the funds, and they own a percentage of that GPU company. Commodity pricing for high end CPUs/SOCs, it's going to be great!

"it’s hard to argue that NVIDIA comes clean ahead. Things change if we’re talking about mining efficiency, but on every other front, NVIDIA’s package is better: The 780 Ti performs better overall, runs cooler and quieter, and boasts a couple of features that I consider to be awesome (ShadowPlay, for starters). At the moment, AMD’s best unique features are going to take some time to establish themselves, and even then, Mantle is likely to have limited use to someone running a high-end AMD card (because they likely also have a decent CPU)."

The 780Ti is faster in most cases while consuming less power and making less noise. Better overall performance and perf/W, hence the premium price and brand power.

I own a 290X, but only because it was $390 (used). If money were no object, I would get a 780Ti because I can at least acknowledge that NVIDIA offers a technically better product with more features that I want... but I'm too cheap.

Actually, the roadmaps aren't all that different. They just appear to be because the scales are logarithmic vs geometric. In both, Fermi is clearly at 2 GFLOPS/Watt, and Maxwell is 12. Tesla and Kepler are slightly ambiguous, but very close. Volta was in a space with a huge range, so it's not easy to compare to Pascal.
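To illustrate how the same data points can land in visibly different places on two charts, here is a minimal Python sketch. The Fermi and Maxwell values (2 and 12 GFLOPS/Watt) come from the comment above; the axis limits of 1 to 100 are hypothetical, chosen only for illustration, and are not read off NVIDIA's slides.

```python
import math

def axis_fraction(value, lo, hi, log=False):
    """Fraction of the axis height (0.0 to 1.0) at which `value` is drawn."""
    if log:
        return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return (value - lo) / (hi - lo)

# GFLOPS/Watt figures quoted in the comment above.
points = {"Fermi": 2.0, "Maxwell": 12.0}

# Hypothetical axis range, for illustration only.
AXIS_LO, AXIS_HI = 1.0, 100.0

for name, value in points.items():
    linear = axis_fraction(value, AXIS_LO, AXIS_HI)
    logarithmic = axis_fraction(value, AXIS_LO, AXIS_HI, log=True)
    print(f"{name}: linear {linear:.3f}, log {logarithmic:.3f}")
```

Under these assumed limits, Maxwell sits about a tenth of the way up a linear axis but more than halfway up a logarithmic one, even though the underlying number is identical.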

The performance just isn't going to scale much until they can move to a smaller process node. Graphics is obviously very parallel, so without moving to a smaller node they can't really throw that much more hardware at it. A more optimized design can get a little more performance, but it isn't the same as having a lot more hardware which a smaller process node makes available.

It would be nice to get the nvlink sooner rather than later, but I suspect that they are waiting for 20nm and/or stacked DRAM to implement this. They could implement nvlink with all of the chips soldered to the board (dual or quad gpu card), but this would be a very high-end, low-volume product; it probably isn't worth it right now unfortunately.

It will be a while before they develop a new form factor for NVLink. The edge connector used for PCI Express will not be able to carry sufficient bandwidth. PCIe 4.0 will be 31.51 GB/s for x16. To actually share memory between multiple GPUs, you need speed on the same order of magnitude as the memory bandwidth of a GPU, which is around 300 GB/s for current high end GPUs. With technology similar to PCIe, this will probably require thousands of pins, which isn't impossible. We already have a CPU socket with 2011 pins to drive quad channel memory. We have needed a new form factor for a while, but this may not be in the best interest of all involved companies yet.
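The arithmetic behind those figures can be checked with a short Python sketch. It assumes the published PCIe 4.0 signaling parameters (16 GT/s per lane, 128b/130b encoding) and takes the 300 GB/s GPU memory figure from the comment as given. It only counts lanes; translating lanes into physical pins (differential pairs, grounds, power) is not modeled here.

```python
import math

# PCIe 4.0 signaling: 16 GT/s per lane with 128b/130b line encoding.
TRANSFERS_PER_S = 16e9
ENCODING_EFFICIENCY = 128 / 130

def pcie4_bandwidth_gb_s(lanes):
    """Usable bandwidth per direction in GB/s (1 GB = 1e9 bytes)."""
    bits_per_s = lanes * TRANSFERS_PER_S * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e9

x16 = pcie4_bandwidth_gb_s(16)

# Memory bandwidth of a high end GPU, as quoted in the comment above.
GPU_MEMORY_GB_S = 300.0

shortfall = GPU_MEMORY_GB_S / x16
lanes_needed = math.ceil(GPU_MEMORY_GB_S / pcie4_bandwidth_gb_s(1))

print(f"PCIe 4.0 x16: {x16:.2f} GB/s")
print(f"GPU memory bandwidth is {shortfall:.1f}x that")
print(f"Lanes needed at PCIe 4.0 rates: {lanes_needed}")
```

By this estimate an x16 slot falls short of GPU memory bandwidth by roughly an order of magnitude, which is the gap the comment is pointing at.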

NVLink (CAPI is IBM's tech) will be a high speed interface. This CAPI (Coherent Accelerator Processor Interface) derived interface (NVLink) will be traced out on a module, and comprise a mezzanine module which will hold the GPU, the stacked memory, and other components. It will not be a PCI edge connector; the board will have a mezzanine module connector to accept the mezzanine module. An NVLink block contains 8 lanes at 20 gigabits per second per lane, for 20 gigabytes per second total bandwidth per block, and multiple blocks can be combined, so 120 lanes (15 blocks) will give 300 GB/s. This article from Anand covers the Pascal GPU, but mentions Nvidia's work with IBM on integrating Nvidia's GPUs with IBM's Power8, with a mezzanine module.
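As a quick check of the block arithmetic in the comment above, here is a small Python sketch. The lane count and per-lane rate are the comment's figures, not independently verified, and the result is raw bandwidth: any encoding or protocol overhead is ignored as a simplifying assumption.

```python
# Figures as stated in the comment above (raw signaling rate, no overhead).
LANES_PER_BLOCK = 8
GBIT_PER_S_PER_LANE = 20

def nvlink_raw_gb_s(blocks):
    """Raw aggregate bandwidth in gigabytes per second for `blocks` blocks."""
    gbit_per_s = blocks * LANES_PER_BLOCK * GBIT_PER_S_PER_LANE
    return gbit_per_s / 8  # 8 bits per byte

for blocks in (1, 15):
    lanes = blocks * LANES_PER_BLOCK
    print(f"{blocks} block(s), {lanes} lanes: {nvlink_raw_gb_s(blocks):.0f} GB/s")
```

One block works out to 20 GB/s and fifteen blocks (120 lanes) to 300 GB/s, matching the numbers quoted.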

They shouldn't contain HDMI 2.0 period. If you need it for a monitor or some other purpose it should be as simple as an adapter. AMD/Nvidia claimed 5 years ago (back around the 200 Series) that by now they wanted DP to take over as the lead display technology on Video Cards, yet they've made little effort to do so. AMD's made the bigger leap. I still don't understand why Nvidia does dual DVI connections. It's a waste of space and both DVI and HDMI are going to hold monitors back at this point moving into higher resolutions.

With DP 1.3 already in the works and DP 1.2 already beating HDMI 2.0 3-4 years before its release, I have to ask why we are still using these old school plugs and a connection (HDMI) which was never meant for monitors in the first place. Manufacturers drew their lines a long time ago, with HDMI being for TVs and DP the technology for monitors. Let's make it happen already. Nvidia needs to get off their ass and start flexing their muscle in this area. AMD is putting them to shame and they don't even have the market.