Turing vs Volta: Two Chips Enter. No One Dies.

In the past, when NVIDIA launched a new GPU architecture, they would make a few designs for each of their market segments. All SKUs would be one of those chips, with varying amounts of it disabled or re-clocked to hit multiple price points. The mainstream enthusiast (GTX -70/-80) chip of each generation is typically 300mm2, and the high-end enthusiast (Titan / -80 Ti) chip is often around 600mm2.

Kepler used quite a bit of that die space for FP64 calculations, but that did not happen with consumer versions of Pascal. Instead, GP100 supported 1:2:4 FP64:FP32:FP16 performance ratios. This is great for the compute community, such as scientific researchers, but games are focused on FP32. Shortly thereafter, NVIDIA released GP102, which had the same number of FP32 cores (3840) as GP100 but with much-reduced 64-bit performance… and much-reduced die area. GP100 was 610mm2, but GP102 was just 471mm2.

At this point, I’m thinking that NVIDIA is pulling scientific computing chips away from the common user to increase the value of their Tesla parts. There was no reason to make a cheap 6XXmm2 card available to the public when a 471mm2 part could take the performance crown, so why not reap extra dies from your wafer (and be able to clock them higher because of better binning)?

And then Volta came out. And it was massive (815mm2).

At this point, you really cannot manufacture a larger integrated circuit. You are at the limit of what TSMC (and other fabs) can focus onto your silicon. Again, it’s a 1:2:4 FP64:FP32:FP16 ratio. Again, there is no consumer version in sight. Again, it looked as if NVIDIA was going to fragment their market and leave consumers behind.

And then Turing was announced. Apparently, NVIDIA still plans on making big chips for consumers… just not with 64-bit performance. The big draw of this 754mm2 chip is its dedicated hardware for raytracing. We knew this technology was coming, and we knew that the next generation would have technology to make this useful. I figured that meant consumer-Volta, and NVIDIA had somehow found a way to use Tensor cores to cast rays. Apparently not… but, don’t worry, Turing has Tensor cores too… they’re just for machine-learning gaming applications. Those are above and beyond the raytracing ASICs, and the CUDA cores, and the ROPs, and the texture units, and so forth.

But, raytracing hype aside, let’s think about the product stack:

NVIDIA now has two ~800mm2-ish chips… and

They serve two completely different markets.

In fact, I cannot see either FP64 or raytracing going anywhere any time soon. As such, it’s my assumption that NVIDIA will maintain two different architectures of GPUs going forward. The only way that I can see this changing is if they figure out a multi-die solution, because neither design can get any bigger. And even then, what workload would it even perform? (Moment of silence for 10km x 10km video game maps.)

What do you think? Will NVIDIA keep two architectures going forward? If not, how will they serve all of their customers?

"I figured that meant consumer-Volta, and NVIDIA had somehow found a way to use Tensor cores to cast rays. Apparently not… but, don’t worry, Turing has Tensor cores too… they’re just for machine-learning gaming applications."

And here you were wrong, Scott, and also wrong about the Tensor Cores on Turing being only for machine learning. It is the Tensor Cores on Turing and their TRAINED AI that are there to denoise the RT cores' limited ray generation output!

It's the trained AI denoising done on the Tensor Cores that allows Nvidia's hybrid ray tracing to be used. There is only a time slot of a few milliseconds per frame, at say 30-60 FPS and above, in which to perform that convolutional, AI-accelerated denoising on Turing's Tensor Cores, because the RT cores' output is too noisy to be of use on its own.

The AI training is not done on Turing's Tensor Cores. Turing's Tensor Cores are there to run the already trained AI denoising algorithm. Nvidia did the training on its Tesla-based clusters, and Nvidia refined that denoising AI to perform the denoising in the millisecond time frames that high-FPS gaming requires.

10, or 8, or even 6 gigarays/second has to be scaled down to the milliseconds of time allotted per frame, so there are a lot fewer rays available within each frame time at 30-60 FPS and above. How many rays per frame? Well, that's going to vary from 30-60 FPS and above, with the higher frame rates having fewer milliseconds available per frame for the limited amount of ray tracing compute on those new "Ray Tracing" cores.
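The per-frame ray budget described above can be sketched in a few lines. This is a rough illustration only: the 10/8/6 gigarays/second figures are the peak rates quoted in this comment, the function names are made up for the example, and it optimistically assumes the RT cores get the entire frame time to themselves, which they would not in a real hybrid renderer.

```python
def frame_time_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def rays_per_frame(gigarays_per_second: float, fps: float) -> float:
    """Peak rays that can be cast within one frame at the given frame rate."""
    return gigarays_per_second * 1e9 / fps


def rays_per_pixel(gigarays_per_second: float, fps: float,
                   width: int, height: int) -> float:
    """Peak rays per pixel per frame at the given resolution."""
    return rays_per_frame(gigarays_per_second, fps) / (width * height)


if __name__ == "__main__":
    # Assumed peak rates (from the comment) at a few frame rates, 1080p output.
    for gigarays in (10, 8, 6):
        for fps in (30, 60, 120):
            print(f"{gigarays} Grays/s @ {fps} FPS: "
                  f"{frame_time_ms(fps):.2f} ms/frame, "
                  f"{rays_per_frame(gigarays, fps):.3e} rays/frame, "
                  f"{rays_per_pixel(gigarays, fps, 1920, 1080):.1f} rays/pixel")
```

Doubling the frame rate halves the ray budget, and any real workload will land well under these peak numbers.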

Ray tracing is a compute-oriented workload, but the information on those ray tracing cores is currently limited, as in no whitepapers yet. But how much different are Nvidia's Ray Tracing cores from, say, AMD's GCN compute cores, where ray tracing calculations can also be done? Everybody is going to have to wait for the whitepapers because Nvidia is not providing the information currently. Hell, Nvidia has yet to discuss the Turing variants' ROP and TMU counts also.

The Tensor Cores are not for ray tracing. That's the job of the "Ray Tracing (RT)" cores, and how different are Nvidia's on-GPU RT cores from Imagination Technologies' PowerVR Wizard ray tracing cores? The Tensor Cores are NOT for ray tracing; they are for denoising the RT cores' limited and very grainy ray tracing output, and that has to be done very quickly in the milliseconds allotted for each frame.

Most Tensor Cores on consumer devices are running pre-trained AIs and are not there to do any training. That AI training is done on the big GPU clusters over thousands and thousands of hours in order to winnow down to the best-performing denoising result (in Turing's case), one that runs rapidly enough to use for gaming denoising workloads. Without that denoising AI running on Turing's Tensor Cores, the ray tracing cores' output would be too much garbage to be of use for gaming.

Not sure if they will keep two architectures. I just think that most enthusiasts will continue to focus on the number of CUDA cores on a chip to gauge true real-world performance and ignore the rest of the stuff until it matters a few GPU generations from now. AMD still controls the hardware in the console space, and that is where most of the money is made in the gaming space, so devs will continue to use current techniques to program games. More CUDA cores usually means better performance; looking forward to the 4K review numbers to see how much more performance we get between 3584 CUDA cores and 4352.

The ray tracing focus is meant to establish market dominance. Nobody will care about traditional performance, aka rasterization, anymore because it simply cannot produce the same quality as real-time ray tracing.

We will see more and more hyper specialization of GPU because that's where future performance and quality gains will be.

Nothing produces the quality of nature like real photons and atoms can, and CPU/GPU ray tracing is a compute-intensive simulation of nature's photon/atom interactions (which are very much related to the electrons on the atoms, and to the electrons pulsing through any processor's cores).

And you better damn well care about compute because actual real-time ray tracing will take petaflops of compute power!

Nvidia's "Ray Tracing" output is rather limited and has to be mixed in with the raster output. AND it's that Trained AI based denoising done on Turing's Tensor Cores that's fixing up that poor quality Ray Tracing output from Turing's RT cores and that's what makes the ray tracing worthwhile for high FPS gaming.

Example:

Frames per second into milliseconds for 60 FPS:

Take the reciprocal: (1/60 FPS)*1000 = 16.67 ms to render each frame.

How many rays can those RT cores compute in 16.67 ms?

How grainy will that rather limited RT core output be?

How fast must the AI-trained algorithm on those Turing Tensor Cores denoise that grainy RT core output in order to make that 16.67 ms time constraint at 60 FPS?

I'm pretty sure that the Tensor Core AI must get its job done in much less than 16.67 ms in order for the output to have time to be mixed down with the raster pipeline processing stages at that example 60 FPS frame rate.

That's one whole buttload of compute to perform at under 16.67 ms. So compute is of the essence for GPUs as well as CPUs!

Where did Nvidia train that AI in the first place? It sure as hell was not on Turing's limited number of Tensor Cores.

Nvidia Trained that AI on its Volta/Tesla based supercomputing cluster over some damn many thousands of hours you can be damn sure!

So that trained AI is then loaded and run on Turing's Tensor Cores to properly denoise all that crappy ray tracing output from the RT cores, and it had damn well better be PDQ about it, because things at 60 FPS could just as well be at 120 FPS, or 140 FPS. And in real gaming, and computing in general, things get very Monte Carlo mathematically, doing those best-case/worst-case amortizations on all that stochastic action.

So those Tensor Cores will have some variable amount of time to get their job done, and it is always going to have to be less than the frame time in milliseconds at any nominal frames-per-second rate for gaming. PDQ is very apropos to any discussion regarding high-FPS gaming workloads!

"
Problem #2:

A 60-watt light bulb emits 60 joules/sec of energy ( 1 Joule/sec == 1 watt). Pretend for a moment that all of this energy is emitted in the form of photons with wavelength of 600 nm (this isn't true, of course -- as a blackbody, a light bulb emits photons of a wide range in wavelengths). Calculate how many photons per second are emitted by such a light bulb.

Solution: This problem asks you to calculate how many photons per second are emitted from a 60-watt lightbulb. Since 60 watts is 60 Joules per second, we know that we need enough photons to carry 60 Joules of energy each second.
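The remaining arithmetic for the quoted problem can be sketched as follows. It assumes the standard photon energy relation E = hc/λ, which the quoted text does not spell out, and uses the SI values of the Planck constant and the speed of light; the function names are only for illustration.

```python
PLANCK_H = 6.62607015e-34   # Planck constant, J*s (exact SI value)
LIGHT_C = 299_792_458.0     # speed of light in vacuum, m/s (exact SI value)


def photon_energy_joules(wavelength_m: float) -> float:
    """Energy of a single photon, E = h*c / wavelength."""
    return PLANCK_H * LIGHT_C / wavelength_m


def photons_per_second(power_watts: float, wavelength_m: float) -> float:
    """Photons emitted per second if all power leaves at one wavelength."""
    return power_watts / photon_energy_joules(wavelength_m)


if __name__ == "__main__":
    # 60 W bulb, all output pretended to be at 600 nm, as in the problem.
    rate = photons_per_second(60.0, 600e-9)
    print(f"{rate:.3e} photons per second")
```

That works out to roughly 1.8 x 10^20 photons every second from one idealized bulb, which is many orders of magnitude beyond a ray budget counted in gigarays.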

Sure, it's like Gameworks, and AMD has the same sort of partnerships, but AMD lacks the funds to get as wide a level of adoption for its IP as Nvidia can with its billions.

Look at Intel and laptop design wins, and look at those mini-desktop PCs like the ZOTAC ZBOX MA551 that was supposed to come with Zen/Vega Raven Ridge options. WTF is the ZOTAC ZBOX MA551 with AMD's Raven Ridge inside?

And there, gaming folks, is the truth of the matter as far as the OEM PC/laptop and AIB discrete gaming card/games ecosystem market is concerned. It's very much like the aisle end cap at the supermarket, with those cereal/other consumer product makers wanting more sales and paying the supermarket/supermarket chain extra for that aisle end-cap product placement space.

Nvidia outspends AMD by several times and money gets anyone's attention. Nvidia sends in its Nvidia "Driver Mafia" teams, as those in the industry call them, into the games/gaming engine development houses and helps get things working. [Note Nvidia's "Driver Mafia" Teams are not to be confused with the Mafia Driver series of games, even though those games may make use of Nvidia's "Driver Mafia" teams also]

"Anyone else noticed that all the games with RTX enable are in sponsor/partnership with Nvidia"

Really, you cannot be that daft, or you are being seriously disingenuous! But really, sure, Nvidia has to spend money to pay for that development, otherwise the poor little gaming company script kiddies are not going to be able to do their little jobs. The games/gaming engine companies are sure not going to foot the bills to make their games compliant. Games/gaming engine companies want to program only for cross-platform standardized features.

Look at M$'s new DXR. Nvidia helped there also, just like AMD helped with Mantle, which became the basis for DX12 and Vulkan. Nvidia's done some nefarious things via its Gameworks, but Nvidia is so damn dependent on GPUs, unlike AMD and Intel. Nvidia's going to risk fines just like Intel does, and both Nvidia and Intel will continue with the nefarious tactics because the small fines are just written off as the cost of doing business.

AMD's Vega explicit primitive shaders are there, but the money is not there for AMD to pay the games/gaming engine developers to adopt that technology quickly enough. That implicit primitive shaders code path, for automatically converting legacy gaming code on the fly to make use of Vega's in-hardware explicit primitive shader IP, proved too costly an undertaking for AMD at that time.

But do not worry: Vega's Rapid Packed Math has proven popular with games/gaming engine makers, and that was very easy IP to adopt compared to Vega's in-hardware explicit primitive shaders. And Navi will have all the features of Vega and then some, so that explicit primitive shader IP will eventually be used, probably first by the console gaming industry.

AMD, without enough funds of its own to pay its way into GPU hardware/driver feature adoption, can still let the console gaming industry spend the money in AMD's place to make use of explicit primitive shaders and Rapid Packed Math. Everything costs money, and money does not grow on trees!

I agree that Nvidia's current implementation of this tech is designed to gain market share. However, early demos of this tech ARE NOT promising in the slightest, regardless of visual fidelity (that BF:V demo is gorgeous). With ray tracing enabled, the Turing card wasn't able to hit 60 frames per second at 1080p. That's not good. Not good at all.

Currently, adaptive sync and high refresh rate panels are THE best experience in gaming. It's objectively better than any other solution currently available. If these cards are struggling to hit 60 frames per second at 1080p, then their improvements in visual fidelity are moot.

The tech demo was run on engineering samples with early drivers, so hopefully performance has improved since then. However, I have a feeling that history, and hopefully AMD, will make this generation of GeForce cards look very similar to Intel's 7700K from a price/performance standpoint.

If anything, I'd say them trying to bring raytracing into the 'gamer' market is a sign that they are bringing what had become two (maybe three) different uarchs back into a single uarch.

You're right when you say Nvidia split what used to be a single uarch into two segments (maybe three if we include tensor 'cores'); however, I disagree that they'll maintain two separate uarchs going forward.

What they want to do is make each market segment relevant to the others so they can go back to a single uarch that covers all markets. RTX is an attempt to combine the tensor 'cores' of Volta (machine learning) with the compute capabilities of Quadro (floating point performance, i.e. the actual raytracing) and the pixel fillrate of GeForce.

"If anything, I'd say them trying to bring raytracing into the 'gamer' market is a sign that they are bringing what had become two (maybe three) different uarchs back into a single uarch." [No, Not Exactly]

Do you realise how many different MicroArchs have been used on GPUs for like FOREVER? Look at the controllers on GPUs: they are not x86 based, and Nvidia uses RISC-V based controllers on some of its GPUs' on-die video/other decoding/encoding functional blocks. Most memory controllers on CPUs and GPUs are microprocessors in their own right.

Nvidia's RT cores: how much different are they from AMD's GCN nCU based ACE units that can compute rays also, ray tracing workloads being among the most compute-intensive workloads that were traditionally done on CPU cores? What about IT's PowerVR Wizard on-GPU ray tracing hardware? Well, IT did not have enough money to pay their way into enough design wins to survive and had to be bought out, and had its MIPS IP sold off to a different company.

Who Knows who may have licensed IT's Wizard Ray Tracing hardware IP, but that IP is still there also.

Really, GPUs are rather complex heterogeneous processing beasts, with their TMUs and ROPs and tessellation units and who knows what specialized ISAs that they make use of. The ISA of the shader cores is usually what is discussed, but that may be microcode based also, with new instructions able to be added with just a firmware update. There are a lot of very specialized processors that as a whole make up what is known as a GPU, and even for CPUs there are also specialized processors in there that folks do not know about.

Nvidia's RT cores are in need of some very intensive whitepaper treatment from Nvidia so that IP can be properly looked at and vetted by the independent testing labs. Both AMD's and Nvidia's older GPU SKUs can accelerate rays also, and really it's the Tensor Core based trained denoising AI running on Turing's Tensor Cores that is the special sauce that allows that poor-quality ray tracing output from Turing's RT cores to be of any use for high-FPS real-time gaming.

God, are you still around? I thought you'd moved on to annoy someone else.

1) No one said anything about "different MicroArchs"; I mentioned Nvidia using different uarchs.

2) Nvidia's RT 'cores' are different than AMD's streaming multiprocessors; it doesn't matter how much that difference is, just that they are different.

3) GPUs are only rather complex heterogeneous processing beasts if you're too stupid to understand them; they're just a collection of ASICs.

4) We don't need a very intensive whitepaper to look at, vet, or understand how GPU manufacturers' intellectual property works. You just need a degree of intelligence and a little bit of understanding, something you obviously lack going on what you've said.

You are full of it, as ray tracing is a compute task, so Nvidia's going to have to prove, with whitepapers, that Nvidia's "RT" cores are substantially different from AMD's GCN cores with their ACEs (Asynchronous Compute Engines).

Nvidia's RT cores do not project sufficient numbers of completed and fully cast rays and only produce a grainy result that requires the AI-accelerated denoising done by the pre-trained AI on Turing's Tensor Cores. It's the Tensor Core hosted pre-trained AI that's making the insufficient ray tracing output usable for high-FPS gaming.

"it doesn't matter how much that difference is, just that they are different." Really, you are a proven moron with that single statement!

Really the Nvidia fanboys are all complaining that Nvidia has not done enough to increase FPS, like they only care about it being the fastest sort of NASCAR race sort of nonsense. And Likewise The AMD Fanboys are clueless also because they are just as moronic!

But real technology companies produce real whitepapers, if they are not up to some nefarious snake oil tactics!

Nvidia is going to have to make with the detailed whitepapers, just like AMD is in need of more whitepaper production. There is no getting around the fact that the scientific and academic journals expect prodigious amounts of peer-reviewed whitepaper content before things can be properly vetted and judged.

Those r/Amd and r/Nvidia morons are so confused lately, as they always are when those hicks from the sticks poke their damn dirty noses into subjects that they cannot hope to understand without some serious hand-holding! And I'll include the city hick variety, commonly known as the hipsters, as they are just hicks from the sticks that have settled in the urban areas to make the cities unlivable hellholes of stupidity.

You don't need a whitepaper to prove something, just like you don't need a whitepaper to prove how CUDA 'cores' work. I mean, seriously, you can't be so dumb that you don't know how silicon calculates basic arithmetic.

AMD and Nvidia may call them different names but they're all basically programmable parallel processors and the fact that you don't know that and are parroting Nvidia and AMD's marketing BS shows you don't know what you're talking about.

Instead of harping on about whitepapers how about actually learning the basics of processors so you don't keep making yourself look like an idiot.

ChipWorks(TechInsights Inc.)/Others will definitely be slicing and dicing Nvidia's RT cores and Tensor Cores and using electron microscopes to get at their designs. There are Patents to look out for and clients will pay for that research.

Nvidia's RT cores need that whitepaper treatment, as they will be there at any symposia that Nvidia attends with those ever-present block diagrams! And no one is parroting any marketing claptrap. No one expects that Nvidia's limited real-time ray tracing is some panacea, and it will have to undergo refinements for sure.

CUDA cores have already gotten their whitepaper treatment, and the Turing CUDA cores have already had some whitepaper treatment owing to all the improvements that Nvidia has announced. The RT cores are an unknown, even more so than the Tensor Core IP that was already revealed in the Volta whitepapers.

But IP functional Blocks need whitepaper treatment and no one is asking for a direct copy of the verilog and there will be patent applications and Patents awarded to look at also.

Sure, you can program via a black-box API with only public function call prototypes and symbols known, but that's not really going to allow much of a look at the hardware for comparison and contrast.

They will get at Nvidia's Turing dies with those electron microscopes and even X-rays to do the patent vetting, as is done with everyone's processors.

Some folks with brains and not lard between their ears like you will want some block diagrams and other information to measure one design against the other and it's not all about programming.

You can remain happy in your ignorance but others will want some hardware deep dives via the published whitepapers.

I already know the basics, so that's no issue, going all the way back to the stack machine architecture, and I do not give a Flying F---K about gaming. Gamers can continue to whine about paying Nvidia's price scale for it all, but gamers will have to suck it up or stay with the previous generation and game that way. Dual Pascal gaming is going to become more affordable than ever, so let them eat dual Pascal until Turing becomes more affordable. Ditto for dual Vega gaming once Navi is to market.

Both DX12 and Vulkan will allow multi-GPU usage without the need for CF/SLI in the drivers, as for DX12/Vulkan the drivers are simplified and only expose the GPU's bare metal. It will be up to the games/gaming engine makers to access the GPUs and load balance the GPUs on a system via the DX12/Vulkan APIs.

IT's PowerVR Wizard was the first GPU that made use of in-hardware ray tracing functional blocks, so I'd like to see more comparisons to that IP also. AMD is becoming flush with cash lately, so maybe Tensor Cores and some ray tracing IP will begin showing up there also over the next few years. But AMD has its Epyc sales to bring in the mad revenue streams until it decides where its GPU roadmap will begin to include Tensor Cores and dedicated ray tracing functionality.

Those GCN ACE units are probably already there in Vega/nCUs for any sorts of ray tracing workloads, but the big question for AMD there is at what latency compared to Nvidia's RT cores for real-time gaming workloads. And Nvidia's grainy ray tracing output is only made usable after the trained AI running on Turing's Tensor Cores does its denoising task.

As far as the future of 64-bit vs. 32 or even 16-bit floating point, it seems like they would use multi-precision units eventually. I have seen some articles about multi-precision units that support something like 1 64-bit operation or 2 32-bit operations or perhaps even 4 16-bit operations. This is more complex, but takes less hardware than having separate units for each precision.
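The idea of one unit serving several precisions can be sketched with an idealized model: a datapath of fixed width that is split into as many lanes as fit. This is only an illustration of where a 1:2:4 FP64:FP32:FP16 ratio could come from. The 64-bit width and the function name are assumptions for the example, and real multi-precision units do not necessarily divide this cleanly.

```python
def ops_per_cycle(datapath_bits: int, operand_bits: int) -> int:
    """Operations per cycle when a datapath is split into equal-width lanes.

    Idealized: assumes the datapath divides perfectly into lanes with no
    overhead, which is an assumption and not a description of real hardware.
    """
    if operand_bits <= 0 or datapath_bits % operand_bits != 0:
        raise ValueError("operand width must evenly divide the datapath width")
    return datapath_bits // operand_bits


if __name__ == "__main__":
    # Hypothetical 64-bit multi-precision unit.
    for name, bits in (("FP64", 64), ("FP32", 32), ("FP16", 16)):
        print(f"{name}: {ops_per_cycle(64, bits)} op(s) per cycle")
```

Under this model, peak FP16 throughput is four times FP64 throughput on the same hardware, which is the trade the comment describes: more control complexity in exchange for not building three separate sets of units.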

It is unclear how Turing is actually configured. Some people have said that there isn't actually any separate ray tracing hardware, which would mean that it is really just a software API, and possibly not a very good one. I don't know how they will segment the market to get more money out of the pro series cards though. They can always limit things in drivers, or leave out certain features. I work at a company doing GPU acceleration and we use consumer cards for testing, but customer systems are the pro cards with ECC.

For ray tracing, as far as I know, the issue hasn't really been available compute resources. We have pretty massive compute resources in recent generations. The problem is that they are made for quickly going through massive amounts of data with good spatial locality. Raster graphics is mostly a lot of streaming operations. With ray tracing, you don't know where the rays will go, so you need random access to the entire scene, including the geometry and materials information. Even if you are just processing a small view, you still need that random access to the entire scene data, so splitting into smaller chunks like tile based rendering for raster graphics isn't as effective. GPUs are not that great at random access operations. Memory latency is a hard problem to solve. I don't see much use for Nvidia ray tracing (whatever it is) right now. Ray tracing is just hard to accelerate. I assume similar things are already being done on GPUs for accurate shadows and some lighting effects tacked on top of rasterized scenes.

Nvidia needs something to drive sales of more powerful gpus though, so I suspect the ray tracing stuff is mostly marketing and not much else. We need more power for 4K resolution, but for those still at 1080p, is the cost of a new graphics card justified if you already have a relatively powerful card? Probably not at the moment. The consoles are actually going to be under more performance pressure than PC gaming just because more people have 4k tvs than 4k computer displays. The market of people with 4K computer displays is tiny compared to the number of 4k tvs out there. They need 4k at 60 Hz, and it needs to be cheap.

I expect that the next generation consoles will be multi-chip solutions. AMD is obviously trying to prepare for multi-chip solutions in several ways. I suspect Nvidia will try and hold back. They will stay single chip as long as possible since this will make it such that developers are less likely to do multi-chip optimizations. That will give them an edge over AMD for a while. GPUs unfortunately still require more specific optimizations to perform well than CPUs. The next generation consoles need to run 4k 60, and that may not be cost effective with a single large chip. The newest process nodes will be very expensive from the start (EUV processes are ridiculous) and yields will probably be quite low. For a given design size, the switch over point between one big chip and multiple smaller chips will be coming down. It depends on how the process tech performs. Relatively large chips seem to be doable on 12 nm, but it is probably still quite expensive and there are probably a lot of salvage or completely defective parts.
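The big-chip versus multi-chip trade-off can be roughed out with the simple Poisson yield model, Y = exp(-D * A), where D is defect density and A is die area. Everything numeric here is a placeholder: the 0.2 defects/cm2 density is a hypothetical value and not a figure for any real process, and the sketch ignores packaging cost, interconnect overhead, wafer edge losses, and the salvage of partly defective dies.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)


def silicon_per_good_product_mm2(total_area_mm2: float, num_dies: int,
                                 defects_per_cm2: float) -> float:
    """Wafer area consumed, on average, per working product.

    Assumes the product is split into num_dies equal dies that are each
    tested before packaging, so one bad die does not scrap the others.
    """
    die_area = total_area_mm2 / num_dies
    return total_area_mm2 / poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    DEFECT_DENSITY = 0.2  # hypothetical defects per cm^2, for illustration only
    for dies in (1, 2, 4):
        area = silicon_per_good_product_mm2(800.0, dies, DEFECT_DENSITY)
        print(f"800 mm2 as {dies} die(s): ~{area:.0f} mm2 of wafer per good product")
```

With a higher defect density, as on a brand-new node, the gap between one large die and several small ones widens, which is the "switch over point coming down" effect described above.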