RegularNewcomer

I guess they'll work on DXR performance for Pascal too. Even if they do and the performance ends up "OK-ish", Pascal would still lag far behind Turing because it lacks RT hardware. And they can't let Vega have better DXR performance than Pascal...

LegendVeteranSubscriber

So now all reviews are going out on the 19th, with many reviewers getting their card only a day before for testing. As suggested earlier, it would be great if we could get past this day-one rush for views and trust readers to wait while reviewers spend more time on detailed, extensive testing. Or simply run the inevitable benchmark suite at a limited set of settings for day one, and then consumers actually wait for the follow-up articles (which generate more views anyway) before deciding whether to buy.

Of course, Nvidia are great at generating hype and have created an early Apple-like insatiable need to buy their products before they run out! While stocks last, folks!

It will be interesting to see how RTX is received, though, considering the very high price tag and the focus on new features that have no immediate use.

VeteranRegularSubscriber

So the question I have: are GPU shader cores already superscalar, or are the Turing ones the first to be made superscalar?

This would imply that a significant chunk of the work done by the shader core consists of integer ops. The compiler could squeeze out more performance by arranging the compiled shader code so that it schedules effectively for this.
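To make the compiler-scheduling idea concrete, here is a toy Python model. It assumes the issue model discussed in the replies (one warp instruction issued per clock, 16-wide FP32 and INT32 pipes so a 32-thread warp instruction occupies its pipe for two clocks), strictly in-order issue within one warp, and ops with no data dependencies. `run_in_order` and `interleave` are hypothetical names for illustration, not anything from Nvidia's actual compiler.

```python
WARP_SIZE = 32
PIPE_WIDTH = 16                      # assumed lanes per FP32 / INT32 pipe
OCCUPANCY = WARP_SIZE // PIPE_WIDTH  # clocks one warp instruction holds its pipe


def run_in_order(ops):
    """Total clocks for one warp issuing `ops` strictly in order, at most one
    per clock, stalling while the target pipe is still busy.
    Toy assumption: ops are independent, so only pipe occupancy matters."""
    free_at = {"FP": 0, "INT": 0}
    cycle = 0
    for op in ops:
        cycle = max(cycle, free_at[op])  # wait until the target pipe is free
        free_at[op] = cycle + OCCUPANCY  # pipe is busy for OCCUPANCY clocks
        cycle += 1                       # next issue slot
    return max(free_at.values())


def interleave(ops):
    """Hypothetical compiler pass: alternate FP and INT ops.
    Reordering is only legal here because the toy ops have no dependencies."""
    fp = [o for o in ops if o == "FP"]
    integer = [o for o in ops if o == "INT"]
    out = []
    while fp or integer:
        if fp:
            out.append(fp.pop())
        if integer:
            out.append(integer.pop())
    return out


blocked = ["FP"] * 4 + ["INT"] * 4
print(run_in_order(blocked))              # 15: INT ops wait behind the FP run
print(run_in_order(interleave(blocked)))  # 9: FP and INT pipes overlap
```

In this model, alternating the two kinds of ops lets the INT pipe work while the FP pipe is still busy with the previous instruction; on real hardware, other warps can fill those slots as well.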

I’m wondering if Turing inherits Volta’s approach to thread scheduling within a warp. With Volta, each thread has its own execution state, so the GPU can switch back and forth between groups of threads whenever it needs to in order to maximize efficiency, and the threads can sync up later to work in parallel again. Do we know yet whether this is the same for Turing?
Thanks

Legend

So the question I have: are GPU shader cores already superscalar, or are the Turing ones the first to be made superscalar?

This would imply that a significant chunk of the work done by the shader core consists of integer ops. The compiler could squeeze out more performance by arranging the compiled shader code so that it schedules effectively for this.

Kepler was superscalar but I don’t think that definition applies to Turing. Based on what nvidia has shared so far Volta/Turing can’t issue multiple instructions per clock from the same warp.

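The execution rate is half the issue rate, though, which does allow the INT and FP pipelines to be occupied concurrently: each pipe needs two clocks per warp instruction, so one instruction issued per clock can feed both pipes on alternating clocks.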

So the new architecture can run FP32 instructions at full rate and use the remaining 50% of issue slots to execute all other types of instructions: INT32 for index/loop calculations, load/store, branches, SFU, FP64 and so on. And unlike Maxwell/Pascal, full GPU utilization doesn't require packing pairs of co-issued instructions into the same thread: each successive cycle can execute an instruction from a different thread, so one thread performing a series of FP32 instructions and another thread performing a series of INT32 instructions will load both blocks 100%.
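The "one thread doing FP32, another doing INT32" case can be sketched as a toy cycle-level simulation. Assumptions, all simplifications rather than documented hardware behavior: one scheduler issuing at most one warp instruction per clock, round-robin across warps; 16-wide FP32 and INT32 pipes, so each warp instruction holds its pipe for two clocks; no data dependencies and no latency beyond pipe occupancy. `simulate` is a hypothetical helper name.

```python
from collections import deque

WARP_SIZE = 32
PIPE_WIDTH = 16                      # assumed lanes per pipe
OCCUPANCY = WARP_SIZE // PIPE_WIDTH  # clocks one warp instruction holds its pipe


def simulate(warps):
    """warps: list of per-warp instruction streams ("FP" or "INT").
    Returns (total clocks, fraction of clocks each pipe was busy)."""
    queues = [deque(w) for w in warps]
    busy_until = {"FP": 0, "INT": 0}
    busy_clocks = {"FP": 0, "INT": 0}
    cycle = 0
    rr = 0  # round-robin starting point
    while any(queues):
        # Single issue slot per clock: take the first warp (round robin)
        # whose next instruction targets a pipe that is free.
        for k in range(len(queues)):
            i = (rr + k) % len(queues)
            q = queues[i]
            if q and busy_until[q[0]] <= cycle:
                pipe = q.popleft()
                busy_until[pipe] = cycle + OCCUPANCY
                busy_clocks[pipe] += OCCUPANCY
                rr = i + 1
                break
        cycle += 1
    total = max(cycle, *busy_until.values())  # include pipeline drain
    return total, {p: busy_clocks[p] / total for p in busy_clocks}


# Two warps: one FP32-only, one INT32-only.
total, util = simulate([["FP"] * 4, ["INT"] * 4])
print(total, util)  # 9 clocks; each pipe busy 8 of them

# One FP32-only warp: FP pipe saturated, INT pipe idle.
print(simulate([["FP"] * 4]))
```

Apart from the one-clock drain at the end, both pipes stay busy even though neither warp ever mixes FP and INT, which is the point the quoted post makes about not needing co-issued pairs from the same thread.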

Which gets the reply “This is correct.”

It doesn’t say explicitly that the instructions have to come from different warps. I always assumed that wasn’t necessary, but I may simply be wrong.

Either way, since the FP32 and INT instructions are issued in an alternating fashion, I don’t think you can call it superscalar?

Legend

It doesn’t say explicitly that the instructions have to come from different warps. I always assumed that wasn’t necessary, but I may simply be wrong.

Either way, since the FP32 and INT instructions are issued in an alternating fashion, I don’t think you can call it superscalar?

Yeah, it doesn’t have to come from other warps but it helps with utilization.

Kepler is superscalar: it has multiple warp schedulers and dispatchers per SM and can issue multiple instructions per clock from one or more warps.

Maxwell and Pascal are also superscalar: they dropped the extra schedulers but still had multiple dispatchers, so they could issue two instructions per clock from the same warp (which requires ILP). Like Kepler, though, their FP and INT pipelines were a single unit, so they couldn’t be used concurrently.

Volta/Turing dropped the extra dispatcher but split the FP and INT pipelines. They also cut the number of execution pipes in half. They’re no longer superscalar, but a single dispatcher can now keep both the 16-wide INT and the 16-wide FP pipelines busy. The end result is major gains for mixed FP/INT workloads. How much that actually matters for games should become clearer next week.
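A rough steady-state way to see why mixed FP/INT code benefits from the split is sketched below. It deliberately compares two idealized designs with the same 16-wide pipe width: one where FP and INT share a single pipe (so they serialize) and one with separate pipes fed by a single dispatcher (so they overlap). That equal-width assumption is not the real Pascal vs Turing SM configuration, and the FP:INT mixes are hypothetical inputs, so this illustrates the shape of the gain rather than predicting game performance. Function names are made up for the example.

```python
WARP_SIZE = 32
PIPE_WIDTH = 16                      # assumed lanes per pipe, same in both designs
OCCUPANCY = WARP_SIZE // PIPE_WIDTH  # clocks a warp instruction holds its pipe


def cycles_shared_pipe(n_fp: int, n_int: int) -> int:
    """FP and INT go through one combined pipe, so they cannot overlap."""
    return (n_fp + n_int) * OCCUPANCY


def cycles_split_pipes(n_fp: int, n_int: int) -> int:
    """Separate FP and INT pipes run concurrently. One issue per clock is
    enough because each pipe only accepts a new warp instruction every
    OCCUPANCY (= 2) clocks. Ignores pipeline fill/drain."""
    return max(n_fp, n_int) * OCCUPANCY


def mixed_speedup(n_fp: int, n_int: int) -> float:
    return cycles_shared_pipe(n_fp, n_int) / cycles_split_pipes(n_fp, n_int)


for n_fp, n_int in [(100, 0), (100, 36), (100, 100)]:
    print(f"{n_fp} FP : {n_int} INT -> {mixed_speedup(n_fp, n_int):.2f}x")
```

Under these assumptions the gain is (FP + INT) / max(FP, INT): nothing for pure FP code, up to 2x for an even mix, which is why the real-world benefit depends on how much integer work game shaders actually contain.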

VeteranSubscriber

You probably won't see independently verified numbers for that, because you would have to be able to switch off either concurrent FP/INT execution or everything else that's new in Turing to see how much this one feature alone helps in games. I'd love to be proven wrong, though!

Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!