Veteran

Linked-list based methods for OIT are not robust, are generally slow and eat tons of memory. Take as an example this slide from the GDC TressFX presentation:

There is a reason developers don't tend to use these techniques even though HW that can support them has been available for many years now. BTW, Andrew is right: even if you want to stick to per-pixel lists, it makes little or no sense to sort the fragments as in the original mecha demo. It's generally faster, and guarantees more stable performance, to use our adaptive transparency algorithm to composite all the fragments stored in the per-pixel list.
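As a rough illustration of why the sort is not strictly required, here is a minimal Python sketch. The function names are made up for this post, and it uses an exact, uncompressed visibility function rather than the fixed-size approximation that adaptive transparency builds. Each fragment contributes its color times its alpha times the transmittance of everything in front of it, so the list can be walked in storage order. It assumes smaller depth means nearer and that no two fragments share a depth.

```python
import math

def composite_sorted(fragments, background):
    """Reference path: sort back to front, then apply the usual 'over' blend."""
    color = background
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * d for c, d in zip(rgb, color))
    return color

def visibility(fragments, depth):
    """Transmittance in front of `depth`: product of (1 - alpha) over nearer fragments."""
    vis = 1.0
    for d, _, alpha in fragments:
        if d < depth:
            vis *= 1.0 - alpha
    return vis

def composite_unsorted(fragments, background):
    """Walk the per-pixel list in storage order; no sort needed."""
    total = [0.0, 0.0, 0.0]
    for depth, rgb, alpha in fragments:
        weight = alpha * visibility(fragments, depth)
        for i in range(3):
            total[i] += weight * rgb[i]
    # The background is attenuated by every fragment in the list.
    through = visibility(fragments, math.inf)
    return tuple(t + through * b for t, b in zip(total, background))

# (depth, rgb, alpha) in arbitrary storage order
pixel_list = [
    (0.5, (1.0, 0.0, 0.0), 0.5),
    (0.2, (0.0, 1.0, 0.0), 0.25),
    (0.9, (0.0, 0.0, 1.0), 0.75),
]
white = (1.0, 1.0, 1.0)
```

Both paths give the same color for `pixel_list` over `white`. This brute-force `visibility` is quadratic in the list length; the point of a compact visibility representation is to make that lookup cheap.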

Also, the mutex mentioned in the interview doesn't reproduce the behavior of Intel pixel ordering, since a mutex cannot guarantee a well-defined, deterministic execution order, which is the key that enables building complex per-pixel data structures. Without ordering you are stuck with simpler algorithms (e.g. capture the first N layers and hope for the best), unless you like seeing a bunch of flickering pixels on the screen.
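To make the ordering point concrete, here is a minimal sketch (plain Python, illustrative names only) of why serialized access alone is not enough. The standard "over" blend is not commutative, so if two fragments hit the same pixel in a different order from one frame to the next, the pixel changes color even though every update was atomic.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Blend one fragment over the current pixel color (non-premultiplied alpha)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

def blend_in_order(fragments, background):
    """Apply fragments one at a time, in the order they happened to arrive."""
    color = background
    for rgb, alpha in fragments:
        color = over(rgb, alpha, color)
    return color

red = ((1.0, 0.0, 0.0), 0.5)
blue = ((0.0, 0.0, 1.0), 0.5)
black = (0.0, 0.0, 0.0)

# Same two fragments, two possible arrival orders under a mutex.
frame_a = blend_in_order([red, blue], black)  # (0.25, 0.0, 0.5)
frame_b = blend_in_order([blue, red], black)  # (0.5, 0.0, 0.25)
```

A mutex makes each `over` call safe, but it says nothing about which of the two orders you get on a given frame, and that is where the flicker comes from.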

Moderator, Veteran

Yeah I'll add that the mutex thing is also *not* safe to do according to the DX11 spec. While it may work on some hardware, you should not be shipping applications that do it, since a full DX11-conforming GPU/implementation could legitimately hang on any code that waits on any other invocation. See my response in this thread for some more info: http://forum.beyond3d.com/showthread.php?t=63686.

Thus the "critical section" implementation would also require an extension to the specification at the very least, and you'd still end up with something that is fundamentally slow, uses lots of memory and isn't robust to overflow.

Regular, Newcomer

How do you do that?
ps: tnx andy, and since you're here you need to get Shogun Total War working on that Iris, and while you're at it Crimson Skies. Don't forsake us retro gamers

There is a patch coming for Total Rome; it should work then. Crimson Skies is a game bug, so you have to ask the Crimson Skies devs to get it fixed. AMD/Nvidia suffer from the same issues in this game.

Newcomer

Nick Thibieroz from AMD here. My name is butchered in the article linked but it's really me.

I wanted to make a few points to provide a different perspective than some of the views expressed in this forum.

As I mentioned in the interview, PixelSync is certainly not a requirement for OIT techniques. I don't think anyone would dispute this. OIT techniques have been available for a while, with various trade-offs in performance and quality.

There seems to be a determination to compare a generic DX11 implementation of an OIT solution (Per-Pixel Linked Lists) with functionality only available on a specific piece of hardware. Could we make Per-Pixel Linked Lists (or other OIT-enabling method) faster by exposing certain hardware features and their corresponding IHV-specific extensions? Probably. But is it what game developers want? Most of them would stick to industry standard APIs. This makes the comparison fairly moot in my point of view.

It is worth noting that Adaptive OIT has visual quality shortcomings (due to its adaptive nature). This can be observed on Intel's own SDK sample. It doesn't take away from the relative merit of this technique, but in comparison a pure PPLL solution guaranteeing individual shading, sorting and blending of the K frontmost fragments was a better match for the quality goals Crystal Dynamics had for rendering Lara's iconic hair. And TressFX in Tomb Raider was designed to be a high-end feature. TressFX in Tomb Raider has a significant cost, but I think that the visuals provided by this truly next-generation feature are worth it. And actually the bulk of the performance is not consumed by PPLL memory accesses: physics, AA, lighting, shadowing etc. take the lion's share.

PPLL also require a large amount of memory (200 Megs @1080p with an average overdraw assumption of 8). But to be honest this is exactly the kind of unconventional usage of graphics memory I would expect from next-generation titles, especially with consoles boasting 8 Gigs of unified memory (and GDDR5 in the case of PS4).
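The 200 MB figure is easy to sanity-check. Here is a back-of-the-envelope sketch, assuming a 12-byte node (for example packed color, depth and next pointer at 4 bytes each) and a 4-byte head pointer per pixel. Those sizes are an assumption for illustration, not something stated above.

```python
def ppll_bytes(width, height, avg_overdraw, node_bytes=12, head_bytes=4):
    """Rough PPLL footprint: a shared node pool plus one head pointer per pixel."""
    pixels = width * height
    node_pool = pixels * avg_overdraw * node_bytes
    head_buffer = pixels * head_bytes
    return node_pool, head_buffer

nodes, heads = ppll_bytes(1920, 1080, avg_overdraw=8)
print(f"node pool: {nodes / 1e6:.1f} MB, head pointers: {heads / 1e6:.1f} MB")
```

With these assumptions the node pool comes to about 199 MB and the head pointer buffer to about 8 MB, which lines up with the quoted number. Any pixel whose overdraw exceeds what the pool can hold is where the overflow artifacts come from.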

There is no doubt in my mind that PPLL are efficient for the quality advances they enable in next-gen features. I played Tomb Raider from start to finish and I maybe saw two split-second instances of missing fragments due to overflow. I call this robust from a game development world perspective (certainly not as common as e.g. clipping issues, blockiness or other similar problems affecting typical games).

As it is relevant to this discussion I'd like to point out that there is a TressFX SDK sample currently available on the AMD Developer website (http://developer.amd.com/tools-and-sdks/graphics-development/amd-radeon-sdk/). It takes less than 3ms in the default view on a 7970 Ghz Edition card, and we plan to release updates to this sample providing additional options for performance/visuals tradeoffs.

As pointed out by nAo a mutex-based solution does not guarantee pixel ordering (my answer mentioned the serialization of access, not ordering). While this functionality can be useful for programmable blending it is not directly relevant to PPLL-style techniques of Order-Independent Transparency since -by definition- all fragments aim to be sorted independently of their ordering. As to whether a mutex solution is "safe" to do or not, we think we can make it so. I cannot comment further on this at the moment.

Andrew talks about misleading statements. There is nothing misleading in my answers to this interview, those are my views. "Misleading" is a matter of opinion, and the latter is obviously heavily affected by our roles in this industry.

Veteran

Yeah I'll add that the mutex thing is also *not* safe to do according to the DX11 spec. While it may work on some hardware, you should not be shipping applications that do it, since a full DX11-conforming GPU/implementation could legitimately hang on any code that waits on any other invocation.

Andrew is absolutely correct. That kind of coding (waiting for another thread in a different thread block) is not supported by the GPU computation model. If you program like that, the GPU might randomly hang. There's absolutely no good way to test that your algorithm works even for a single brand of hardware, since it might work properly with most data sets and still crash with others. The problem size varies depending on the transparencies on screen, and that varies wildly with the camera location/direction in the game world. It's impossible to test every possible permutation.

I have been thinking about this issue from a different perspective. Because our system renders the whole viewport with a single draw call (and virtualizes both mesh and texture data), the term "Order independent" doesn't really apply anymore, since there's no order of the draw calls (or objects, because whole objects will not be rendered at once). The GPU decides the ordering of the triangles, and the ordering can be really fine grained (since there's no longer a separate vertex buffer and a draw call for each object... just small pages of vertex/texture data).

The GPU guarantees that the triangle submit order is kept intact and that the ROPs blend pixels in submit order (it is fully deterministic). So if you submit the data to the ROPs in the correct order, the result will be correct: sort the data properly before you rasterize your triangles and the blending will be correct. For well-behaved geometry (no triangles intersect each other) this is workable, assuming the geometry pages contain only convex data, since convex data doesn't need to be sorted against itself. However, there might be a dependence loop in overlapping geometry (http://en.wikipedia.org/wiki/Painter's_algorithm), and this cannot be solved just by sorting convex patches (no ordering produces a correct result). This system wasn't originally designed to solve the OIT problem, but it seems to work well with transparencies (providing almost perfect results without the need for per-pixel lists). However, unlike the Haswell hardware "PixelSync" extension, our method is not a general-purpose method that could be used for all kinds of rendering. It requires the whole rendering pipeline to be designed around it (and it has many restrictions and limitations).
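The dependence-loop case can be made concrete with a small sketch. Assume some earlier step has already produced "must be drawn before" relations between convex patches; the patch names and the way those relations are found are hypothetical here. A back-to-front order exists only if that graph has no cycle, which the standard library's `graphlib` (Python 3.9+) can check:

```python
from graphlib import TopologicalSorter, CycleError

def back_to_front_order(drawn_before):
    """Return a valid draw order, or None when the patches overlap cyclically.

    `drawn_before` maps each patch to the set of patches that must be
    drawn before it (i.e. the patches that lie behind it).
    """
    try:
        return list(TopologicalSorter(drawn_before).static_order())
    except CycleError:
        return None

# B is behind A and C is behind B: a plain chain, so sorting works.
chain = {"A": {"B"}, "B": {"C"}, "C": set()}

# Three patches overlapping each other in a ring: no order is correct.
ring = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
```

For `chain` the only valid order is C, B, A. For `ring` the function returns `None`, which is the case where sorting alone cannot help and something would have to be split or handled per pixel.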

Moderator, Veteran

There seems to be a determination to compare a generic DX11 implementation of an OIT solution (Per-Pixel Linked Lists) with functionality only available on a specific piece of hardware. Could we make Per-Pixel Linked Lists (or other OIT-enabling method) faster by exposing certain hardware features and their corresponding IHV-specific extensions? Probably. But is it what game developers want? Most of them would stick to industry standard APIs. This makes the comparison fairly moot in my point of view.

For a game developer choosing a technique to put in a game, sure. For a future architecture/API discussion, certainly not. A lot of the point in Intel exposing the extensions was to help demonstrate how useful they are and help push the industry to support them more broadly. Programmers have been asking for programmable-blend-like functionality for as long as I've been in graphics, so the reality is that, with the APIs moving quite slowly at the moment, it's important to expose this feature.

As I mentioned earlier in the thread, I personally have been pushing for standardization of similar techniques for years now, so I'd turn the question around: do you disagree that it's a useful and important feature to enable this and other techniques? Adaptive visibility function compression is just one example... there are many simpler useful cases like blending normals into G-buffers, blending in alternate color spaces, etc. that all require something like programmable blending; there's a reason why game developers want it so badly.

It is worth noting that Adaptive OIT has visual quality shortcomings (due to its adaptive nature). This can be observed on Intel's own SDK sample.

It's configurable in quality while fixing a performance target, which is what game developers want. They fundamentally need to hit a frame budget, not a quality mark. It's not okay to drop to 5fps just because a thousand hair segments happened to overlap in a frame.

Did you try 8/16-node versions of AOIT (or more)? I'd be surprised if you found AOIT to look worse in any cases where it used even a fraction of the amount of memory that per-pixel linked lists do. Turns out compression works pretty well.
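For anyone who hasn't looked at how a fixed node count can work, here is a simplified Python sketch of the idea. It is not the shipping sample code. The data layout and the removal heuristic (drop the node whose removal changes the area under the curve the least) are assumptions made for illustration. The visibility function is a step curve stored as depth-sorted `[depth, transmittance]` pairs, and it never grows past `max_nodes` (assumed to be at least 2).

```python
def remove_cheapest_node(nodes):
    """Drop the node whose removal changes the area under the curve the least.

    The last node is never dropped, so total transmittance is preserved.
    """
    best_index, best_cost = 0, float("inf")
    for i in range(len(nodes) - 1):
        prev_t = nodes[i - 1][1] if i > 0 else 1.0
        # Removing node i lifts the curve to prev_t until the next node.
        cost = (prev_t - nodes[i][1]) * (nodes[i + 1][0] - nodes[i][0])
        if cost < best_cost:
            best_index, best_cost = i, cost
    del nodes[best_index]

def insert_fragment(nodes, depth, alpha, max_nodes):
    """Insert one fragment, in any order, into the bounded visibility curve."""
    before = 1.0  # transmittance just in front of the new fragment
    index = 0
    while index < len(nodes) and nodes[index][0] < depth:
        before = nodes[index][1]
        index += 1
    # Everything at or behind the new fragment gets dimmer.
    for node in nodes[index:]:
        node[1] *= 1.0 - alpha
    nodes.insert(index, [depth, before * (1.0 - alpha)])
    if len(nodes) > max_nodes:
        remove_cheapest_node(nodes)

curve = []
for depth, alpha in [(0.5, 0.5), (0.2, 0.25), (0.9, 0.75), (0.3, 0.5), (0.7, 0.5)]:
    insert_fragment(curve, depth, alpha, max_nodes=3)
```

Memory per pixel is fixed by `max_nodes` no matter how many fragments land on the pixel, which is what lets you pick the budget first and take whatever quality that buys.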

And actually the bulk of the performance is not consumed by PPLL memory accesses: physics, AA, lighting, shadowing etc. take the lion's share.

Yeah, don't get me started on the shadowing. As far as I can tell, it's not using self-shadowing within the hair (which is unsurprising... you really don't want to brute force that anyways), and yet it's still doing multi-tap PCF for every fragment. That seems like a poor choice vs. generating a shadow mask texture for the hair and simply projecting it...

But to be honest this is exactly the kind of unconventional usage of graphics memory I would expect from next-generation titles, especially with consoles boasting 8 Gigs of unified memory (and GDDR5 in the case of PS4).

It's not so much the memory footprint as the implied bandwidth that is the issue. The access pattern of PPLLs is really terrible as well, so bandwidth to off-chip memory is heavily amplified. Something like Haswell's eDRAM would help somewhat with this problem, but the underlying access pattern is fundamentally unfriendly.

Andrew talks about misleading statements. There is nothing misleading in my answers to this interview, those are my views. "Misleading" is a matter of opinion, and the latter is obviously heavily affected by our roles in this industry.

Sure, but some of it is a bit more cut and dried than just personal opinion.

Like for instance, you state "as long as game studios keep pushing the boundaries of realism in real-time 3D graphics, there will always be a market for performance and discrete GPUs." I don't think anyone here would disagree with that.

But then in other parts of the interview, you basically imply that AMD APUs are somehow in a different performance category; for instance: "when it comes to high-end gaming, current Intel integrated graphics solutions usually force users to compromise between quality or performance, which is a tough choice to impose on gamers." The same statement can be made for all integrated graphics... it's sort of obvious - they run in a fraction of the power and thermal budget of the discrete GPUs.

So you're saying that Intel's solutions are not fast enough to be usable, but yet Iris Pro is faster than any of the current APUs that AMD have released as far as I can tell from reviews. So does that mean AMD APUs are magically more usable just because they say AMD on them? Invoking the PS4 is "misleading" since AMD ships nothing of that class of part for PCs currently; the current parts aren't even GCN-based yet, so the comparison is pretty far-fetched IMHO.

I just don't think that's the sort of "opinion" that is supported by facts. Like I said, I realize that for the purpose of such interviews you need to pull out some marketing stuff and that's fine (sadly that's the way the media world works), but let's just stick to the technical conversation here.

Anyways, regardless of our potential disagreements (which are probably fewer than you think), great to have you here Nick!

Legend

but yet Iris Pro is faster than any of the current APUs that AMD have released as far as I can tell from reviews.

If that's true you need to get that message out there. It surprised me (not that I'm the most knowledgeable guy), and I'm sure many people are still in the mindset that Intel gfx are slow compared to AMD.
I also get your point about invoking the PS4 being "misleading".

The Playstation 4 boasts a powerful semi-custom AMD APU and we will see a major improvement in console and PC game graphics quality as a result.

Graphics in the PS4 will be handled by a discrete GPU, not the integrated one (going to look silly if I'm wrong on this), so yes it is misleading

ps: nick -- thanks for turning up but fonts size fer christ sake ;D
pps: you make TressFX sound very rosy, but it was a big hit on fps (a quick google showed between 30 and 50%), and that's with only a single character supporting it. Plus sometimes the hair would clip through her shoulder blades and make her look like she had hairy armpits

Moderator, Veteran

Graphics in the PS4 will be handled by a discrete GPU, not the integrated one (going to look silly if I'm wrong on this), so yes it is misleading

Yes it is a "big APU" but it's much more powerful on the graphics front than anything AMD are currently shipping (not to imply that the PS4 is currently shipping either). The whole discrete vs integrated thing is becoming less meaningful than power budgets anyways.

My point is really just that if the assertion that "Intel doesn't make halo discrete GPUs, therefore you should buy AMD even if you're buying midrange stuff" were true, then no one should have bought any AMD CPUs for many years. I kid of course, but I hope I've made my point about how absurd the logic is, particularly for laptops where discrete just increasingly doesn't make sense.

Veteran, Subscriber

Yes it is a "big APU" but it's much more powerful on the graphics front than anything AMD are currently shipping (not to imply that the PS4 is currently shipping either). The whole discrete vs integrated thing is becoming less meaningful than power budgets anyways.

My point is really just that if the assertion that "Intel doesn't make halo discrete GPUs, therefore you should buy AMD even if you're buying midrange stuff" were true, then no one should have bought any AMD CPUs for many years. I kid of course, but I hope I've made my point about how absurd the logic is, particularly for laptops where discrete just increasingly doesn't make sense.

Then again, when buying midrange stuff, you certainly do not get Iris Pro.

Veteran

but yet Iris Pro is faster than any of the current APUs that AMD have released as far as I can tell from reviews. So does that mean AMD APUs are magically more usable just because they say AMD on them?

First of all, you can't buy that yet. Secondly, I don't think it will be faster with AA enabled. Thirdly, Intel driver support is nowhere near AMD's, either in technical support for games or in image quality and options, and that is probably not going to change in the near future. I also seriously doubt Intel's IQ is the same as AMD's or NVIDIA's; they have to be cutting corners somewhere.

Veteran

But is it what game developers want? Most of them would stick to industry standard APIs. This makes the comparison fairly moot in my point of view.

All HW vendors have been supporting various forms of extensions since DX9, so I'd say yes: when an extension is useful it gets adopted by developers, and sometimes it makes it into the next revision of the API.

As pointed out by nAo a mutex-based solution does not guarantee pixel ordering (my answer mentioned the serialization of access, not ordering). While this functionality can be useful for programmable blending it is not directly relevant to PPLL-style techniques of Order-Independent Transparency since -by definition- all fragments aim to be sorted independently of their ordering.

Not sure how useful a mutex can be for programmable blending, unless the "programmable blending operation" is truly order independent (e.g. keep the first N layers, etc.).

As to whether a mutex solution is "safe" to do or not, we think we can make it so. I cannot comment further on this at the moment.

I don't doubt for a second it can be made safe; what I doubt is that it can be made safe on any past, present and future DX11+ GPU. At that point it just becomes a proprietary extension, but I thought you were arguing developers don't like that.
