// bind the first fbo and select the first subroutine
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo1);
glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, &subIndex1);
glDrawBuffers(3, drawBuffers1);
... // draw calls
// bind the second fbo and select the second subroutine
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo2);
glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, &subIndex2);
glDrawBuffers(3, drawBuffers2);
... // draw calls
// unbind the second fbo (back to the default framebuffer)
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
...

The above code works, but I wonder whether this is the only correct way to use subroutines together
with multiple render targets. Is there something I can do better (more efficiently)?

Do I have to use layout(location = 0) for the default framebuffer output?

When I bind the second fbo and subroutine first, and then the first fbo and subroutine, glDrawBuffers
clears all the textures. What can I do about that?

If you have multiple color outputs then you have to write a value to all of them; otherwise the ones you don't write to become undefined.
Think about it this way: even if you don't output any color to a particular color output, a write to that color output will still happen, just with an implementation-dependent value.
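In GLSL that means every declared output gets an assignment. A minimal sketch (the output names and locations are just placeholders, and the locations map to the corresponding entries of the array passed to glDrawBuffers):

```glsl
#version 430 core

layout(location = 0) out vec4 outColor;    // goes to drawBuffers[0]
layout(location = 1) out vec4 outNormal;   // goes to drawBuffers[1]
layout(location = 2) out vec4 outMaterial; // goes to drawBuffers[2]

void main()
{
    // write something to every declared output, even if it is only a
    // dummy value, so that none of them ends up with undefined contents
    outColor    = vec4(1.0);
    outNormal   = vec4(0.0, 0.0, 1.0, 0.0);
    outMaterial = vec4(0.0);
}
```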

Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
Technical Blog: http://www.rastergrid.com/blog/

First, why would you want to use the same shader for all of these? It's fine to switch shaders three times in a frame; you're being overzealous about batching things together.

In fact, for the shadow map rendering stage you don't even need a fragment shader, so if you have one, it will actually cost you performance.

You don't need a fragment shader for depth-only rendering, not even in core profile.
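For example, a depth-only program can consist of nothing but a vertex shader along the following lines (the uniform name is hypothetical); no fragment shader is attached to the program object, and depth is written by the fixed-function path:

```glsl
#version 430 core

layout(location = 0) in vec3 position;

uniform mat4 lightViewProj; // assumed light-space view-projection matrix

void main()
{
    // only the position matters; depth test/write happens in fixed function
    gl_Position = lightViewProj * vec4(position, 1.0);
}
```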

Originally Posted by Alfonse Reinheart

The fragment depth is part of the results of fragment shader execution. And thus it is undefined. And having an empty fragment shader doesn't lose you any performance.

The fragment depth output by a fragment shader, of course, makes no sense without a fragment shader, but you don't need a fragment shader to output depth: there is a fixed-function path for that.

Also, having an empty fragment shader DOES cost you performance. Depth testing and depth writes are done by fixed-function hardware which can have a throughput of many pixels per clock, especially with hierarchical Z, not to mention that no shader cores need to be used. Whereas if you have a fragment shader, you'll have to launch shaders on your shader engines, and even if they do nothing, that will still cost you several clocks per tile (unless the driver is smart enough to simply ignore your empty fragment shader, in which case you'll get the same results).

The fragment depth output by a fragment shader, of course, makes no sense without a fragment shader, but you don't need a fragment shader to output depth: there is a fixed-function path for that.

If you're right, please point to the part of the OpenGL 4.3 specification that states that the input to the depth comparison does not have to come from the fragment shader. Because it clearly says:

Originally Posted by GL 4.3 Ch 15

The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17

Those "fragments resulting from fragment shader execution" contain undefined data, as previously stated. If you're right, you should be able to show me where the resulting fragments will get defined data from.

Also, having an empty fragment shader DOES cost you performance. Depth testing and depth writes are done by fixed-function hardware which can have a throughput of many pixels per clock, especially with hierarchical Z, not to mention that no shader cores need to be used. Whereas if you have a fragment shader, you'll have to launch shaders on your shader engines, and even if they do nothing, that will still cost you several clocks per tile (unless the driver is smart enough to simply ignore your empty fragment shader, in which case you'll get the same results).

You're making some pretty big assumptions here. Like the assumption that fragment processing can be skipped by the hardware at all. That it can copy data directly from the rasterizer to the ROPs without some kind of per-fragment shader happening to intervene.

I'd like to see proof that this is true. Preferably in the form of performance tests on the difference between an empty fragment shader and not having one. On multiple different kinds of hardware.

If you're right, please point to the part of the OpenGL 4.3 specification that states that the input to the depth comparison does not have to come from the fragment shader.

The description of per-fragment operations starts as follows:

Originally Posted by GL 4.3 Ch 17.3

A fragment produced by rasterization with window coordinates of (xw, yw) modifies the pixel in the framebuffer at that location based on a number of parameters and conditions. We describe these modifications and tests, diagrammed in figure 17.1, in the order in which they are performed.

Thus you can see that fragments are produced by the rasterization, not by fragment shaders.

Also, what chapter 15 tells is actually:

Originally Posted by GL 4.3 Ch 15

When the program object currently in use for the fragment stage (see section 7.3) includes a fragment shader, its shader is considered active and is used to process fragments resulting from rasterization (see section 14). If the current fragment stage program object has no fragment shader, or no fragment program object is current for the fragment stage, the results of fragment shader execution are undefined.
The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17.

Thus, although the results of fragment shader execution are undefined, most of the data required for per-fragment operations is not affected by the fragment shader, namely the pixel ownership and scissor tests, multisample operations, the depth and stencil tests (unless depth or stencil export is used, which is obviously not the case when there is no fragment shader) and occlusion queries.

Originally Posted by Alfonse Reinheart

Those "fragments resulting from fragment shader execution" contain undefined data, as previously stated. If you're right, you should be able to show me where the resulting fragments will get defined data from.

Once again, the results of fragment shader execution are undefined, not the fragments themselves. By default the results of fragment shader execution are the color outputs, unless another explicit mechanism is used, like depth or stencil export.

Originally Posted by Alfonse Reinheart

You're making some pretty big assumptions here. Like the assumption that fragment processing can be skipped by the hardware at all. That it can copy data directly from the rasterizer to the ROPs without some kind of per-fragment shader happening to intervene.

ROPs deal with blending, sRGB conversion and logic op. Obviously those will get undefined data, thus you cannot expect anything good to be in your color buffers after all. But depth/stencil is not handled by the same piece of hardware. Neither are e.g. scissor and pixel ownership tests.

Originally Posted by Alfonse Reinheart

I'd like to see proof that this is true. Preferably in the form of performance tests on the difference between an empty fragment shader and not having one. On multiple different kinds of hardware.

The fact that the lack of a fragment shader doesn't result in an error, but just makes the results of its execution undefined, is already a good enough reason, I believe. Don't you think the ARB defined it this way intentionally rather than by oversight?

Not to mention that in most cases the depth and stencil tests happen before the fragment shader is executed (if there is one); these are the so-called "early tests", now even explicitly mentioned in the spec, and if the early tests fail then no fragment shader is executed even if there is one. So if you think hardware cannot avoid executing fragment shaders, why do you think the ARB bothered writing about it in the spec, and why did they introduce a mechanism to force/disable early depth in the fragment shader?
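The forcing mechanism referred to here is the layout qualifier introduced by ARB_shader_image_load_store; a fragment shader opts in roughly like this (output name is a placeholder):

```glsl
#version 430 core

// force the depth/stencil tests to run before this shader executes;
// with this qualifier, any write to gl_FragDepth would be ignored
layout(early_fragment_tests) in;

layout(location = 0) out vec4 outColor;

void main()
{
    outColor = vec4(1.0);
}
```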

Thus you can see that fragments are produced by the rasterization, not by fragment shaders.

So how does `discard` work?

Fragments are produced by the rasterizer, and modified by the fragment shader. Just like vertices are produced by Vertex Specification and modified by the vertex shader. Later stages work based on the vertices output by the vertex shader, just as later stages work based on fragments output by the fragment shader.

Originally Posted by aqnuep

Thus, although the results of fragment shader execution are undefined, most of the data required for per-fragment operations is not affected by the fragment shader, namely the pixel ownership and scissor tests, multisample operations, the depth and stencil tests (unless depth or stencil export is used, which is obviously not the case when there is no fragment shader) and occlusion queries.

The fragment shader does not output the X or Y position. It does output the depth value, and therefore it will output an undefined value. You seem to have missed this important part of what you quoted:

Originally Posted by The Spec

The processed fragments resulting from fragment shader execution are then further processed and written to the framebuffer as described in chapter 17.

"The processed fragments resulting from fragment shader execution" have undefined data. The depth output from the fragment shader is part of that fragment. And it has an undefined value.

The fact that the lack of a fragment shader doesn't result in an error, but just makes the results of its execution undefined, is already a good enough reason, I believe. Don't you think the ARB defined it this way intentionally rather than by oversight?

How does that prove anything? Lack of a vertex shader also doesn't produce an error, but there's almost nothing useful you can do with that.

these are the so called "early tests" now even explicitly mentioned in the spec

Ahem:

Originally Posted by GL 4.3, 14.9

The other operations are performed if and only if early fragment tests are enabled in the active fragment shader

They are explicitly mentioned solely for the Image Load/Store feature of being able to force early tests so that you can get more guaranteed behavior. And you need a fragment shader to activate it.

So if you think hardware cannot avoid the execution of fragment shaders then why do you think the ARB cared writing about it in the spec or why they introduced a mechanism to force/disable early depth in the fragment shader?

You seem to be misunderstanding the difference between "discarding the fragment before the fragment shader" and "processing the fragment without a fragment shader and getting defined results". The latter is what you're alleging that OpenGL allows; the former is what OpenGL actually allows.

Also, you haven't provided any evidence that not providing a fragment shader is faster in any way than providing an empty one. Which is what you claimed and what I asked you to provide.

Well, guess what: if discard is used by the shader then it is very likely that the early depth/stencil tests are disabled automatically, because otherwise it might not produce proper results, yeah? Or actually the early tests can still happen, but the depth/stencil writes cannot, as the fragment might get discarded.

Originally Posted by Alfonse Reinheart

Fragments are produced by the rasterizer, and modified by the fragment shader.

This is the important part, they are modified only.

Originally Posted by Alfonse Reinheart

It does output the depth value, and therefore it will output an undefined value.

It just optionally outputs a depth. Just because this is all transparent from the user's point of view doesn't mean it doesn't matter. If you output depth in your fragment shader, once again, those early depth/stencil tests will be disabled, unless you force early tests using the functionality introduced by ARB_shader_image_load_store, or you use ARB_conservative_depth properly.
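ARB_conservative_depth, the other mechanism mentioned above, lets the shader promise in which direction it will modify depth so that early Z can stay enabled; a sketch (output name is a placeholder):

```glsl
#version 430 core

// promise: the shader will only ever output a depth greater than or
// equal to the fixed-function value, so early Z rejection stays valid
layout(depth_greater) out float gl_FragDepth;

layout(location = 0) out vec4 outColor;

void main()
{
    // consistent with the depth_greater promise above
    gl_FragDepth = gl_FragCoord.z + 0.01;
    outColor = vec4(1.0);
}
```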

Lack of a vertex shader DOES produce an INVALID_OPERATION error at draw time in core profile.

Originally Posted by Alfonse Reinheart

Also, you haven't provided any evidence that not providing a fragment shader is faster in any way than providing an empty one. Which is what you claimed and what I asked you to provide.

Come on, if you don't have to run fragment shaders on the shader cores, more vertex shaders can be in flight at once. How would that not be faster? Think about it.

You get defined results for depth and stencil, even without a fragment shader. The spec is unfortunately pretty vague on this, but you can try it out anytime if you don't believe me. Just create a core profile context, set up a vertex-shader-only program, set the draw buffers to none, attach a depth texture to your framebuffer and let it go. I bet it will work.
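The experiment described above looks roughly like this in C (error checking omitted; sizes and identifiers are placeholders):

```c
/* depth-only framebuffer: a depth texture attachment, no color buffers */
GLuint depthTex, fbo;

glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, 1024, 1024, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, NULL);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, depthTex, 0);
glDrawBuffer(GL_NONE); /* no color output at all */

/* glUseProgram(vertexOnlyProgram); ...draw...; depth still gets written */
```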

Also, if you don't believe in driver behavior, you can always ask the vendors for their opinion; they are the ARB, they can tell you for sure. I'm not willing to continue arguing about facts.
