I'm currently facing a really weird problem in a fragment shader.
I calculate the light emitted from two neon-like lights (so they're not "real" OpenGL light sources).
I'm passing the needed information to the vertex shader using attributes.
When I calculate the lighting of only one of the neon lights, everything's fine.
When I calculate both, the shader hangs and I have to kill the application.
When I calculate both but do not use the second result (meaning the calculated variable is NOT used anywhere), it does not hang but messes up the result.
How is it possible that an unused calculation influences the result?

Can it be a problem with too many varying variables? I use quite a few, but I get no warnings from the compiler.

Thanks in advance for any answers.

Humus

08-23-2005, 06:51 AM

Not sure about the messed-up results, but the "hang" sounds like it could just be that the shader fell back to software rendering.

The GLSL compiler is smart enough to remove unused calculations, so even if you calculate something in the shader and don't use it, the compiler will remove that code.
Maybe you can post the shader code here along with your hardware specification (OS, graphics card, driver version). It might be a driver bug.

yooyo

styx

08-23-2005, 09:42 AM

O.K., here goes the fragment shader code.
What I have is a spline that acts as a neon light.
All the vectors, vertices and stuff that make up the light are put into varyings like base, dir, etc.

The next part calculates the distance from the fragment to a line segment of the glowing strip.
Much of the preprocessing is done in the vertex shader.
dist and dist1 are then simply linear intensity values of the light.
If I remove one of the two calculations, everything's fine.

I'm no expert on ATI, but it looks like you hit the hardware limit on the number of varyings. Try putting fogZ, g and g1 into one vec3.

yooyo

styx

08-24-2005, 12:37 AM

I already tried that, didn't help :-/

yooyo

08-24-2005, 09:09 AM

I just checked my NV-6800GT. It can interpolate up to 32 floats (8 * vec4).

If speed is not important, try to squeeze in extra varyings by interpolating N, L and P as vec4s and using their .w components for fogZ, g and g1.
Then try changing base, dir and start from vec3 to vec4, and use their .w components for base1.xyz.

After this squeezing you will end up with 8 varyings. The code may then run in hardware, but you have to deal with the unpacking.

By the way, this squeezing is not good for performance.
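
The packing and unpacking described above can be sketched outside of GLSL. The following is only an illustration in plain C, not the original shader: Vec3, Vec4 and every function name here are hypothetical stand-ins, and the only names taken from the thread are the varyings mentioned in the comments.

```c
#include <assert.h>

/* Hypothetical stand-ins for GLSL's vec3 and vec4 types. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Vec4;

/* "Vertex shader" side: carry a scalar (e.g. fogZ) in the unused .w of a
   vec3 varying (e.g. N) that has been widened to a vec4. */
Vec4 pack_vec3_scalar(Vec3 v, float s)
{
    Vec4 packed = { v.x, v.y, v.z, s };
    return packed;
}

/* "Fragment shader" side: recover the vector part (.xyz in GLSL). */
Vec3 unpack_vec3(Vec4 packed)
{
    Vec3 v = { packed.x, packed.y, packed.z };
    return v;
}

/* Spread one extra vec3 (e.g. base1) over the .w components of three
   carriers (e.g. base, dir, start). */
void pack_split(Vec3 a, Vec3 b, Vec3 c, Vec3 extra, Vec4 out[3])
{
    out[0] = pack_vec3_scalar(a, extra.x);
    out[1] = pack_vec3_scalar(b, extra.y);
    out[2] = pack_vec3_scalar(c, extra.z);
}

/* Reassemble the extra vec3 from the three .w components. */
Vec3 unpack_split(const Vec4 in[3])
{
    Vec3 v = { in[0].w, in[1].w, in[2].w };
    return v;
}

/* Round-trip check: nothing is lost by packing and unpacking. */
void packing_demo(void)
{
    Vec3 base = { 1.0f, 2.0f, 3.0f }, dir = { 0.0f, 1.0f, 0.0f };
    Vec3 start = { 4.0f, 5.0f, 6.0f }, base1 = { 7.0f, 8.0f, 9.0f };
    Vec4 packed[3];

    pack_split(base, dir, start, base1, packed);
    assert(unpack_vec3(packed[0]).z == 3.0f);
    assert(unpack_split(packed).y == 8.0f);
}
```

In a real shader the unpacking would just be swizzles (.xyz and .w) in the fragment shader, which is where the extra instructions come from.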

yooyo

kingjosh

08-24-2005, 10:00 AM

Originally posted by yooyo:
I just checked my NV-6800GT. It can interpolate up to 32 floats (8 * vec4).

If speed is not important, try to squeeze in extra varyings by interpolating N, L and P as vec4s and using their .w components for fogZ, g and g1.
Then try changing base, dir and start from vec3 to vec4, and use their .w components for base1.xyz.

No! What good is a high-level language if you're stuck doing work for the compiler all the time? According to the GLSL Specification, putting three floats into a vec3 should make no difference. I'm not going to quote the spec here, but it's on page 83 in the last paragraph (Section 2.15.3 - Shader Variables). This is also the case for uniform variables.
You might have to do this on some cards, but it would be against the specification to report one number (for example ) and not allow for 16 vec2s, 16 floats plus 8 vec2s, etc. It's not natural to pack components of multiple unrelated varyings into one just so they fit.
GLSL is a high-level language and should be treated as such. Developers shouldn't have to do work that the compiler should be doing. In fact, the compiler should be able to optimize naturally written GLSL code for each implementer's hardware. If certain hardware can only interpolate vec4 varyings, the driver should pack them behind the scenes, not force developers to code in an unnatural way!
I'm not an expert on ATI hardware; hopefully they follow the spec on this issue. Your shader uses 30 varying floats as written, so this shouldn't be the problem.
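
To make the float-versus-vec4 arithmetic concrete, here is a small C sketch. It is an illustration only: the function names are hypothetical, and since the shader's varying declarations are not shown in the thread, any list of sizes fed to it is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Total float components used by a set of varyings.
   sizes[i] is the component count of varying i (1 = float ... 4 = vec4). */
int total_varying_floats(const int *sizes, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(sizes[i] >= 1 && sizes[i] <= 4);
        total += sizes[i];
    }
    return total;
}

/* vec4 interpolators needed if every varying gets a register of its own. */
int slots_one_per_varying(size_t count)
{
    return (int)count;
}

/* Lower bound on vec4 interpolators if components are packed with no gaps.
   Reaching it may require splitting a varying across two registers. */
int slots_fully_packed(const int *sizes, size_t count)
{
    return (total_varying_floats(sizes, count) + 3) / 4;
}
```

For example, assuming nine vec3s and three floats, the total is 30 floats, which is under a 32-float limit. Giving each varying its own vec4 interpolator would take 12 registers, while gap-free packing would take 8.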

yooyo

08-24-2005, 12:21 PM

@kingjosh:

You are right, but unfortunately compilers may fail in this case. I'm just suggesting that he try squeezing the varyings. If it works, then blame the driver developers. If not, well, I just gave it a shot.

yooyo

al_bob

08-24-2005, 03:46 PM

Originally posted by kingjosh:
No! What good is a high level language if you're stuck doing work for the compiler all the time?? According to the GLSL Specification, putting three floats into a vec3 should make no difference. I'm not going to quote the spec here, but it's on page 83 in the last paragraph (Section 2.15.3 - Shader Variables). This is also the case for uniform variables.

You should probably reread the spec.

OpenGL Specification 2.0 - Section 2.15.3

When an attribute variable declared as a float, vec2, vec3 or vec4 is bound
to a generic attribute index i, its value(s) are taken from the x, (x, y), (x, y, z), or
(x, y, z, w) components, respectively, of the generic attribute i.

The compiler thus cannot put a scalar varying in the w component (or any other unused component) of an attribute.

V-man

08-24-2005, 06:46 PM

You lose some performance if the compiler merges varyings, because the swizzling costs an instruction, so the compiler decides not to do it for you.

Humus

08-24-2005, 10:58 PM

Originally posted by kingjosh:
You might have to do this on some cards, but it would be against the specification to report one number (for example ) and not allow for 16 vec2s, 16 floats plus 8 vec2s, etc.

There's nothing saying it must run in hardware. It must work, that's the only thing guaranteed. But if the compiler can't take care of it, it may run in software, and if that's an issue for you, you may have to work around it, regardless of all these "should"s.

Humus

08-24-2005, 10:59 PM

Originally posted by al_bob:
You should probably reread the spec.

OpenGL Specification 2.0 - Section 2.15.3

When an attribute variable declared as a float, vec2, vec3 or vec4 is bound
to a generic attribute index i, its value(s) are taken from the x, (x, y), (x, y, z), or
(x, y, z, w) components, respectively, of the generic attribute i.

The compiler thus cannot put a scalar varying in the w component (or any other unused component) of an attribute.

Don't confuse attributes and varyings.

Humus

08-24-2005, 11:03 PM

As for how to get it to run in hardware: as mentioned, try packing varyings together. If any of the varyings already lie in the [0, 1] range, or can be remapped into that range, you can probably use gl_Color and gl_SecondaryColor. In the worst case, perhaps some of those varyings can be computed in the fragment shader instead of the vertex shader.
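
Remapping a value into [0, 1] so it can ride in a color varying is a scale and bias on the way in and the inverse on the way out. The C sketch below is an illustration under assumptions: the function names are hypothetical, the value range [lo, hi] has to be known to both shaders, and the clamp mimics the clamping that color outputs are assumed to undergo. Color interpolators may also carry less precision than generic varyings.

```c
#include <assert.h>

/* "Vertex shader" side: map value from [lo, hi] into [0, 1].
   Out-of-range input is clamped, as a color output is assumed to be. */
float encode_unit(float value, float lo, float hi)
{
    assert(hi > lo);
    float t = (value - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t;
}

/* "Fragment shader" side: map the interpolated [0, 1] value back to [lo, hi]. */
float decode_unit(float t, float lo, float hi)
{
    return lo + t * (hi - lo);
}
```

With a range of [-10, 10], a value of 0 is sent as 0.5 and decoded back to 0. Anything outside the range is lost to the clamp, so the range has to be chosen to cover every value the varying can take.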

kingjosh

08-25-2005, 09:03 AM

Originally posted by al_bob:
You should probably reread the spec.

Uh, you should probably start reading it at the beginning. I believe you'll find sections 4.3.4 and 4.3.6 of particular interest.

Humus, why can't the compiler pack these behind the scenes if that's what is required to run in hardware?

Originally posted by Humus:
If any of the varyings can be mapped in the [0, 1] range, or if it can be packed in that range, you can probably use gl_Color and gl_SecondaryColor.

Please, do not do this. This type of coding degrades the integrity of GLSL (http://en.wikipedia.org/wiki/GLSL). Wait for better compilers or get a card that can handle more varyings.

Humus

08-25-2005, 05:47 PM

Originally posted by kingjosh:
Humus, why can't the compiler pack these behind the scenes if that's what is required to run in hardware?

Dunno. Probably just not implemented.

Originally posted by kingjosh:
Please, do not do this. This type of coding degrades the integrity of GLSL (http://en.wikipedia.org/wiki/GLSL). Wait for better compilers or get a card that can handle more varyings.

The integrity of GLSL, what the heck is that even supposed to mean? It's not forbidden to write semi-ugly GLSL code. You sound a bit like those datalogical academics who live in a world where everything is object oriented, no class data is public, everything is in Hungarian notation, no function names contain abbreviations, compilers automagically produce optimal code regardless of input, and there's capital punishment for the use of goto. I, on the other hand, live in the real world, where hardware has limited capabilities and compilers are still just a piece of software that simply can't optimally map all the infinite combinations of statements to hardware. If the use of goto gives me a significant speedup on x86 in a critical piece of code, I'll go for it. If gl_Color allows me to use another vec4 varying, despite the data passed not actually being a color as implied by the name, I'd go for it. When the compiler takes care of the situation better, it can be cleaned up if needed.

Frankly, I don't think lessons on how to write nice GLSL code are what styx came here for. I think he primarily wants to get his code running in hardware. GLSL may be a high level language, but that doesn't mean you can just forget that there's hardware under the hood. Just like in C++, if you write code that's close to the hardware, you'll achieve better performance. Laying things out more explicitly can often result in better performance. I don't consider that bad practice at all. In fact, I consider it to be good practice. If you have a scale and bias, it's not bad practice to put them in a vec2 as opposed to two floats. This way it's more likely that they end up in the same constant register, and thus the code will likely run faster.

kingjosh

08-29-2005, 09:23 AM

I apologize if I hit a sore spot, Humus. I realize that styx didn't come for a lesson; I was hoping the point would be read by his hardware vendor.

IMHO, compilers should be better at optimizing code than the average developer is expected to be. You're right, if this particular developer wants to run on his hardware, he'll have to pack his varyings. My only point was that he shouldn't have to.

Korval

08-29-2005, 11:14 AM

Originally posted by Humus:
GLSL may be a high level language, but that doesn't mean you can just forget that there's hardware under the hood.

That is, in fact, the point of a high level language. Indeed, not having to do nonsense like this was one of the selling points behind integrating a high level compiler into an OpenGL driver.

The compiler ought to be doing this kind of stuff. The reason we agreed (or the ARB agreed. I never did) to sacrifice a bunch of shader compilation/linking performance to put a high level compiler into the driver was very specific: to allow compilers to better optimize the compiled result for their hardware. That was its purpose.

To not do this is a violation of that agreement, and, to my mind, smacks of fraud. We gave up quite a bit for this advantage; if we aren't getting it because ATi is lazy, screw them. Maybe developers will start putting "nVidia only" stickers on their products.

Or, even worse, the inconsistency between glslang implementations means that developers simply can't afford to ship a product that relies on it, and they abandon the language.

Humus

08-29-2005, 04:52 PM

Originally posted by kingjosh:
IMHO, compilers should be better at optimizing code than the average developer is expected to be.

The problem is that the day GLSL was introduced the developer expectations grew by several orders of magnitude in one big step. Compilers keep improving, but it's a very big piece of software and as developer expectations keep growing as well there will still be cases where developer expectations aren't met. I'm not saying it shouldn't have to take care of this case, I'm saying that it's a limitation in the current compiler. No vendor's compiler is optimal in all cases, and none will ever be.

Humus

08-29-2005, 05:05 PM

Originally posted by Korval:
That is, in fact, the point of a high level language.

Not if you want to achieve good performance. Do you ignore the hardware when you write C/C++ code? I don't, therefore my code runs fast. 30 years after the language was introduced you still cannot ignore the hardware if you want to achieve good performance. Code close to the hardware will still run much faster. You can still download optimization guides from Intel's website telling you everything from how to declare your arrays to how to use intrinsics to get closer to the hardware.

Brolingstanz

08-29-2005, 05:54 PM

but isn't the direct analogy more like having to worry about register allocation in compiled c++ code? this is a scenario i think most would agree is preposterous in all but the most critical areas, areas that would probably see some low level, hand crafted asm anyway.

i'd be the first to cut the compiler writers some slack, though. it's no easy feat. and there's no question that coders should keep the hardware in mind, good practices, even if it's only subliminal, as long as it's documented and predictable (as it is for intel).

Korval

08-29-2005, 06:39 PM

Originally posted by Humus:
Do you ignore the hardware when you write C/C++ code? I don't, therefore my code runs fast. 30 years after the language was introduced you still cannot ignore the hardware if you want to achieve good performance.

Yes, I frequently "ignore the hardware". I'm not interested in register counts or structuring my code for best in-order execution. I don't shirk from C++ features that degrade performance, and only occasionally do I even concern myself with overuse of those features. That's why I don't code in assembly. It isn't for cross-platform stuff, it's to get away from needing to care about low-level internals.

Oh, and I've shipped 3 games. On consoles. 2 of them, by design, ran at 60fps.

I expect compilers to be competent. And if they're not, I don't use them. It's one of the reasons that I avoid glslang like the plague: it's just not trustworthy. And it never will be until IHVs start making real compilers for it.

This isn't hard stuff we're talking about here. We're not talking about trying to recognize a sin-approximation in software or something; we're talking about basic compiler optimizations here. I could understand if it were something that was truly difficult or required a week of developer time. But if your compiler writers are at all competent, it shouldn't take more than a day (tops) for one guy to hook this in.

For God's sake, the ARB gave ATi everything they could give in terms of the architecture of glslang. Had 3D Labs had their way, everything would be counted in floats, and these exposed float limitations would be required (ie, if your hardware organizes attributes or uniforms into vec4's rather than floats, you have to pretend that it doesn't). They only asked for 1 instance of this: varyings. Where it matters the most because the hardware limits are so significant. It can't be that hard to count up the number of floats for varyings and assign them as needed.

Basically, I'm just accusing ATi of incompetence. But, then again, that's nothing new; I do it every time they do something stupid :D

Though if nVidia compilers can't do any better, then they too are incompetent...
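
As a rough picture of what "count up the floats and assign them" could look like, here is a first-fit sketch in C. It is a toy under stated assumptions, not how any vendor's compiler works: the function name, the 64-register cap and the rule that a varying is never split across registers are all choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGISTERS 64  /* arbitrary cap for this sketch */

/* First-fit assignment of whole varyings to vec4 registers.
   sizes[i] is the component count of varying i (1..4).
   On return, reg[i] is the register index and offset[i] the first
   component (0 = x ... 3 = w) given to varying i.
   Returns the number of registers used, or -1 if the cap is exceeded. */
int assign_varyings(const int *sizes, size_t count, int *reg, int *offset)
{
    int used[MAX_REGISTERS] = {0};  /* components taken in each register */
    int registers = 0;

    for (size_t i = 0; i < count; ++i) {
        assert(sizes[i] >= 1 && sizes[i] <= 4);
        int r = 0;
        while (r < registers && used[r] + sizes[i] > 4)
            ++r;                    /* skip registers without room */
        if (r == MAX_REGISTERS)
            return -1;
        if (r == registers)
            ++registers;            /* open a new register */
        reg[i] = r;
        offset[i] = used[r];
        used[r] += sizes[i];
    }
    return registers;
}
```

Assuming nine vec3s followed by three floats, this uses 9 registers: each vec3 opens its own register and the floats fill three of the spare w components. Getting down to 8 requires splitting one vec3 across registers, which is what the manual trick of spreading base1.xyz over three .w components does.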

V-man

08-31-2005, 05:16 AM

Originally posted by Humus:
No vendor's compiler is optimal in all cases, and none will ever be.

You are talking about GLSL compilers, but I guess you are not aware that the asm compilers can have issues as well. I wrote ARB_vp/fp shaders for a pet project a long time ago. They ran beautifully on ATI, and then one day they performed really badly after a driver update. I didn't bother with it because I had other things to do.

Secondly, I have encountered cases where my ARB_vp/fp code performed better than the GLSL equivalent.
It's disappointing. Even after multiple driver updates, it happens.

Giving us the ability to hand-feed the GPU low-level shaders (that the driver won't attempt to screw up) might not be a bad idea.
Having the ability to pass compiler flags would be nice.

Humus

09-01-2005, 09:48 PM

Originally posted by bonehead:
but isn't the direct analogy more like having to worry about register allocation in compiled c++ code?

In a way yes, but the difference is that a GPU is much more of a fixed platform than the CPU. If the compiler does a poor job of register allocation on the CPU for some piece of code, it will still be able to run it; it just means it will resort to system memory more often and thus run slower. Today's GPUs don't have that ability. If you're out of interpolators, temporaries, samplers, attributes, constants, instructions or any other resource, there's no option other than going to software. The good news is that compilers do get better and hardware gets better capabilities as well, so this will be less of a problem in the future.

Originally posted by bonehead:
i'd be the first to cut the compiler writers some slack, though. it's no easy feat. and there's no question that coders should keep the hardware in mind, good practices, even if it's only subliminal, as long as it's documented and predictable (as it is for intel).

Absolutely, I agree. I'm not saying you should be thinking like in assembly, but at least having some high level understanding of how the hardware works will help you write fast code.

Zulfiqar Malik

09-02-2005, 04:14 AM

I would have to agree with Humus, although I find ATi guilty of writing crappy GL drivers (and sometimes nVidia as well, but they are much better than ATi). GPU compilers are still in their infancy. Consider yourself programming in C/C++ some 15 (or more) years back! You can't expect them to magically produce a compiler that generates optimal code. Since C++ compilers have matured so much over the past many years, we have started taking them for granted. But, as some of you may know, the VC compiler still doesn't support 100% of C++, AND up to VC 6.0 their template support was so ****ty that one was better off not using templates at all!

As for ATi, they really need to get their act together and come up with better drivers. It's not just a matter of the GLSL compiler; their GL drivers in general suck crap!