I've got to write a GLSL program that morphs the geometry of a surface in real time, calculating the vertex normals and the lighting in either the vertex or fragment shader. I've got a vertex shader that does all of this: it moves the z-coordinate of the vertex, calculates the surrounding face normals, uses the face normals to calculate the vertex normal, and then calculates the lighting from a single light source. This runs fine on Nvidia NV40 and ATI R400 hardware (e.g. GeForce 6600, ATI x700), but I need to support an R300-class ATI card that only allows 1024 vertex shader instructions and 160 fragment shader instructions (i.e. ATI's "SmartShader 2.0"). (Surprisingly, the card I need to support is an $800+ FireGL V7100 -- which is frustrating because my code runs fine on a $200 x700!!!)

First of all, is there a tool available that can tell you the "length" of your shader in terms of ALU instructions? I've played around with ATI's "Ashli Viewer" ( http://www.ati.com/developer/ashli.html ), which has this metric, but I don't quite grok how this tool can be of use to me.

Second, is there a more efficient way to calculate the vertex normal? The surface "morphing" only occurs along the z-axis, and surrounding vertices are known. I guess I'm wondering if there are any simplifying assumptions that I can make?

Third, is this problem a good candidate for a multi-pass algorithm? I was thinking that a first pass could perform the morphing, the second pass could calculate the normal and, perhaps, perform the lighting (in the fragment shader). I've never written a multi-pass algorithm before, so I'm not sure what the best approach is. Specifically, on the first pass should I write the results to the frame buffer, or is there a better way to get the results back out of the GPU? I'm a little naive in this area...

So what do you guys think? Am I screwed getting all of this on the FireGL, or is there hope?

V-man

09-26-2005, 03:56 AM

R300 can do 256 native instructions in the vertex shader and 96 in the fragment shader.
I don't know what the status of the FireGLs is. They have different drivers.

If you do multipass, create a floating-point render-to-texture (RTT) target for the first pass. In the second pass, bind it as a texture. The R300 supports it.