/*
Intersect a ray with a plane
-plane The plane
-p The origin of the ray
-dir The normalized direction of the ray
-intersection The intersection will be written here
-distance The distance from `intersection to `p will be written here
*/
void rayIntersection( in vec4 plane, in vec3 p, in vec3 dir, out vec3 intersection, out float distance )
{
float dNV = dot( plane.xyz, dir );

//
// We need to determine which of the four ray transits this vertex represents:
// 1) camera and vertex are above the fog plane
// 2) camera is above, the vertex is below
// 3) camera is below and vertex is above
// 4) camera and vertex are below
//
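For reference, here is a minimal C++ sketch of the ray/plane test that the truncated shader function above appears to be building, together with the four-way classification described in the comment. It assumes the plane is stored as (n, d) with dot(n, x) + d = 0 and that "above" means the side the normal points to. The names Vec3, Plane, Transit, signedDistance and classifyTransit are illustrative and do not come from the original shader.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Assumed convention: a point x lies on the plane when dot(n, x) + d == 0.
struct Plane { Vec3 n; float d; };

static float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Positive when the point is on the side the normal points to ("above").
static float signedDistance(const Plane& plane, const Vec3& point)
{
    return dot(plane.n, point) + plane.d;
}

// Returns false when the ray is parallel to the plane or the hit is behind
// the ray origin; the outputs are only written on success.
bool rayIntersection(const Plane& plane, const Vec3& p, const Vec3& dir,
                     Vec3& intersection, float& distance)
{
    const float dNV = dot(plane.n, dir);
    if (std::fabs(dNV) < 1e-6f) {
        return false; // ray runs parallel to the plane
    }
    const float t = -signedDistance(plane, p) / dNV;
    if (t < 0.0f) {
        return false; // plane is behind the ray origin
    }
    intersection = Vec3{ p.x + dir.x * t, p.y + dir.y * t, p.z + dir.z * t };
    distance = t; // dir is normalized, so t is the distance from p
    return true;
}

// The four transits listed in the shader comment, numbered the same way.
enum class Transit {
    BothAbove = 1,
    CameraAboveVertexBelow = 2,
    CameraBelowVertexAbove = 3,
    BothBelow = 4
};

Transit classifyTransit(const Plane& fogPlane, const Vec3& camera, const Vec3& vertex)
{
    const bool cameraAbove = signedDistance(fogPlane, camera) >= 0.0f;
    const bool vertexAbove = signedDistance(fogPlane, vertex) >= 0.0f;
    if (cameraAbove) {
        return vertexAbove ? Transit::BothAbove : Transit::CameraAboveVertexBelow;
    }
    return vertexAbove ? Transit::CameraBelowVertexAbove : Transit::BothBelow;
}
```

Only cases 2 and 3 cross the plane, so those are the ones where the intersection point would actually be needed.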

And, an interesting note. I was not calling glDisableClientState after my batches -- putting that in brought up some of my unhappy 6fps scenes to a solid 20. Which is very encouraging.

My graph sorts objects to be drawn by a hash of their required GL state. It's very coarse, but it does mean that all visible foliage patches draw with only one setup and one teardown; but I wasn't disabling GL_TEXTURE_COORD_ARRAY etc.
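To illustrate the idea, here is a rough C++ sketch of sorting by a state hash so that each run of identical state gets exactly one setup and one matching teardown. It is not the actual graph code: DrawItem, BatchStats and drawSorted are made-up names, and the GL calls are only indicated in comments so the sketch stays self-contained.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw record: a hash of the required GL state plus an object id.
struct DrawItem {
    std::uint32_t stateHash;
    int objectId;
};

struct BatchStats {
    int setups = 0;
    int teardowns = 0;
    int draws = 0;
};

// Sorts by state hash, then walks the list opening one batch per distinct
// hash. Every setup is paired with a teardown, which is the step that was
// missing when the client state was left enabled.
BatchStats drawSorted(std::vector<DrawItem> items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.stateHash < b.stateHash;
                     });

    BatchStats stats;
    bool batchOpen = false;
    std::uint32_t currentHash = 0;

    for (const DrawItem& item : items) {
        if (!batchOpen || item.stateHash != currentHash) {
            if (batchOpen) {
                ++stats.teardowns; // e.g. glDisableClientState(GL_TEXTURE_COORD_ARRAY)
            }
            ++stats.setups;        // e.g. glEnableClientState(GL_TEXTURE_COORD_ARRAY)
            currentHash = item.stateHash;
            batchOpen = true;
        }
        ++stats.draws;             // the actual draw call would go here
    }
    if (batchOpen) {
        ++stats.teardowns;         // close the final batch as well
    }
    return stats;
}
```

With five items spread over two distinct hashes, this gives two setups, two teardowns and five draws regardless of the input order.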

I think this fixes some problems. I still have massive fillrate hits, and am looking into optimizing the GLSL now.

And, some more good news. I carefully transitioned to VBOs, profiling every step of the way. Frame rate hovers in the mid to high 20s now, even 30 in some places. Plus, Shark tells me that the function Foliage::Patch::display(), which used to take ~45% of CPU time, now takes ~7%.

I think I've made a lot of progress here. Still need to optimize GLSL. I'm considering various approaches, but so far it's been nothing but micro-optimizations, probably stuff which the GL driver is doing for me anyway and getting me nothing.

Shamyl,
Sorry about not getting back to you earlier, and I'm afraid my analysis is outdated now. ;P Still, take a look at the blue line across the top - the CPU Wait For GPU track. It is the only one that looks suspicious. I put the numbers into Excel and graphed them linearly on their own:

As you can see, they swing wildly, topping out at .6 seconds of stall time. Most of the time though, you're in the 0.1 second zone, which would keep you moored at roughly 10-15 FPS, which is what we're seeing here. These stalls might well have been caused by the non-disabled client state, but it is not inconceivable that they're also rasterization waits - fill-rate hits.

If you want to, please take the Driver Monitor for another spin and monitor only the CPU Wait For GPU value. Ideally, you should be able to push that down a lot lower. Perhaps it already is?

Fenris Wrote:Shamyl,
Sorry about not getting back to you earlier, and I'm afraid my analysis is outdated now. ;P Still, take a look at the blue line across the top - the CPU Wait For GPU track. It is the only one that looks suspicious. I put the numbers into Excel and graphed them linearly on their own:

As you can see, they swing wildly, topping out at .6 seconds of stall time. Most of the time though, you're in the 0.1 second zone, which would keep you moored at roughly 10-15 FPS, which is what we're seeing here. These stalls might well have been caused by the non-disabled client state, but it is not inconceivable that they're also rasterization waits - fill-rate hits.

If you want to, please take the Driver Monitor for another spin and monitor only the CPU Wait For GPU value. Ideally, you should be able to push that down a lot lower. Perhaps it already is?

Fenris, thanks -- I really appreciate it.
Here's the current output of just CPU Wait For GPU:

Shamyl,
The only reason I can see for it to be slow on the foliage texture is that it contains more texels that need "interesting" blending, but why that should drop it so completely is a mystery. What happens if you turn blending off at this point?

The new numbers look saner, but I'm on my iPod at the moment so I can't verify. Could I possibly ask you for a release-built binary to profile? It's an intriguing behaviour, to say the least...

If you're using alpha test, then it is possible the more complex foliage texture is limiting the GPU's ability to compress the depth buffer, increasing the amount of memory traffic required when rendering a large number of widely spaced pixels.

That should be easy to test by replacing the alpha channel in the foliage texture with the one from the letter texture.

1) I wanted this to at least be a little "fun" so I included 4x4 car driving stuff. Drive with the arrow keys. When you inevitably flip the car, hit backspace to right it. If you have a generic USB gamepad, you can steer with the D-pad and accelerate/reverse with the 1st and 2nd buttons respectively.

2) I included two foliage types, the first is your generic grass, billboarded and densely populated. The second is shrubbery which is not billboarded. The latter is disabled, but:

3) There are two toggles at the top of the screen. They let you disable the Foliage & Ferns, use the "Debug" texture atlas, or use the real texture atlas.

You'll notice that with foliage running with the debug texture the performance isn't too bad, but switching to the real foliage texture, performance tanks. Both textures are the same size!

Thanks,

EDIT: I should have mentioned the app requires an Intel Mac with a dedicated GPU. I know it works on the ATI x1600 and the Nvidia 9400 and 9600.

Frogblast Wrote:If you're using alpha test, then it is possible the more complex foliage texture is limiting the GPU's ability to compress the depth buffer, increasing the amount of memory traffic required when rendering a large number of widely spaced pixels.

That should be easy to test by replacing the alpha channel in the foliage texture with the one from the letter texture.

Sounds kind of like I'm screwed if that's the case.

That being said, I've come up with some simple GLSL optimizations to implement which may make for some good speedups.

Interesting demo, lots of "how does he do it" questions, but first I have a more important dumb user question: how do I change the two toggles at the top of the screen? When I mouse over I see a darker bar temporarily under the options, but no amount of mouse clicking or dragging lets me actually change them....

TomorrowPlusX Wrote:Sounds kind of like I'm screwed if that's the case.

That being said, I've come up with some simple GLSL optimizations to implement which may make for some good speedups.

More tests to run:

1) How much does performance improve when alpha test is disabled?

2) Can you use occlusion queries to count the total number of your pixels covered by the foliage? (Disable alpha test and depth testing when performing this test.) How does this compare to the number of pixels in the whole framebuffer?
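For the comparison in test 2, a small C++ sketch of the ratio might look like this. It assumes the sample count has already been read back from a GL_SAMPLES_PASSED occlusion query wrapped around the foliage draw calls; coverageRatio is an illustrative name and no GL calls are made here.

```cpp
#include <cstdint>

// Ratio of fragments drawn to pixels in the framebuffer. A value of 3.0
// means the foliage touched, on average, every pixel three times.
// samplesPassed is assumed to come from an occlusion query result.
double coverageRatio(std::uint64_t samplesPassed, int width, int height)
{
    if (width <= 0 || height <= 0) {
        return 0.0; // no framebuffer to compare against
    }
    const std::uint64_t framebufferPixels =
        static_cast<std::uint64_t>(width) * static_cast<std::uint64_t>(height);
    return static_cast<double>(samplesPassed) /
           static_cast<double>(framebufferPixels);
}
```

A ratio well above 1.0 would point towards overdraw and fill rate as the bottleneck rather than vertex processing.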