GLSL fragment program limits on GMA950 (MacBook)

I just wanted to try out some GLSL magic on my MacBook and I was super disappointed after the third texture fetch. It seems that the GMA950 supports only 3 texture reads in a fragment shader. Or at least in my case it reverts to software rendering (or something else that runs at 2fps) if I try to read the texture a fourth time. Does anyone know more about the GMA950 fragment program specs?

I tried to browse some docs on Intel's site but I could not get further than their volumetric PR BS.

The vertex shader is just transforming the vertex and passing the texcoord through. If I remove the last lookup (or remove the usage of it in the last line) it works just fine. That does not look like indirect texture access to me, I'm just sampling the same tex multiple times.

Tried binding the same texture to four different texture units and using four different uniforms? Perhaps there's some limit on the number of times you can sample from a single texture or something.

It could also be that you're running out of temporaries, I suppose. I don't think the register allocator in Apple's GLSL implementation does a very good job of reuse... can you write an ARB_fragment_program that's equivalent and runs in hardware?

I tried binding the same texture to 4 units, duplicating the texcoords in the vertex program and then just accessing and adding up the texture values in the fragment shader. Same problem. I think I'll test next how many textures I can use with the fixed function pipeline.
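For anyone who wants to repeat the four-unit test, the fragment shader might look roughly like the sketch below. It is not the exact shader from this thread: the sampler names `tex0` to `tex3` are made up for illustration, and it assumes the vertex shader copies the same texcoord into `gl_TexCoord[0]` through `gl_TexCoord[3]`.

```glsl
// Sketch of the four-unit test (GLSL 1.10 style).
// tex0..tex3 are hypothetical uniform names; the application is assumed to
// bind the same texture to units 0..3 and set each uniform to its unit index.
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
uniform sampler2D tex3;

void main()
{
    // One lookup per unit, each using an interpolated coordinate directly.
    vec4 sum = texture2D(tex0, gl_TexCoord[0].st);
    sum += texture2D(tex1, gl_TexCoord[1].st);
    sum += texture2D(tex2, gl_TexCoord[2].st);
    sum += texture2D(tex3, gl_TexCoord[3].st);

    // Just add the samples up, as described in the post.
    gl_FragColor = sum;
}
```

If this still drops to software rendering once the fourth lookup is added, the limit is not tied to sampling one texture unit repeatedly.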

This finding correlates with what I see happening in the GLSLShowPiece. The shaders that access more textures (like life) are super slow, whereas the more procedural shaders run just fine. I tried adding one more texture read to the Earth sample and it went slow too. It looks like they supported just enough to enable diffuse+bump+shadow maps. Damn Intel, damn Apple!

You're already using 8 temporaries directly. It's also going to need a temporary behind the scenes for each of your subtractions, and 3 for your 4-way average (one for each addition). That's a total of 15, so it's easy to see how it may not reuse temporaries efficiently enough. If you do this in assembly shaders, then you could probably find a way to fit it into few enough temporaries to work in hardware.

I took the liberty of writing a fragment program in shader assembly to use the fewest resources possible. I got it down to 2 temporaries, 4 parameters, 8 ALU instructions, and 4 texture lookups. If this won't work in hardware, then there's no hope.

I tried the above code in Shader Builder using the Basics.shdr as a basis and it indeed runs as a slide show for me too. If I remove the last tex access it's fine again.

For some reason I could not get the second texture to work with texture environment, but I suspect my texenv skillz are a bit rusty.

I knew the HW was bad when I got this machine, but I don't really respect the fact that they are lying in the driver caps that the HW is capable of doing more than it actually can. Actually... it really pissed me off.

I think I'll close this thread with the famous words of Mark Rein: "Intel is killing PC gaming"

Thanks for filing it and the update! But it makes me feel stupid for not filing it last December, when I was apparently running into this same problem while making motion blur effects in Unity. Hopefully I'll learn.

First, to understand what's going on here you need to read the ARB_fragment_program spec, in particular
Issue (24) What is a texture indirection, and how is it counted?

Now that you know what an indirection is, you can see how to stay within the limit-- calculate texcoords in a batch, then do a bunch of samples in a batch. Any time you use a temporary variable (as opposed to a varying) as a texture coordinate, you are entering another indirection phase, and GMA 950 (and ATI cards) only support four phases.
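A minimal GLSL sketch of that batching, assuming a 4-tap horizontal average. The uniform names `image` and `texelStep` are invented for illustration, and whether the compiler keeps the ordering written here is up to the implementation.

```glsl
// Sketch: group coordinate math, then lookups, then ALU work.
// `image` and `texelStep` are hypothetical uniforms, not from the thread.
uniform sampler2D image;
uniform vec2 texelStep;   // assumed: offset of one texel, set by the application

void main()
{
    vec2 uv = gl_TexCoord[0].st;

    // Batch 1: compute every texture coordinate before any lookup.
    vec2 uv0 = uv - 1.5 * texelStep;
    vec2 uv1 = uv - 0.5 * texelStep;
    vec2 uv2 = uv + 0.5 * texelStep;
    vec2 uv3 = uv + 1.5 * texelStep;

    // Batch 2: issue all lookups together. Under the counting rules in
    // Issue (24), the first lookup starts a new phase because its coordinate
    // is a temporary written in the current phase; the other three should
    // stay in that same phase.
    vec4 s0 = texture2D(image, uv0);
    vec4 s1 = texture2D(image, uv1);
    vec4 s2 = texture2D(image, uv2);
    vec4 s3 = texture2D(image, uv3);

    // Batch 3: arithmetic on the results only.
    gl_FragColor = (s0 + s1 + s2 + s3) * 0.25;
}
```

Interleaving the two (compute a coordinate, sample, compute the next coordinate, sample again) would start a new phase for each lookup and could hit the four-phase limit quickly.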

So you could get more than 3 lookups on a GMA 950 in Tiger if you used ARB_fragment_program, carefully structuring your program into indirection stages. But, it turns out that with GLSL ARB_fragment_shader, there was a bug which resulted in texture samples being counted as indirections when they shouldn't be.

The good news is that this is fixed in Leopard. Now on GMA 950 you can write a shader like this:

Code:

// 1D motion blur using 31-sample box filter

// this shader is carefully constructed to run on limited hardware.
// it will just barely fit on GMA 950-- 62/64 ALU, 31/32 TEX, 4/4 indirections.

Note that you still have to be careful about how you write your shader-- follow the same guidelines for ARB_fragment_program to group texture lookups into indirection groups. Also, as noted in the spec, some additional operations such as swizzling can cause an indirection because they implicitly use a temporary variable.
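As a small illustration of the swizzle caveat, here is a sketch that keeps each coordinate pair in its own `vec2` varying so the lookups need no swizzle. The names `uvA`, `uvB` and `image` are hypothetical, and whether a given swizzle really costs an indirection depends on the driver.

```glsl
// Sketch: avoid swizzled texture coordinates.
// uvA, uvB and image are hypothetical names; a matching vertex shader is
// assumed to write both varyings.
varying vec2 uvA;
varying vec2 uvB;
uniform sampler2D image;

void main()
{
    // Packing both pairs into a single vec4 varying (say `uvPair`) and
    // sampling the second one with `uvPair.zw` would require a swizzle on
    // the coordinate, which may use a hidden temporary and so add an
    // indirection. Separate vec2 varyings can be used directly.
    vec4 a = texture2D(image, uvA);
    vec4 b = texture2D(image, uvB);

    gl_FragColor = 0.5 * (a + b);
}
```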