Textures As Data Structures

I'd like to render hundreds of thousands of objects and provide data for each object in GLSL.

My plan is something like this:
1. Put the data for each object in textures (GL_TEXTURE_RECTANGLE).
2. Render updates and logic to an FBO using the texture data.
3. Render each object using glDrawElementsInstanced, reading the data produced by the FBO render.

I'm looking for general input on how this could be optimized.
How do you determine the max texture size of GL_TEXTURE_RECTANGLE?
Is this the best/fastest way to provide unique data to hundreds of thousands of objects?
Any thoughts on this process?

There are a number of ways to go about this. Is your instance variation data constant or variable based on some GPU processing? Is the number of instances constant or variable based on some GPU processing? If not constant, what is the GPU processing you want to do on those instances?

I'll provide more thoughts after you answer these. Briefly, though, the two main routes you might consider for storage are: putting the instance variation data in a texture buffer and fetching it in the shader with gl_InstanceID, or putting it in 1..N vertex attributes and letting the GPU fetch it for you behind the scenes.
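To make the two routes concrete, here is a minimal CPU-side sketch of the addressing arithmetic each one relies on. The InstanceData record and its field names are hypothetical, and the sketch assumes the record is stored as RGBA32F texels (4 floats, 16 bytes each). It only shows where the data for a given instance lives; it makes no GL calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance record; the fields are illustrative only. */
typedef struct {
    float position[3];
    float scale;
    float color[4];
} InstanceData;

/* Assumption: the texture buffer uses an RGBA32F format (16 bytes/texel). */
enum { BYTES_PER_TEXEL = 16 };

/* Number of RGBA32F texels one record occupies in a texture buffer. */
size_t texels_per_instance(void) {
    return sizeof(InstanceData) / BYTES_PER_TEXEL;
}

/* Route 1: texel index a shader would fetch for texel k of instance id
 * (the shader would use gl_InstanceID as instance_id). */
size_t texel_index(size_t instance_id, size_t k) {
    assert(k < texels_per_instance());
    return instance_id * texels_per_instance() + k;
}

/* Route 2: byte offset of one field of one instance in the same buffer,
 * i.e. what stride and offset describe when the buffer is bound as
 * per-instance vertex attributes. */
size_t attribute_offset(size_t instance_id, size_t field_offset) {
    return instance_id * sizeof(InstanceData) + field_offset;
}
```

With this layout each instance takes 2 texels, so instance 3 starts at texel 6. Note that the second route needs only one attribute per field (2 here), not one per instance.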

As to the former option, you could store the instance variation data in a texture rectangle, but then you have to worry about the max texture size, because it isn't very large. Texture buffers are one-dimensional and have a maximum length that is much larger than the per-dimension limit of the other texture types. For instance, on my system, MAX_TEXTURE_BUFFER_SIZE = 134217728 whereas MAX_TEXTURE_SIZE = 16384.
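If you do stay with a texture rectangle, the 1D instance index has to be folded into 2D texel coordinates, and the data has to fit within the size limit in both dimensions. A rough sketch of that arithmetic follows, assuming one texel per instance and a row-major layout. The function names are made up for illustration, and max_size stands for whatever limit your driver reports.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y; } TexelCoord;

/* Rows needed to hold `count` texels in a texture `width` texels wide
 * (ceiling division, written to avoid overflow for large counts). */
uint32_t rows_needed(uint32_t count, uint32_t width) {
    assert(width > 0);
    return count / width + (count % width != 0 ? 1u : 0u);
}

/* Row-major mapping from instance index to texel coordinate. A shader
 * would do the same with gl_InstanceID before fetching from the rectangle. */
TexelCoord texel_for_instance(uint32_t instance_id, uint32_t width) {
    assert(width > 0);
    TexelCoord c = { instance_id % width, instance_id / width };
    return c;
}

/* Returns 1 if `count` texels fit in a rectangle whose width and height
 * are both limited to max_size, 0 otherwise. */
int fits_in_rectangle(uint32_t count, uint32_t max_size) {
    uint32_t width = count < max_size ? count : max_size;
    if (width == 0) {
        return 1; /* nothing to store */
    }
    return rows_needed(count, width) <= max_size;
}
```

With a limit of 16384, 100,000 instances at one texel each need 7 rows of 16384 texels, so a few hundred thousand instances fit comfortably. The texture buffer route skips the 2D mapping entirely.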

The data will be variable, based on GPU processing. Pass 1 will calculate the new data. Pass 2 will render instances based on the data from pass 1.

The number of instances will be constant, but the data generated in pass 1 will determine whether each instance is drawn or not.

I plan on using the first method, with gl_InstanceID determining the appropriate place in the texture to find the corresponding data. I don't believe I would be able to have 100,000+ vertex attributes.

Ok. Just wanted to verify you were doing what I thought you were doing. And you are.

So yeah, non-constant instance variation data in pass 2, generated by a GPU shader in pass 1. Also, while you said a "constant" number of instances and implied a draw Y/N flag generated by pass 1, what you could do instead is just suppress pass 1 output for draw=N instances so they don't eat any per-vertex overhead in pass 2. That is, have your pass 2 instance count be non-constant and computed by the GPU in pass 1.

So briefly, for your pass 1 what you might look at is using Transform Feedback (TF) to serialize the instance variation data for each instance into one or more output buffers.

Then if you go with a constant number of instances and a single TF output buffer for pass 2, in that pass bind your transform feedback output buffer as input to your shader using a texture buffer (texture wrapper for a buffer object) and launch it using glDrawElementsInstanced().

If OTOH you decide to do culling (i.e. eliminate the draw = N instances from your TF output) in pass 1, and/or serialize the instances into multiple output buffers, then you feed the TF output count(s) from pass 1 into the instance parameter(s) of pass 2. There are at least 3 ways to do that: 1 "slow way" (where you read the count back to the CPU and then feed it to the GPU in a draw call) and at least 2 "fast ways" (where you leave the count on the GPU and use it to populate the number of instances in the instanced draw call "directly").
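For reference, the culling step amounts to a stream compaction. Below is a CPU-side model of what pass 1 would produce, not GPU code: only the draw=Y instances are written, in order, and the number written becomes the instance count for pass 2. The Instance struct and its visible flag are hypothetical stand-ins for your data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance record with a draw Y/N flag. */
typedef struct {
    float position[3];
    int   visible;   /* nonzero = draw, zero = culled */
} Instance;

/* Copies only the visible instances from `in` to `out`, preserving order.
 * `out` must have room for n records. Returns the number written, which
 * models the count that pass 1 would hand to pass 2. */
size_t compact_visible(const Instance *in, size_t n, Instance *out) {
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        if (in[i].visible) {
            out[written++] = in[i];
        }
    }
    return written;
}
```

On the GPU the equivalent would be a pass 1 shader that emits nothing for culled instances, with transform feedback packing the survivors contiguously into the output buffer.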

As to the "fast ways", you may be wondering how that's possible, since with glDrawElementsInstanced() the call parameters are populated from and launched on the CPU side. The answer is to use gl*DrawElementsIndirect() to launch your batches. These take the parameters for the draw call from a buffer object on the GPU. The "fast" methods I mention differ in how the number of primitives generated by transform feedback gets into the DrawElementsIndirect buffer, and there are at least 2 ways to do that. ARB_query_buffer_object is one way, but it is relatively new. An older way is to use shader atomics, written by the TF shader, to tally the instance count in the buffer(s).
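To show where the GPU-side count has to land, here is a sketch of the indirect command record as laid out in the OpenGL 4.2+ spec (in 4.0/4.1 the last field is reserved and must be zero). The struct is declared locally with fixed-width types so the sketch compiles without GL headers. The two helper functions are hypothetical conveniences, not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout read by gl*DrawElementsIndirect() from the indirect buffer. */
typedef struct {
    uint32_t count;          /* number of indices per instance */
    uint32_t instanceCount;  /* number of instances to draw */
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Byte offset, within a tightly packed array of commands, of the
 * instanceCount field of command `command_index`. This is the location
 * a query-buffer result or a shader atomic would need to write to. */
size_t instance_count_offset(size_t command_index) {
    return command_index * sizeof(DrawElementsIndirectCommand)
         + offsetof(DrawElementsIndirectCommand, instanceCount);
}

/* Builds a command with instanceCount left at zero, on the assumption
 * that the GPU fills it in during pass 1. */
DrawElementsIndirectCommand make_command(uint32_t index_count) {
    DrawElementsIndirectCommand cmd = { index_count, 0, 0, 0, 0 };
    return cmd;
}
```

Each command is 20 bytes, and the instance count sits 4 bytes in. If you use the transform feedback primitive count as the instance count, pass 1 needs to emit exactly one primitive (e.g. one point) per surviving instance.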