A reasonable "fast path" for glReadPixels

I know, glReadPixels and "fast path" go together like cute puppies and boa constrictors.

But here's what I'm doing: I'm rewriting my capture-to-QuickTime-movie code to be as efficient as is reasonably possible. I'd like to know what the best format and type are, whether (for example) I can grab GL_RGB and not worry about format swizzling, and so on.

The most important thing is to get it working asynchronously. Theoretically you can do that with PBOs, though you might have better luck with the old GetTexImage path. Details for using PBOs in this way are in the extension spec; details on the GetTexImage path (Mac-only) are in the mac-opengl list archives.

Either way, you'll need to use a format that matches the read source (framebuffer or texture) exactly. That probably means BGRA, UNSIGNED_INT_8_8_8_8(_REV) as for texture uploads.
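To make the format/type pairing concrete, here is a minimal sketch of how one pixel is packed into a 32-bit word under GL_BGRA with each of the two packed types. The helper names are made up for illustration and are not part of OpenGL; only the bit layout follows the packed-pixel rules (with _REV the first component, blue, lands in the low bits).

```c
#include <stdint.h>

/* Hypothetical helper: one pixel as GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV
   packs it. The first component (blue) sits in the lowest 8 bits and
   alpha in the highest, so the word reads 0xAARRGGBB. */
uint32_t pack_bgra_8888_rev(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Hypothetical helper: the same pixel as GL_BGRA + GL_UNSIGNED_INT_8_8_8_8
   (no _REV) packs it. Blue is in the highest 8 bits and alpha in the
   lowest, so the word reads 0xBBGGRRAA. */
uint32_t pack_bgra_8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)b << 24) | ((uint32_t)g << 16) |
           ((uint32_t)r << 8)  |  (uint32_t)a;
}
```

On a little-endian CPU the _REV word is stored in memory as the bytes B, G, R, A, which is why the right choice between the two types depends on the machine.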

It suddenly occurs to me that even with PBOs, CopyTexSubImage2D + GetTexImage might still be better than ReadPixels at ensuring asynchronicity.

My plan was to have the main thread use glReadPixels to put the current frame into a buffer, and have a separate thread append that buffer to a flat file in /tmp while the main thread went on to render the next frame.

So, what it *sounds* like to me is that I should make a shared context, and use glCopyTexSubImage2D to copy the framebuffer to a texture in that context, then from a separate thread use glGetTexImage to read those pixels out. Am I correct?

Where do PBOs come in? Or are they there just to make a separate shared context?

I'd imagine one way is to buffer the frames by calling glCopyTexImage2D into a new texture object each frame and pushing that texture onto a queue. In another thread (with a shared context), you take textures off the queue, use glGetTexImage to read them back, and then delete them. That way, if reading falls behind, frames build up without slowing your framerate. Of course, if it falls too far behind, the queue can grow to a massive quantity of data and cause thrashing. As long as you're not saving every frame, though, it may stay more or less in sync.
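One way to keep such a queue from growing without limit is to give it a fixed capacity and drop a frame when it is full. The sketch below is only an illustration of that idea: FrameQueue, its functions and the capacity of 4 are all made up here, the entries are plain texture names with no GL calls involved, and it is not thread-safe on its own, so a real capture path would guard it with a mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on how many textures may wait for readback. */
#define FRAME_QUEUE_CAPACITY 4

typedef struct {
    unsigned textures[FRAME_QUEUE_CAPACITY]; /* texture names, ring buffer */
    size_t head;                             /* index of the oldest entry */
    size_t count;                            /* entries currently queued */
} FrameQueue;

void frame_queue_init(FrameQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Render side: returns false when the queue is full, so the caller can
   drop this frame instead of piling up more data. */
bool frame_queue_push(FrameQueue *q, unsigned texture)
{
    if (q->count == FRAME_QUEUE_CAPACITY)
        return false;
    q->textures[(q->head + q->count) % FRAME_QUEUE_CAPACITY] = texture;
    q->count++;
    return true;
}

/* Readback side: returns false when there is nothing to read back. */
bool frame_queue_pop(FrameQueue *q, unsigned *texture)
{
    if (q->count == 0)
        return false;
    *texture = q->textures[q->head];
    q->head = (q->head + 1) % FRAME_QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Dropping on overflow trades a few missing frames in the movie for a bounded memory footprint; blocking the render thread instead would keep every frame at the cost of framerate.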

The former is cross-platform; the latter is Mac-specific. You can combine the two by using CopyTexSubImage2D and GetTexImage with PBOs, as I said, which is probably better than the former since you won't prevent rendering as you read back.

Either way, make sure you double-buffer the object you're reading back to.
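As a rough sketch of what double-buffering means here: with two readback objects, each frame starts a readback into one slot and collects the result from the slot that was started on the previous frame. The function names below are invented for illustration, and the slots stand in for whatever pair of PBOs or textures you actually use.

```c
/* Slot that the readback for this frame is started into. */
int readback_write_slot(unsigned frame)
{
    return (int)(frame % 2u);
}

/* Slot that is ready to be read this frame: the one whose readback was
   started on the previous frame. Returns -1 on the very first frame,
   when nothing has been started yet. */
int readback_read_slot(unsigned frame)
{
    if (frame == 0)
        return -1;
    return (int)((frame - 1u) % 2u);
}
```

The two slots never coincide within a frame, so the object being filled is never the one being read.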

The whole point of PBOs/texture range + GetTexImage is that they are asynchronous, so you don't need a second thread. Conversely, if you have a second thread, then you don't *need* PBOs or texture range, though they'll probably still help.

Good to hear I'm misinformed. I recall some ruckus a few months back that the "multithreaded GL" updates were only for Intel Macs. But, anyway, glad to hear I'm wrong.

My plan for multithreading was to have a separate thread serialize the grabbed buffer data to disk while the main thread carries on rendering the next frame.

I'll try the simplest approach first -- using the main thread with the technique you've described to pull out the texture data, and the worker thread to serialize it. I'd prefer to stay away from hairier stuff like multithreaded gl access if I can.

Thanks for the tips. I'll probably have more questions soon, of course.

I think you're still mistaken. Multithreaded OpenGL means that the meat of the OpenGL calls runs on a different thread than the one you make the calls from. You have always been able to make calls to OpenGL from different threads, however, as long as the threads are synchronized. Be careful, though: you need to attach the context to every thread you use it in (i.e., if you attach the context to the main thread, it's not available in other threads until you attach it there). That also means that you can attach different contexts to different threads without having to constantly attach them back and forth, since they stay attached in their respective threads.

So, let me think out loud. The app thread -- the "main" rendering thread -- has the primary display context (plus another context for fullscreen). I have a separate (but shared) context which is "attached" to my serialization-to-disk thread.

1) The main thread uses glCopyTexSubImage2D to copy the screen to a texture.

2) The writing thread wakes up (insert handwaving about threading here) and uses glGetTexImage to read that data into RAM, and writes it to disk -- meanwhile the main rendering thread is rendering the next frame.

Yes, that's pretty much the way I've envisioned it. You can also buffer the textures with the queue like I suggested, which lets you make the reading back asynchronous. I would also cap the capture rate (say at 30 fps, so if the app is running at 60 fps, you're only capturing every other frame).
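A capture-rate cap like that can be sketched as a small time-based limiter that the render loop asks once per frame. Everything here is an assumption for illustration: the CaptureLimiter type, its function names, and the use of microsecond timestamps supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t interval_us; /* minimum spacing between captured frames */
    uint64_t next_due_us; /* earliest time the next capture is allowed */
} CaptureLimiter;

void capture_limiter_init(CaptureLimiter *lim, unsigned max_fps)
{
    if (max_fps == 0)
        max_fps = 1; /* avoid dividing by zero on a bad argument */
    lim->interval_us = 1000000u / max_fps;
    lim->next_due_us = 0; /* the first frame is always captured */
}

/* Call once per rendered frame with the current time in microseconds.
   Returns true if this frame should be captured. */
bool capture_limiter_should_capture(CaptureLimiter *lim, uint64_t now_us)
{
    if (now_us < lim->next_due_us)
        return false;
    lim->next_due_us += lim->interval_us;
    /* If rendering is slower than the cap, don't try to catch up. */
    if (lim->next_due_us <= now_us)
        lim->next_due_us = now_us + lim->interval_us;
    return true;
}
```

With a 30 fps cap and frames arriving every 16667 microseconds (about 60 fps), this captures every other frame; when the app renders slower than the cap, every frame is captured.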