So the frame will be in system memory with a one-frame delay. With a PBO, the readback call is non-blocking. But glMapBuffer can block if there is a pending operation on the currently bound buffer. So if you call glMapBuffer too soon, it will block. If there are no pending operations, glMapBuffer returns very quickly.

As you can see, right now I'm not even mapping the buffers (eventually I will of course, otherwise this whole exercise would be kind of pointless), and the code in capture still takes about 6 ms. I guess this could still stall if one were rendering at a high enough framerate? However, my rendering is capped at 15 fps, so this shouldn't be an issue.

The values of m_width and m_height have not changed since I created the buffers, so their sizes are still valid.
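For reference, here is a rough sketch of what a "valid" size means for a pack buffer. It assumes 8-bit components, GL_PACK_ROW_LENGTH and the GL_PACK_SKIP_* parameters left at 0, and rows padded to GL_PACK_ALIGNMENT (default 4). The function name packedImageBytes is made up for illustration and is not taken from the code discussed in this thread.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes a PBO needs to hold one
// glReadPixels result of width x height pixels.
// Assumes 8-bit components and default pack state apart from
// GL_PACK_ALIGNMENT, which is passed in (OpenGL's default is 4).
std::size_t packedImageBytes(std::size_t width,
                             std::size_t height,
                             std::size_t bytesPerPixel,
                             std::size_t packAlignment = 4)
{
    // Unpadded length of one row in bytes.
    const std::size_t rowBytes = width * bytesPerPixel;
    // Each row starts on a multiple of packAlignment, so round up.
    const std::size_t paddedRow =
        (rowBytes + packAlignment - 1) / packAlignment * packAlignment;
    // Padding the last row as well gives a safe upper bound.
    return paddedRow * height;
}
```

With a 4-byte format such as GL_BGRA the rows are always aligned, so this reduces to m_width * m_height * 4. The padding only matters for 3-byte formats such as GL_BGR.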

/A.B

brinck

10-21-2008, 07:28 AM

Could I for some reason be getting a PBO in system memory? From what I can see in the spec:

http://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt

there's really nothing preventing this, am I wrong?

For the record, I've tested Song Ho Ahn's Asynchronous Read-back example and there I see a very clear difference in read speed when using PBO. From what I can see I'm not doing anything differently in my code, except that I'm using a lot more GPU memory for other things.

/A.B.

tamlin

10-21-2008, 12:00 PM

"Could I for some reason be getting a PBO in system memory?"
Yes.

"From what I can see in the spec: [...] there's really nothing preventing this, am I wrong?"
No.

yooyo

10-22-2008, 02:45 AM

Try with GL_STATIC_READ. Check your driver control panel; maybe you have forced AA or something similar enabled. Can you post a repro case?

Unfortunately the precompiled binary for Song Ho Ahn's demo uses a screen size of 256 x 256 and waits for vertical refresh. With a refresh rate of 60 Hz, this means the transfer rate will cap at 3.7 Mpixels/s regardless of whether PBOs are on or off. (The figure 3.1 Mpixels/s suggests you're using a refresh rate of 50 Hz, correct?)
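The cap follows from simple arithmetic: one readback per vsync-locked frame gives width x height x refresh rate pixels per second. A small sketch of that calculation is below. The function name is made up for illustration, and counting a "Mpixel" as 2^20 pixels is an assumption, chosen because it reproduces the figures quoted here.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound on readback throughput when exactly one
// frame is read back per vertical refresh.
// The result is in units of 2^20 pixels per second (an assumption that
// matches the 3.7 and 3.1 figures quoted in this thread).
double vsyncCappedMPixelsPerSec(std::uint32_t width,
                                std::uint32_t height,
                                double refreshHz)
{
    const double pixelsPerFrame = static_cast<double>(width) * height;
    return pixelsPerFrame * refreshHz / (1024.0 * 1024.0);
}
```

For a 256 x 256 window this gives 3.75 at 60 Hz and 3.125 at 50 Hz, which line up with the 3.7 and 3.1 Mpixels/s figures once truncated to one decimal.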

You will have to recompile the project yourself and increase the buffer sizes and disable vsync. When doing this you will see a clear difference between using PBO and not using PBO.

I'm using the exact same code in my application and I'm not seeing any improvement over not using PBOs. For lack of a better theory, this leads me to believe that I'm getting system memory PBOs because there's not enough GPU RAM left to allocate the PBOs there.

The demo, "pboPack", does not measure the performance of glReadPixels() alone. It does three things:
1. Read pixels from framebuffer with glReadPixels().
2. Modify the pixels in add().
3. Draw the modified pixels with glDrawPixels().

You will get the pure throughput of glReadPixels() + PBO by disabling steps #2 and #3 in my code.

Also, I'd like to mention that the pboPack demo does not use a PBO for glDrawPixels() because of an OpenGL driver bug. When I released this demo, most video cards except the NVIDIA Quadro failed on glDrawPixels() + PBO, so I took it out of the code.

The proper usage of glDrawPixels() with PBO is like this. You may get a better result by replacing glDrawPixels() in my code;

I'm pretty certain that there's nothing wrong with my PBO code. So I guess my question is what could make glReadPixels stall (when using PBO that is)? So far the only thing I can think of is that I may be getting a software fallback PBO because I've used up all the GPU ram on other stuff.

glBindBuffer should be instant, and glReadPixels too. If glReadPixels stalls, then something is really wrong there. glMapBuffer can stall if a pending glReadPixels has not finished.
If you call glReadPixels frequently, use several PBOs for that.
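A minimal sketch of the "several PBOs" idea, showing only the index bookkeeping. PboRing and its members are hypothetical names, not part of any API. The OpenGL calls are shown as comments because they need a live context, and the pbo array, w, and h in them are placeholders.

```cpp
#include <cstddef>

// Hypothetical helper: for a ring of `count` PBOs, decides which buffer
// receives this frame's readback and which (older) buffer is safe to map.
struct PboRing {
    std::size_t count;  // number of PBOs in the ring, must be >= 2

    // Buffer that receives this frame's glReadPixels.
    std::size_t writeIndex(std::size_t frame) const { return frame % count; }

    // Oldest buffer in the ring: it was written (count - 1) frames ago,
    // so its transfer has had the most time to complete.
    std::size_t mapIndex(std::size_t frame) const { return (frame + 1) % count; }

    // False during the first (count - 1) frames, when the buffer at
    // mapIndex has not been written yet.
    bool mapReady(std::size_t frame) const { return frame + 1 >= count; }
};

// Per-frame usage (sketch only, requires a GL context):
//
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[ring.writeIndex(frame)]);
//   glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, 0);  // 0 = offset into the PBO
//   if (ring.mapReady(frame)) {
//       glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[ring.mapIndex(frame)]);
//       void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
//       /* ... copy or process pixels ... */
//       glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
//   }
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```

With count = 2 this is the usual double-buffered setup with a one-frame delay. A larger ring trades extra latency (count - 1 frames) for a smaller chance that glMapBuffer has to wait on a pending transfer.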

brinck

04-29-2009, 06:56 AM

I never managed to get glReadPixels any faster with PBOs. The 6 ms spent in glReadPixels wasn't a huge problem at the time, so I simply left it.

Now, however, the problem has become more urgent. Since we upgraded to revision 182.08 of NVIDIA's Quadro driver, the glReadPixels operation takes over 30 ms!

I've tried every combination of usage flag (GL_STREAM_READ etc.) and format (GL_BGRA etc.), but with no difference in speed.

I also tried another approach: instead of using two PBOs, I used two FBOs to which I transferred the framebuffer with glCopyTexImage2D, and then used glReadPixels on the FBO that was not currently being copied to. Unfortunately this was exactly as slow.