In my application (a GLSL compute shader), glGetTexImage() is used to read an image buffer back to CPU memory. The texture is 2K*2K RGBA (one byte per channel), and my graphics card is a GTX 970.

However, NVIDIA Nsight reports that glGetTexImage() costs about 30.46 ms. This is unacceptable for my application, where I expect less than 2 ms.

So, how can I improve the performance? Thanks!
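For reference, the readback described above presumably looks something like this minimal sketch (the texture ID, its allocation, and a current GL context are all assumed; names are illustrative):

```c
/* Hypothetical sketch of the synchronous readback described above.
 * Requires a current OpenGL context; "tex" is assumed to be a
 * 2048x2048 GL_RGBA8 texture already written by the compute shader. */
GLuint tex;
GLubyte *pixels = (GLubyte *)malloc(2048 * 2048 * 4);  /* 16 MiB */

glBindTexture(GL_TEXTURE_2D, tex);
/* Blocks until the GPU is done AND the full image has crossed the bus. */
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
```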

Dark Photon

10-26-2015, 06:35 AM

Have you done any estimates on what doing that readback requires? Have you verified that those requirements are met?

2K*2K*4 Bpp = 16 MB. You want that pulled down from a discrete GPU over the PCIe bus in 0.002 seconds. If the bus can't sustain that, you need more time. In fact, depending on how long your GPU kernel takes to complete, you may blow your entire 2 ms budget just waiting for the kernel to finish. There are techniques you can use to hide this latency if you don't absolutely need the latest image and can accept one rendered the previous frame. But let's go with the assumption that you've already waited for your kernel to complete and you're ready to do the readback right now.
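One common way to hide that latency is an asynchronous readback through ping-ponged pixel buffer objects: kick off a transfer into one PBO this frame, and map the transfer you started last frame. A rough sketch, assuming a current GL context and that the 2048x2048 RGBA8 result is bound to the read framebuffer (all names here are illustrative, not from the original post):

```c
/* Double-buffered async readback: glReadPixels into a bound
 * GL_PIXEL_PACK_BUFFER returns immediately and the DMA proceeds in
 * the background; mapping the *other* PBO gives last frame's image. */
enum { W = 2048, H = 2048, NBUF = 2 };
static GLuint pbo[NBUF];
static int frame = 0;

void init_pbos(void) {
    glGenBuffers(NBUF, pbo);
    for (int i = 0; i < NBUF; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, W * H * 4, NULL, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

void readback(void) {
    int cur  = frame % NBUF;        /* start a transfer into this PBO    */
    int prev = (frame + 1) % NBUF;  /* map the transfer from last frame  */

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[cur]);
    glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, 0); /* async */

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[prev]);
    void *data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (data) {
        /* ... process last frame's 16 MiB image here ... */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}
```

The trade-off is one frame of latency on the data you process, in exchange for not stalling the CPU while the transfer is in flight.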

If the image were guaranteed to be ready on the GPU side, you'd still need just under 8 GB/sec of sustained throughput in practice (not theoretical): 2K*2K*4 bytes / 0.002 s.

Beyond basic hardware capability issues, IIRC NVIDIA artificially limits readback performance on their GeForce products, at least with glReadPixels. There was a trick posted a while back to get around this limitation in the driver. Maybe someone can help me out here with what it was. But IIRC, instead of reading back directly or through one PBO, you read back into a PBO, then do a PBO-to-PBO copy, then read from the second PBO. Something like that. Adding this extra copy yielded an incredible reduction in the total readback time, which tells you that some internal speed governor in the driver was being bypassed.
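From memory, the rumored workaround would look roughly like this. This is a hedged sketch, not a verified fix: pboA/pboB are illustrative names for two PBOs sized W*H*4 as above, and a current GL context is assumed:

```c
/* Rumored GeForce readback workaround: read into one PBO, copy
 * PBO-to-PBO on the GPU, then map the second PBO for CPU access. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboA);
glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, 0);

/* GPU-side copy between the two buffer objects. */
glBindBuffer(GL_COPY_READ_BUFFER, pboA);
glBindBuffer(GL_COPY_WRITE_BUFFER, pboB);
glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                    0, 0, W * H * 4);

/* Mapping the second PBO pulls the data to system memory. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboB);
void *data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
/* ... use data ... */
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```

Worth benchmarking both paths on your own hardware; driver behavior like this varies between generations and driver versions.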