Oh, sorry, I completely forgot to mention: it's a Fermi GPU, so it has a cache. That could indeed be the problem. But I don't see how a cache could hide the memory underclock almost completely in the context of path tracing.

jbikker wrote: Oh, sorry, I completely forgot to mention: it's a Fermi GPU, so it has a cache. That could indeed be the problem. But I don't see how a cache could hide the memory underclock almost completely in the context of path tracing.

I could see this happening if you have really good locality between parents and their children. Each memory access might yank in a few nodes at a time, and several nodes' worth of compute could be enough to keep that thread running while another one is fetching. Plus, the upper levels of the tree are basically free, since they're accessed all the time, so a miss at the bottom can be offset by another thread starting at the top.
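To make that intuition a bit more concrete, here is a toy model, not a measurement of any real tracer. It walks random root-to-leaf paths through a complete binary tree stored in breadth-first order and tracks hits in a small LRU cache, per tree level. The node size (32 bytes), line size (128 bytes), cache capacity (128 lines, i.e. 16 KB), tree depth, and the name `simulate` are all assumptions picked for illustration; real rays are far more coherent than uniformly random paths, so if anything this understates the locality.

```python
import random
from collections import OrderedDict

NODE_BYTES = 32      # assumed size of one tree node
LINE_BYTES = 128     # assumed cache line size
CACHE_LINES = 128    # assumed cache capacity: 128 lines = 16 KB

def simulate(depth=16, rays=2000, seed=1):
    """Return the cache hit rate per tree level for random root-to-leaf walks."""
    rng = random.Random(seed)
    cache = OrderedDict()            # LRU cache keyed by cache line index
    hits = [0] * depth
    total = [0] * depth
    for _ in range(rays):
        node = 0                     # heap-style index: children are 2i+1 and 2i+2
        for level in range(depth):
            line = node * NODE_BYTES // LINE_BYTES
            total[level] += 1
            if line in cache:
                cache.move_to_end(line)        # mark as most recently used
                hits[level] += 1
            else:
                cache[line] = True             # one fetch brings in a whole line
                if len(cache) > CACHE_LINES:
                    cache.popitem(last=False)  # evict the least recently used line
            node = 2 * node + 1 + rng.randrange(2)  # descend to a random child
    return [h / t for h, t in zip(hits, total)]

if __name__ == "__main__":
    for level, rate in enumerate(simulate()):
        print(f"level {level:2d}: hit rate {rate:.3f}")
```

In this model the top few levels hit almost every time, and only the deepest levels go out to memory regularly. That is the part of the traversal a memory underclock would actually slow down, and it is the part other threads can overlap with compute.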

How big's the scene? And do you have a sense of what your occupancy levels are?
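If you don't have profiler numbers handy, a rough upper bound on occupancy can be worked out by hand. The sketch below assumes the usual compute capability 2.x per-SM limits (1536 resident threads, 8 resident blocks, 32768 registers, 48 KB shared memory) and ignores allocation granularity, so treat the result as an estimate only. The function name `estimate_occupancy` and its parameters are made up for this example.

```python
def estimate_occupancy(threads_per_block, regs_per_thread, smem_per_block=0,
                       max_threads=1536, max_blocks=8,
                       regs_per_sm=32768, smem_per_sm=49152, warp_size=32):
    """Estimate the fraction of an SM's warp slots a kernel can fill.

    The default limits are assumed values for a Fermi-class SM.
    """
    # Blocks are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    threads = warps_per_block * warp_size

    # How many blocks fit under each per-SM resource limit.
    by_threads = max_threads // threads
    by_regs = regs_per_sm // (regs_per_thread * threads)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks

    blocks = min(max_blocks, by_threads, by_regs, by_smem)
    return blocks * warps_per_block / (max_threads // warp_size)

if __name__ == "__main__":
    print(estimate_occupancy(256, 20))   # light register use
    print(estimate_occupancy(256, 63))   # heavy register use
```

A register-heavy kernel, which path tracers often are, can end up with only a few blocks resident per SM. In that case there are fewer warps available to cover a miss, which makes the cache behaviour matter more.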