The Khronos Group - a non-profit industry consortium to develop, publish and promote open standard, royalty-free media authoring and acceleration standards for desktop and handheld devices, combined with conformance qualification programs for platform and device interoperability.

load from the constant global variable into local memory
barrier to wait for local memory to be synchronized
process the data from local memory
write the result to global memory

I know that loading from global memory should be much slower, and that I am loading the same data over and over in every work-item, but the processing is then done in private memory, which is much faster. On the other hand, I don't know whether waiting at the barrier will hurt my performance, and I also don't know the (rough) ratio between the read/write speeds of global and local memory.
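For reference, the four steps listed in the question can be sketched as an OpenCL C kernel roughly like the one below. This is only a minimal sketch under stated assumptions: the kernel name `weighted_sum`, its arguments, and the weighted-sum computation are made up for illustration and are not from the original post. The `#ifndef __OPENCL_VERSION__` part is a hypothetical host-only shim so the same source can be compiled as plain C with one work-item per group for a quick sanity check; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (illustrative assumption, not part of OpenCL): compiles the
   kernel as plain C and runs groups of ONE work-item, so barrier() is a no-op. */
#include <assert.h>   /* for host-side sanity checks */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define __constant
#define CLK_LOCAL_MEM_FENCE 0
#define barrier(flags) ((void)0)
static size_t emu_group;  /* hypothetical: index of the emulated work-group */
static size_t get_local_id(unsigned d)   { (void)d; return 0; }
static size_t get_local_size(unsigned d) { (void)d; return 1; }
static size_t get_global_id(unsigned d)  { (void)d; return emu_group; }
#endif

__kernel void weighted_sum(__constant float *coeffs,  /* shared read-only table */
                           const int k,               /* number of coefficients */
                           __global const float *in,  /* global_size + k - 1 elements */
                           __global float *out,       /* global_size elements */
                           __local float *cache)      /* k floats, sized by the host */
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);
    const size_t gid = get_global_id(0);

    /* 1. Cooperative load: the group copies the table into local memory once,
          each work-item taking every lsz-th element. */
    for (size_t i = lid; i < (size_t)k; i += lsz)
        cache[i] = coeffs[i];

    /* 2. Wait until every work-item in the group has finished its stores. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* 3. Process the data from local memory. */
    float acc = 0.0f;
    for (int i = 0; i < k; ++i)
        acc += cache[i] * in[gid + i];

    /* 4. Write the result to global memory. */
    out[gid] = acc;
}
```

On the host side, a `__local` pointer argument like `cache` is sized by calling `clSetKernelArg` with the byte size and a NULL value pointer. Whether this version beats reading `coeffs` directly depends on the device, so it is worth timing both.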

manual caching

The second option is what I call "manual caching". When the accelerator has no cache, the second option runs faster. When the GPU has enough cache, it won't really matter. I have not tested this in a while, so I am not sure anymore, but when using the CPU, the second version runs slower.

In most cases I have found other reasons for using local memory to be more important.
