
Regarding memory sections in OpenCL

Hi,
I just wanted to clarify the points below related to memory sections in OpenCL. Please let me know whether I am correct.
1) Variables defined inside the kernel are stored in private memory, i.e. any variable declared without an address space qualifier such as __local.
2) If I declare a constant variable inside the kernel, it will be stored in private memory.
3) Constant variables are stored in the global memory region.
4) What is the need for passing a local buffer as a kernel parameter, when it can also be declared inside the kernel?

1) This is the default behavior when there is no address space qualifier.
2) It will be stored in the global memory (or optimized away by the compiler).
3) Yes.
4) This is useful when the size of the local buffer is variable and set by the host.
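To illustrate point 4, here is a minimal sketch (kernel name and arguments are made up for illustration): the kernel receives the local buffer as a pointer parameter, and the host picks its size at run time by passing a byte count with a NULL pointer to clSetKernelArg, which tells the runtime to allocate that much local memory.

```c
// Kernel side: the local buffer arrives as a parameter,
// so its size is not fixed at compile time.
__kernel void blur(__global const float *in,
                   __global float *out,
                   __local  float *tile)   // size chosen by the host
{
    /* ... use tile as scratch space ... */
}

/* Host side (sketch):
   clSetKernelArg(kernel, 2, tile_bytes, NULL);
   A NULL arg_value with a __local parameter means
   "allocate tile_bytes of local memory for this argument". */
```

A buffer declared inside the kernel, by contrast, must have a compile-time-constant size.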

Thank you very much for your reply. I wanted one more clarification regarding local memory usage.
1) Inside the kernel, do we have to load local memory from global memory per work-item, rather than copying it in a single shot (like a memcpy covering all the work-items)?
2) I used local memory but couldn't see any improvement in performance. In my case I read all the pixels only once. Is that the reason? Is local memory only helpful from the second access onwards, once the data has been moved to the cache by the first access?

1) You can, but this would be poor coding. Local memory acts as a user-managed cache for the work-items of a work-group. If you want the global-to-local copy to be made by a single work-item, you'll have to do it in a conditional section such as 'if (get_local_id(0) == 0)', which will cause an execution flow divergence.
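The usual cooperative pattern, sketched below (names and the tile size are illustrative), has each work-item copy its own element and then synchronize before anything reads the tile:

```c
__kernel void process(__global const float *in, __global float *out)
{
    __local float tile[256];              // assumes a work-group size of at most 256
    int lid = get_local_id(0);
    int gid = get_global_id(0);

    tile[lid] = in[gid];                  // each work-item loads one element
    barrier(CLK_LOCAL_MEM_FENCE);         // wait until the whole tile is loaded

    out[gid] = tile[lid] * 2.0f;          // every work-item may now read any tile entry
}
```

All work-items of the group participate in the copy, so no divergence is introduced; the barrier guarantees the tile is complete before it is read.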
2) It depends on your algorithm. Using local memory is worthwhile if you can amortize the cost of the global-to-local copy.
In the classical case of matrix multiplication, the size of the data is O(n^2) while the number of operations is O(n^3). Caching data is then meaningful, since you can expect an O(n) improvement (n being the width of a block).
In the case of a vector-to-vector addition, for example, it would be pointless to cache data in local memory, because each element is accessed only once and there is no 'reuse' of the cached data.
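The matrix-multiplication case can be sketched as a tiled kernel (the tile size and names below are illustrative; n is assumed to be a multiple of TILE): each element is fetched from global memory once per tile, then reused TILE times from local memory.

```c
#define TILE 16   // illustrative block width; must match the work-group dimensions

// C = A * B for square n x n row-major matrices.
__kernel void matmul(__global const float *A,
                     __global const float *B,
                     __global float *C,
                     const int n)
{
    __local float Asub[TILE][TILE];
    __local float Bsub[TILE][TILE];

    int row = get_global_id(1);
    int col = get_global_id(0);
    int lr  = get_local_id(1);
    int lc  = get_local_id(0);

    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        // Each work-item fetches one element of each tile from global memory...
        Asub[lr][lc] = A[row * n + (t * TILE + lc)];
        Bsub[lr][lc] = B[(t * TILE + lr) * n + col];
        barrier(CLK_LOCAL_MEM_FENCE);

        // ...and each cached element is then reused TILE times.
        for (int k = 0; k < TILE; ++k)
            acc += Asub[lr][k] * Bsub[k][lc];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    C[row * n + col] = acc;
}
```

This is where the O(n) (here, TILE-fold) reduction in global memory traffic comes from; a vector addition has no such inner reuse loop, so tiling it buys nothing.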