

Re: Dynamically creating 2 dimensional local memory arrays

You can either declare the local variable in kernel scope instead of as a kernel argument or you will have to manually index into the array as if it was 2D. This is not any different from how C99 works, is it?

Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.

Re: Dynamically creating 2 dimensional local memory arrays

Originally Posted by david.garcia

You can either declare the local variable in kernel scope instead of as a kernel argument or you will have to manually index into the array as if it was 2D. This is not any different from how C99 works, is it?

Those are two workarounds that I have used.

Declaring a local memory array within the scope of the kernel is one way to make a 2D array, but the allocation size then becomes static. I couldn't write code that dynamically sizes the array based on how much local memory is available on the device's compute units. Doing this would be useful when writing code for both NVIDIA's G80 and Fermi architectures. G80 has only 16 KB of local memory, whereas Fermi has 48 KB. If you're doing matrix dot product operations, it is ideal to have a block size as large as possible: on G80 the max would be 16x16, but on Fermi it would be 32x32. If I create the array within the scope of the kernel, I would have to write two kernels, one for a local memory tile of 16x16 and the other for a tile of 32x32. And that's not even taking into account AMD cards or other architectures.
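One possible way around the two-kernel problem, sketched here under some assumptions: OpenCL kernels are compiled at run time, so a single kernel source can keep a kernel-scope array such as `__local float As[BLOCK_SIZE][BLOCK_SIZE];` and receive `BLOCK_SIZE` as a `-D` define in the options string of `clBuildProgram`. The host could query `CL_DEVICE_LOCAL_MEM_SIZE` and `CL_DEVICE_MAX_WORK_GROUP_SIZE` with `clGetDeviceInfo` and derive the tile size from them. The helper names `pick_block_size` and `format_build_options` below are hypothetical, as is the assumption that the kernel uses two float tiles and one work-item per tile element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the largest power-of-two tile edge such
   that two float tiles (say As and Bs) fit in local memory and one
   work-item per tile element fits in a single work-group.
   The inputs are assumed to come from clGetDeviceInfo. */
static unsigned pick_block_size(unsigned long local_mem_bytes,
                                size_t max_work_group_size)
{
    assert(max_work_group_size > 0);
    unsigned block = 1;
    for (;;) {
        unsigned long next = 2UL * block;
        unsigned long tile_bytes = next * next * sizeof(float);
        if (2UL * tile_bytes > local_mem_bytes)
            break;                      /* two tiles would not fit */
        if (next * next > max_work_group_size)
            break;                      /* too many work-items per group */
        block = (unsigned)next;
    }
    return block;
}

/* Hypothetical helper: formats the options string that would be passed
   as the `options` argument of clBuildProgram. Returns the number of
   characters written, as snprintf does. */
static int format_build_options(char *buf, size_t len, unsigned block)
{
    return snprintf(buf, len, "-D BLOCK_SIZE=%u", block);
}
```

With example inputs of 16 KB of local memory and a 512 work-item limit this sketch picks 16, and with 48 KB and a 1024 work-item limit it picks 32. The array size is still fixed per build, so this only moves the choice from source-writing time to program-build time.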

If I manually index into the array instead, the code would be more dynamic: I could specify the local memory array size outside the kernel, so I would only have to write one kernel. Unfortunately, I would have to manually index the array with something like

Code :

As[i * MatrixWidth + j]

instead of

Code :

As[i][j]

This would work, but it takes a few more instructions, which can add up over many iterations; it can get confusing, and it makes for much less elegant code.
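A small accessor macro can hide the flat indexing and keep the call sites close to the 2D form. The sketch below is plain C99 rather than a kernel, and the names `AT` and `fill_tile` are hypothetical. The same macro should carry over unchanged to a `__local float *` kernel argument whose size is set from the host with `clSetKernelArg` (passing the byte size and a NULL value).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor: 2D-style access to a flat, row-major buffer.
   `width` is the number of elements per row. */
#define AT(a, width, i, j) ((a)[(size_t)(i) * (width) + (j)])

/* Fills a rows x width tile so that element (i, j) holds i * 10 + j. */
static void fill_tile(float *tile, size_t rows, size_t width)
{
    assert(tile != NULL);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < width; ++j)
            AT(tile, width, i, j) = (float)(i * 10 + j);
}
```

As for the extra instructions, `As[i][j]` on a true 2D array also computes a row offset plus a column offset, so the difference is likely to be small. A compile-time constant width does give the compiler more room to simplify the multiply.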

I'm trying to get the best of both worlds: dynamic code and elegant code. Is dynamically allocating 2D arrays from the kernel call impossible in OpenCL 1.0, or is there some way to do it?