

get_group_offset

I have a use case for a get_group_offset function. Suppose you want to implement a reduction algorithm across multiple devices. One option is to rely on implicit buffer transfers, using a sub-buffer for each device (i.e., the zero-offset method). The more classic option, however, is to manage buffer transfers explicitly, using one common buffer shared among all the devices (i.e., the non-zero-offset method). This works rather well for the input buffers because there is a global_offset, but the output buffer of a two-step reduction needs a corresponding group_offset, since group_id always starts at zero regardless of global_offset. Currently there are a couple of relatively easy workarounds: pass the compiler option -D GROUP_OFFSET (which potentially requires rebuilding the kernel), or calculate group_offset = (global_offset + global_size) / local_size - num_groups, which assumes global_offset is a multiple of local_size. I propose simplifying things a bit, for usability and completeness, by including a get_group_offset function.
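To make the second workaround concrete, here is a minimal host-side sketch in plain C that models the arithmetic along one dimension. The function names (group_offset_from_post, group_offset_short, partial_index) are made up for illustration and are not part of OpenCL. It assumes that both global_offset and global_size are multiples of local_size.

```c
#include <stddef.h>

/* Host-side model of the work-group offset along one dimension.
 * Assumption: global_offset and global_size are multiples of local_size.
 * All function names here are hypothetical, not OpenCL built-ins. */

/* The formula exactly as written in the post. */
size_t group_offset_from_post(size_t global_offset, size_t global_size,
                              size_t local_size)
{
    size_t num_groups = global_size / local_size;
    return (global_offset + global_size) / local_size - num_groups;
}

/* Under the same assumption, the formula reduces to a single division. */
size_t group_offset_short(size_t global_offset, size_t local_size)
{
    return global_offset / local_size;
}

/* Slot in the common partial-results buffer that a work-group would
 * write to: its zero-based group_id shifted by the group offset. */
size_t partial_index(size_t group_id, size_t global_offset,
                     size_t local_size)
{
    return group_offset_short(global_offset, local_size) + group_id;
}
```

For example, with two devices that each process 1024 work-items in groups of 64, the second device is enqueued with global_offset = 1024, so its group offset is 16 and its work-group 3 writes partial result 19. Inside a kernel the same value could be computed from get_global_offset(0) and get_local_size(0), which is what a built-in get_group_offset would save the user from writing.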

BTW, I strongly think it was a mistake to use "num_groups" instead of "group_size", as it doesn't follow the naming convention established by global_{id, size}, local_{id, size}, and group_id. People hate it when languages establish patterns and then buck them.