The Khronos Group - a non-profit industry consortium to develop, publish and promote open standard, royalty-free media authoring and acceleration standards for desktop and handheld devices, combined with conformance qualification programs for platform and device interoperability.

The first thread of each workgroup increments (or decrements, depending on the direction variable) synch, then checks whether the value has reached the total number of workgroups. If it has, the thread exits; otherwise it waits.
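To make the scheme concrete, here is a minimal host-side sketch of that counter logic in Python, not the original kernel. The names `barrier_wait` and `run_once` are hypothetical, a `threading.Lock` stands in for the atomic increment/decrement, and OS threads stand in for workgroup leaders. The target of 0 for the decrementing direction is an assumption; the description above only states the target for counting up. The sketch exercises a single barrier crossing only.

```python
import threading
import time

def barrier_wait(state, lock, direction, workgroup_count):
    """One call per work-group leader: bump the shared counter, then spin."""
    # Assumption: counting up targets workgroup_count, counting down targets 0.
    target = workgroup_count if direction > 0 else 0
    with lock:                      # stands in for atomic_inc / atomic_dec
        state["synch"] += direction
    while True:                     # spin until every leader has arrived
        with lock:
            if state["synch"] == target:
                return
        time.sleep(0)               # yield so other threads can run

def run_once(workgroup_count):
    """Send one leader thread per work-group through a single barrier crossing."""
    state = {"synch": 0}
    lock = threading.Lock()
    events = []

    def leader(wg_id):
        with lock:
            events.append(("arrive", wg_id))
        barrier_wait(state, lock, +1, workgroup_count)
        with lock:
            events.append(("leave", wg_id))

    threads = [threading.Thread(target=leader, args=(i,))
               for i in range(workgroup_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events

if __name__ == "__main__":
    log = run_once(4)
    # Every "arrive" is logged before any "leave".
    print([kind for kind, _ in log])
```

This terminates on a CPU because the operating system preempts a spinning thread and lets the others run, so every leader eventually gets to increment the counter. Whether a GPU offers the same guarantee across workgroups is exactly what is in question here.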

I am using a GTX 570 card, which has 15 SMs, and this code works if my number of workgroups, WORKGROUP_COUNT, is 15 or less.

The problem, however, is that it doesn't seem to get out of the function (for at least some WGs) if the number of workgroups is set to 16 or higher. Does anyone have any idea how this might happen?

My initial guess is that one WG is starved by its rival WG on the SM and doesn't get into the function, but I'm pretty sure there is more to it!

I'm sure you're aware that the rule for barriers is that every work item must hit them. If one of your work items is stuck in a loop, it will never hit the barrier and the rest of the work items will park there waiting.

This cannot work.
Threads on a GPU are not logical threads that share computation time through a multitasking mechanism. They are physical threads running on processing elements, and the number of processing elements is limited.
Work-items are run concurrently in batches on the processing elements, but these batches are run sequentially.
To put it simply: you have no guarantee that all the work-items of all the work-groups run concurrently.
As a result, once the number of work-items passes a given threshold, the running ones block forever: the remaining work-items have not even started, and they never will, because the spinning work-items never release the processing elements they occupy.
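The argument above can be checked with a toy model. The following Python sketch is a simplification under stated assumptions, not a description of the real hardware scheduler: `simulate` and `resident_slots` are hypothetical names, and a work-group that has started is assumed to keep its slot until it exits the barrier (no preemption). Setting `resident_slots` to 15, one resident work-group per SM, is an assumption chosen to match the threshold reported in the question; real occupancy depends on the kernel's resource usage.

```python
from collections import deque

def simulate(workgroup_count, resident_slots):
    """Return True if every work-group passes the barrier, False on deadlock."""
    pending = deque(range(workgroup_count))  # work-groups not yet started
    resident = []                            # started, holding a slot, spinning
    synch = 0                                # shared arrival counter
    finished = 0

    while finished < workgroup_count:
        progressed = False

        # Start pending work-groups while slots are free; each one
        # increments the counter as soon as it starts running.
        while pending and len(resident) < resident_slots:
            resident.append(pending.popleft())
            synch += 1
            progressed = True

        # Resident work-groups leave only once *all* work-groups have arrived.
        if resident and synch == workgroup_count:
            finished += len(resident)
            resident.clear()
            progressed = True

        # Nothing started and nothing finished: the spinners hold every
        # slot, and the work-groups they wait for can never be scheduled.
        if not progressed:
            return False

    return True

if __name__ == "__main__":
    for n in (15, 16):
        print(n, "work-groups:", "ok" if simulate(n, 15) else "deadlock")
```

With 15 slots, the model completes for 15 work-groups and deadlocks for 16, which mirrors the behaviour described in the question.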