The Khronos Group - a non-profit industry consortium to develop, publish and promote open standard, royalty-free media authoring and acceleration standards for desktop and handheld devices, combined with conformance qualification programs for platform and device interoperability.

how to implement serial calculation in kernel code?

The following piece of code is part of my kernel. The other parts of the kernel are independent and fully parallel, so they can run on each work item with no data synchronization. This part, however, is inherently serial (the i-th output needs the updated value of the (i-1)-th output), so I thought I could have one work item do it while the other work items do nothing during this step. Supposing I use work item 0 to finish the computation, I wrote this:

Here tid is the work item's local ID, and tB and m are both pointers to local memory. Basically, I need to derive array m from array tB; one element of m is computed on each iteration of the first loop. The values in m are correct when I run the kernel on the CPU but wrong on the GPU. Is the synchronization going wrong on the GPU? Or do you have suggestions for making it work correctly there? Thank you so much!
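A minimal sketch of the pattern described above, written as plain C that emulates a single work-group so it can run anywhere. Assumptions: the recurrence m[i] = m[i-1] + tB[i] is a hypothetical stand-in for the real one, LOCAL_SIZE and run_work_group are hypothetical names, and each loop over lid stands for all work items executing in parallel between two barriers. In a real OpenCL kernel, the two marked points would be barrier(CLK_LOCAL_MEM_FENCE) calls (a local fence, because tB and m live in local memory). They would be placed outside the if (get_local_id(0) == 0) branch, because every work item in the group must reach the same barrier. A barrier only orders work items within one work-group; it never synchronizes different work-groups.

```c
#define LOCAL_SIZE 8 /* hypothetical work-group size */

/* Emulates one work-group. Each "for (lid ...)" loop stands for all
 * work items running the same code in parallel between two barriers. */
static void run_work_group(const float *in, float *out)
{
    float tB[LOCAL_SIZE]; /* stands in for a __local array */
    float m[LOCAL_SIZE];  /* stands in for a __local array */

    /* Parallel part: each work item loads its own element of tB. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        tB[lid] = in[lid];

    /* OpenCL: barrier(CLK_LOCAL_MEM_FENCE); executed by ALL work items,
     * so work item 0 sees every tB[] element written above. */

    /* Serial part: in the kernel this sits inside
     * if (get_local_id(0) == 0) { ... } with no barrier inside it. */
    m[0] = tB[0];
    for (int i = 1; i < LOCAL_SIZE; ++i)
        m[i] = m[i - 1] + tB[i]; /* hypothetical recurrence */

    /* OpenCL: barrier(CLK_LOCAL_MEM_FENCE); again executed by ALL work
     * items, so nobody reads m[] before work item 0 has finished it. */

    /* Parallel part: each work item reads the finished m. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        out[lid] = m[lid];
}
```

Calling run_work_group with in = {1, 2, 3, 4, 5, 6, 7, 8} fills out with the running sums 1, 3, 6, ..., 36. The point of the layout is that both barriers are unconditional; only the serial loop itself is guarded by the work-item ID.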

Re: how to implement serial calculation in kernel code?

Originally Posted by deNorma

Then why not just use the CPU to do the work?

Hi, thanks for your reply. But I have two pieces of code like this in my kernel. If I did them on the CPU, I would need to break the kernel into three kernels and pass the data back and forth between the CPU and the GPU five times, which doesn't sound efficient. Do you know whether it is valid to have one thread do this work, or is there another way to write this part of the code? What is strange is that the code as I wrote it produces the correct result on the CPU but a wrong one on the GPU, and I don't understand why.

Re: how to implement serial calculation in kernel code?

When you do this serial calculation, does every work item have to wait for the serial result before it can proceed?

Of course you can use one thread for the calculation. If the result is different, it just means your GPU code is not correct.

Yes, the subsequent steps in each thread need to wait for the serial result before they can proceed. Is putting barrier(CLK_GLOBAL_MEM_FENCE); before and after this piece of code enough to synchronize all the other threads with this one?

Re: how to implement serial calculation in kernel code?

If you're only using one work-group, you will get only a tiny fraction (1/4th to 1/48th) of the total GPU performance.

If you need this sort of synchronization across all work-items, you have to wait for the kernel to finish. If the cost of transferring the data to the CPU is too high to do it that way, then you have two options:
1) wait for the first kernel to finish and then run a second kernel that does only the serial part, using a global size of 1
or
2) figure out another algorithm.
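For option 2, one possibility, if and only if the recurrence is associative like the hypothetical running sum m[i] = m[i-1] + tB[i], is to replace the serial loop with a parallel prefix scan. The sketch below swaps in a Hillis-Steele inclusive scan, again emulated in plain C: each pass of the outer loop is one step in which every work item updates its own element, followed in a real kernel by barrier(CLK_LOCAL_MEM_FENCE). It needs about log2(n) barrier-separated steps instead of n serial iterations. WG_SIZE, hillis_steele_scan and the addition operator are assumptions; a recurrence that is not associative cannot be rewritten this way.

```c
#define WG_SIZE 8 /* hypothetical work-group size */

/* Hillis-Steele inclusive scan over WG_SIZE elements. The inner loop
 * over lid stands for all work items running one step in parallel. */
static void hillis_steele_scan(const float *tB, float *m)
{
    float a[WG_SIZE], b[WG_SIZE]; /* two buffers avoid read/write races */
    float *src = a, *dst = b;

    for (int lid = 0; lid < WG_SIZE; ++lid)
        src[lid] = tB[lid];

    for (int offset = 1; offset < WG_SIZE; offset *= 2) {
        /* Each work item reads only src and writes only dst[lid]. */
        for (int lid = 0; lid < WG_SIZE; ++lid)
            dst[lid] = (lid >= offset) ? src[lid] + src[lid - offset]
                                       : src[lid];
        /* OpenCL: barrier(CLK_LOCAL_MEM_FENCE); before the next step. */
        float *tmp = src; /* swap roles of the two buffers */
        src = dst;
        dst = tmp;
    }

    for (int lid = 0; lid < WG_SIZE; ++lid)
        m[lid] = src[lid];
}
```

With tB = {1, 2, 3, 4, 5, 6, 7, 8}, the three steps (offsets 1, 2, 4) produce the same running sums as the serial loop, ending in m = {1, 3, 6, 10, 15, 21, 28, 36}. The double buffer is the design choice that lets every work item update in the same step without reading a neighbour's already-overwritten value.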