Hi all,
I'm relatively new to OpenCL (but experienced with CUDA — I'm porting an application, still on an nvidia 285 device) and to these forums. I've come across an area not addressed by any of the documentation I have (OpenCL spec, reference card, reference pages, nvidia OpenCL programming guide, jumpstart guide, etc.) — or at least I haven't found the one line that does apply somewhere in the 300-page spec...

In general I'm looking for information on running a kernel repeatedly. Ultimately the input data (a 128MB buffer of raw data) will vary, but currently it contains zeros — bar a few values to make sure the kernel is reading it properly.
The program flow I'm after is something like:

(In the background to this OpenCL work, a second host input buffer will be populated with data, and the two buffers alternated in the clEnqueueWriteBuffer call.)
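For reference, the loop I have in mind looks roughly like this — a sketch only, with placeholder names (queue, kernel, inBuf, outBuf, hostIn, hostOut, BUF_BYTES, OUT_BYTES, globalSize are illustrative, not my actual identifiers) and error checking trimmed:

```c
/* Sketch of the intended flow -- all identifiers are placeholders. */
for (int i = 0; i < nIterations; ++i) {
    /* upload the next 128MB chunk, alternating between two host buffers */
    err = clEnqueueWriteBuffer(queue, inBuf, CL_TRUE, 0, BUF_BYTES,
                               hostIn[i % 2], 0, NULL, NULL);

    /* kernel arguments were set once with clSetKernelArg before the loop */
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                 &globalSize, NULL, 0, NULL, NULL);

    /* pull the results back before the next iteration overwrites the input */
    err = clEnqueueReadBuffer(queue, outBuf, CL_TRUE, 0, OUT_BYTES,
                              hostOut, 0, NULL, NULL);
}
```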

I don't know of any logical reason why something like the above wouldn't work. However, in my test example the first loop iteration shows no problems, but the second causes a segmentation fault immediately after clEnqueueNDRangeKernel. All pointers and memory have been verified correct and unchanging via %p printfs.

Any general information on re-running a kernel like this or specific gotchas that may occur in a similar scenario would be gratefully received.

Thanks in advance.

02-12-2010, 04:52 AM

dbs2

Re: Running a looped kernel

I don't see anything there that should cause a problem. Because you're calling write/read with CL_TRUE, they will block, so you're guaranteed that everything is done before you continue (this is really bad for performance, by the way). You also don't need to reset the kernel arguments as long as they aren't changing.
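Once it's debugged, you can drop the blocking flags and synchronize explicitly instead of stalling on every transfer. A sketch of that pattern — names are illustrative, not from the original code:

```c
/* Non-blocking variant: queue everything, wait once at the end.
 * Event dependencies are redundant on an in-order queue, but they
 * make the write -> kernel -> read ordering explicit. */
cl_event writeDone, kernelDone;

clEnqueueWriteBuffer(queue, inBuf, CL_FALSE, 0, BUF_BYTES, hostIn,
                     0, NULL, &writeDone);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, NULL,
                       1, &writeDone, &kernelDone);
clEnqueueReadBuffer(queue, outBuf, CL_FALSE, 0, OUT_BYTES, hostOut,
                    1, &kernelDone, NULL);

clFinish(queue);            /* one blocking point instead of three */
clReleaseEvent(writeDone);
clReleaseEvent(kernelDone);
```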

Whose OpenCL are you using? It looks like a vendor issue to me.

02-12-2010, 05:17 AM

michaelm

Re: Running a looped kernel

Thanks for your reply. I have no intention of using blocking behaviour for the input data loads — I only set that to CL_TRUE to guarantee sequential operation while debugging the issue. Thanks for pointing out the kernel args; I had a feeling you only needed to set them once, but again I just wanted to make sure.

I've restructured the code so that it performs all the input stages before entering the loop, then runs clEnqueueNDRangeKernel n times, then leaves the loop and accumulates the results.
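The restructured flow, sketched (again with placeholder names, error checks omitted):

```c
/* Restructured test: input once, launch the kernel n times, read back once. */
clEnqueueWriteBuffer(queue, inBuf, CL_TRUE, 0, BUF_BYTES,
                     hostIn, 0, NULL, NULL);

for (int i = 0; i < n; ++i) {
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                 &globalSize, NULL, 0, NULL, NULL);
}

clFinish(queue);  /* ensure all n launches have completed */

clEnqueueReadBuffer(queue, outBuf, CL_TRUE, 0, OUT_BYTES,
                    hostOut, 0, NULL, NULL);
/* accumulate results on the host here */
```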

As I had hoped, the correct number of results is obtained after processing, and then at "some point in time" afterwards the segmentation fault occurs (no code of mine executes after that point). Adding some cleanup at the end and running through ddd shows the seg-fault occurring at a different place each run — leading me to believe it's a delayed response to an earlier event, perhaps stack corruption, that's causing the problem.

As for vendor, it's an Nvidia GTX 285 with 190.29 drivers on Red Hat 5.2 64-bit (CUDA 2.3 also, not that it's of much relevance).