The Khronos Group - a non-profit industry consortium to develop, publish and promote open standard, royalty-free media authoring and acceleration standards for desktop and handheld devices, combined with conformance qualification programs for platform and device interoperability.


CL/GL Interop, OSX -- ever shared a Renderbuffer or Texture?

Problem: Attempting to share a Renderbuffer (or a Texture) fails when clSetKernelArg() is called.

I've spent days on this and am finally asking for help.

My program generates frames for a video projector that runs at 60fps (16.7ms frames).

My kernel runs in (typically) 24ms, but it's taking 50ms between each frame. I assume that some of the extra cost comes from using the GPU to calculate the pixels, then enqueuing a read-buffer to pull the data *off* the GPU, then using glDrawPixels to put it *back onto* the GPU for display. A perfect situation to try OpenGL/OpenCL interoperation, right? That would avoid the two extra copy operations.
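For reference, the copy-heavy path being described would look roughly like this. This is only a sketch: `queue`, `kernel`, `clbuf`, `pixels`, `W`, and `H` are hypothetical names, not the poster's actual code, and error checks are omitted.

```c
/* Sketch of the non-interop path: compute on the GPU, read the frame
 * back to host memory, then push it straight back to the GPU to show it. */
size_t gws[2] = { W, H };
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, NULL, 0, NULL, NULL);

/* Extra copy #1: GPU -> host (blocking read of an RGBA frame). */
clEnqueueReadBuffer(queue, clbuf, CL_TRUE, 0, W * H * 4, pixels,
                    0, NULL, NULL);

/* Extra copy #2: host -> GPU, just to display it. */
glDrawPixels(W, H, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
```

Interop would let the kernel write directly into a GL-visible object, eliminating both transfers.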

There are many examples, and I *have* succeeded in sharing a VBO with OpenCL, and can write to it, but that doesn't help me. I don't want to write vertex data, just the 2-D image that's been calculated.

There are examples of two different ways to do this, and they both involve Framebuffer objects.

You can attach a Renderbuffer to a Framebuffer, or you can attach a Texture to a Framebuffer.

Then you should be able to write to that buffer in OpenCL and display it with OpenGL, with no extra copies.
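The Renderbuffer route described above can be sketched like this (illustrative names, error checks omitted; it assumes the CL context was created against the GL sharegroup as discussed later in the thread):

```c
/* Create an FBO with a renderbuffer as its color attachment. */
GLuint fbo, rbo;
glGenFramebuffers(1, &fbo);
glGenRenderbuffers(1, &rbo);
glBindRenderbuffer(GL_RENDERBUFFER, rbo);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, W, H);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                          GL_RENDERBUFFER, rbo);
/* glCheckFramebufferStatus() should report GL_FRAMEBUFFER_COMPLETE here. */

/* Wrap the renderbuffer as a CL memory object for the kernel to write. */
cl_int err;
cl_mem clbo = clCreateFromGLRenderbuffer(context, CL_MEM_WRITE_ONLY,
                                         rbo, &err);
```

The Texture variant is the same shape, with glFramebufferTexture2D and clCreateFromGLTexture2D in place of the renderbuffer calls.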

I have found a few examples of this in code, and I think I'm doing everything exactly the way the examples say to do it, but maybe it is broken in OSX, because it doesn't work. The FBO is "complete", and there are no errors along the way until I try the clSetKernelArg. That call returns error -38, CL_INVALID_MEM_OBJECT.

*note: I would rather use a Renderbuffer than a Texture, since all I'm doing is making a 2-D RGB image that I want to display. But I tried a Texture out of desperation. Still no help.

I have tried so many variations ... and when I look at examples of code telling me "this is how you do this", I don't see anything wrong. I think that if there were an error, something along the way would be clobbered -- but all my gl and cl objects check out. It's only when clSetKernelArg is called that it finally gives an error. Pretty frustrating!

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

Nope; those are in there, but thanks for asking! I was just cherry-picking the lines of code that I thought might have an obvious bad/mismatched parameter. Those two are called, then a clEnqueueWriteBuffer for the kernel's input variables (delivered in a structure), then a clSetKernelArg for that input structure (which works), then the clSetKernelArg for the output buffer, which was created from the GL buffer object with no problems reported; and this clSetKernelArg is what fails. clbo is a static cl_mem, just like the buffer object that's used when interop is not on. The difference is that it was created using clCreateFromGLRenderbuffer instead of clCreateBuffer, and it lives in a context created in association with the GL sharegroup.
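The per-frame sequence just described might be sketched like this (hypothetical names and argument indices; note that a shared GL object also needs to be acquired and released around the kernel run):

```c
/* Upload the input structure and bind the kernel arguments. */
clEnqueueWriteBuffer(queue, inbuf, CL_TRUE, 0, sizeof(params), &params,
                     0, NULL, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inbuf);  /* works            */
clSetKernelArg(kernel, 1, sizeof(cl_mem), &clbo);   /* returns -38 here */

/* For a shared object, GL must be done with it before CL touches it. */
glFinish();
clEnqueueAcquireGLObjects(queue, 1, &clbo, 0, NULL, NULL);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, gws, NULL, 0, NULL, NULL);
clEnqueueReleaseGLObjects(queue, 1, &clbo, 0, NULL, NULL);
clFinish(queue);
```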

I'm a bit puzzled that it fails at that point; I didn't think anything was checked in the passed-in address of the cl_mem object. If anything were wrong with that I was expecting an error later, while the kernel was executing; not when I first hand the pointer up to it. I mean, the kernel hasn't looked at it yet; there can't be a size check because clSetKernelArg doesn't know anything about size; that's in the kernel, which hasn't had the opportunity to raise its head yet in this scenario. It should just be an empty output buffer as far as cl is concerned; all it is is an empty place to store a stream of bytes....

There was one more detail I was going to add but I don't remember it right now and it was only possibly marginally relevant. Ok, perhaps it was this: I've used the same mechanics to successfully attach a glBuffer object (vbo), and I write my image data into it (which is nonsensical to a vbo), and call display stuff, which is useless but it doesn't crash, and that runs continuously until I interrupt it. So, again, kinda puzzled as to why this fails, particularly at that point.

There are downloadable example projects for Xcode (e.g. grass / oceanwave), but those all use vertex buffers. All I wanna do is paint the plain old, vanilla pixels, that are already calculated, on the screen. Perhaps the Universe is playing with me?... O_O

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

clSetKernelArg lists a few possible error return codes, some of which require checking the arguments. It's just a pointer, but the runtime has the kernel's argument conventions available, so it could do plenty of checking if it wanted to.

Only thing I can think of is an incompatible image format/renderbuffer setup - but clCreateFrom ... should catch that.

I suppose try posting a complete example and see if anyone can help ...

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

Thanks for the pointers, folks.

I did manage to get the cl compiler stuff redirected to the system console, ajs2, but at first that just gave me the text equivalent of error -38, "CL_INVALID_MEM_OBJECT", so I didn't think I was much further than before.

But you were right, notzed, clSetKernelArg does do some checking of the pointer against the kernel arg type. I resisted rewriting the kernel to use an image2d_t for the output buffer arg for so long, partly because (1) I was so insistent on just writing bytes through a pointer without using the image-access functions, and (2) it meant having two .cl files, one for interop and one for non, and two cached binaries, doubling some variables, etc., and I wouldn't know if that work would even be worthwhile until it was done. But ultimately I did it.

And once that was done, there was more detailed information in the system log. Not just the invalid mem obj message, but a line before that like "Kernel argument 2 should not be write-only, but object &0xnnnnnnnn is write-only". This told me that it needed to be write_only in the kernel of course, which was easy enough, but more importantly that it *was* worth doing that work and that it did check the kernel args, but just didn't have a super useful error message to offer earlier.
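For anyone following along, the reworked kernel presumably looks something like this (a sketch in OpenCL C; `render` and `params_t` are made-up names, and the pixel computation is a placeholder):

```c
/* Stand-in for the real input structure delivered via clEnqueueWriteBuffer. */
typedef struct { float t; } params_t;

__kernel void render(__global const params_t *in,
                     write_only image2d_t out)   /* the shared renderbuffer */
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    /* Placeholder pixel; the real kernel computes the frame here. */
    write_imageui(out, pos, (uint4)(255u, 0u, 0u, 255u));
}
```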

Now there still isn't anything on the screen(!), which is frustrating, but the kernel now does a write_imageui() for each pixel, then there's a gl framebuffer blit that seems to execute, and the inter-frame interval is much better than non-interop with its two extra pci bus transfers. Next, just to get it to actually put the pixels where they can be seen ... maybe worth another post later, or add to this one.
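The blit step mentioned above might look like this (illustrative names; it assumes the renderbuffer is the FBO's color attachment and the window system framebuffer is the destination):

```c
/* Copy the FBO's color attachment to the window's default framebuffer,
 * scaling from the computed size (W x H) up to the window (winW x winH). */
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, W, H,          /* source rectangle      */
                  0, 0, winW, winH,    /* destination rectangle */
                  GL_COLOR_BUFFER_BIT, GL_LINEAR);
```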

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

Actually images are very good for image data - primarily as they provide automatic conversion on reads/writes and interpolation on reads, and they work well with a 2-D access pattern.

And unless you have a specific algorithmic requirement for integers, you'll probably find doing everything in floats will be easier, and run faster; that is what the hardware has been optimised for. The same code will then also work with different storage formats (normalised unsigned 8-bit, 16-bit, or float).

BTW if you're using UNORM_INT8 you need to use write_imagef, not write_imageui.
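Concretely, the write side for a normalised 8-bit image would look like this (a minimal sketch; `fill` is a made-up kernel name, and the image is assumed to be CL_RGBA / CL_UNORM_INT8):

```c
__kernel void fill(write_only image2d_t out)  /* CL_RGBA / CL_UNORM_INT8 */
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    /* UNORM channels take normalised floats in [0.0, 1.0]; the hardware
     * packs them into bytes.  write_imageui is only for the unnormalised
     * CL_UNSIGNED_INT8/16/32 channel types. */
    write_imagef(out, pos, (float4)(1.0f, 0.5f, 0.0f, 1.0f));
}
```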

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

BTW if you're using UNORM_INT8 you need to use write_imagef, not write_imageui.

Ding! Yes, I did stumble across that fact via extensive searching just a few hours before you sent your message, and then at last there was "something on the screen"! From that point on it was much less frustrating, as the effects of any changes were visible, and now it's working beautifully. That was the opposite of my first reading of the functions, however; I'd thought you'd use "ui" to write unsigned integers, "f" to write floats, etc. SO, a critically useful answer that I just happened to stumble upon earlier, and I'm as grateful as if I'd heard it from you first.

On the rest, yes it's float4 all the way up until the very end. I really like to put every bit just where I want it, and am not super comfortable not being able to explicitly build each byte for display, but it works! Also, I had no plans to do any scaling, conversion etc. -- but, a blit from a smaller size to full screen happens to be good for demos on the laptop (with some GL_LINEAR interp), though not for showtime on the big machine, where every pixel shall be explicitly calculated. Very cool! Works with either a Texture or a Renderbuffer.

Now on to the next problem, which I'll probably post on the AMD board. What's just been described works on nVidia on the MBP as well as the (weak) nVidia on the Mac Pro. However, on the big-dog five-hundred-dollar AMD 5870, it fails on create context, waaay before any of this fiddly stuff -- "cannot find device 0xnnnn in context 0xnnnn". If there's no obvious answer to that, I'll just shell out for an nVidia 570 or 680; they supposedly have better OpenCL throughput anyway.

Re: CL/GL Interop, OSX -- ever shared a Renderbuffer or Text

Originally Posted by Photovore

BTW if you're using UNORM_INT8 you need to use write_imagef, not write_imageui.

Ding! Yes, I did stumble across that fact via extensive searching just a few hours before you sent your message, and then at last there was "something on the screen"! From that point on it was much less frustrating, as the effects of any changes were visible, and now it's working beautifully. That was the opposite of my first reading of the functions, however; I'd thought you'd use "ui" to write unsigned integers, "f" to write floats, etc. SO, a critically useful answer that I just happened to stumble upon earlier, and I'm as grateful as if I'd heard it from you first.

On my first attempt at images - back when the drivers barely supported them - i did the same thing and wondered what was going on; it left such a bad taste i didn't touch images again for months.

And that whole 'nothing @##$@ works' thing is a massive barrier to getting started - very frustrating.

Now on to the next problem, which I'll probably post on the AMD board. What's just been described works on nVidia on the MBP as well as the (weak) nVidia on the Mac Pro. However, on the big-dog five-hundred-dollar AMD 5870, it fails on create context, waaay before any of this fiddly stuff -- "cannot find device 0xnnnn in context 0xnnnn". If there's no obvious answer to that, I'll just shell out for an nVidia 570 or 680; they supposedly have better OpenCL throughput anyway.

Well hopefully the amd forum can help - it's something that does work so it might be a install or driver issue, or bug.

I'm not sure where you heard such a thing, but from every benchmark i've seen the 680 is pretty poor for OpenCL (sometimes very poor). It looks like NV are now targeting a different market - games, with better power efficiency - as 'gpgpu' hasn't really taken off as a selling point for mass-market cards. And with those architectural changes the CUDA/OpenCL performance dropped off significantly; it's less than some of their older cards and miles behind the GCN stuff except on very specific workloads. That's not even counting the fact that OpenCL always seemed to be a dirty word around nvidia.