The Khronos Group - a non-profit industry consortium to develop, publish and promote open standard, royalty-free media authoring and acceleration standards for desktop and handheld devices, combined with conformance qualification programs for platform and device interoperability.


clEnqueueReadBuffer blocking always

I can't get this call to be non-blocking...
In theory, since blocking is CL_FALSE, it should return immediately; but I can see that if I increase the value of size, the measured time of clEnqueueReadBuffer increases too. It depends on the size.

I have done a lot of tests to check this: with timers, varying the size, and passing a non-null cl_event as the last parameter. The contents of hostBuffer are always correct, even if I immediately start copying hostBuffer to another buffer (in theory asynchronously). I get the correct result because clEnqueueReadBuffer does not seem to return until execution has completely finished, which is the behaviour I would expect if I passed CL_TRUE for blocking, but that is not the case.

I even reinstalled a couple of different NVIDIA drivers, with the same luck...
Is there any OpenCL initialization or configuration I must set to allow it to work in a non-blocking way?
What am I missing?

Re: clEnqueueReadBuffer blocking always

I am having the same issue with clEnqueueWriteBuffer: even for non-blocking calls, the execution time increases with the block size. clEnqueueReadBuffer, however, seems to be working fine in my case.

Re: clEnqueueReadBuffer blocking always

In your code you only enqueue commands; at which point do you start issuing them?

Why don't you need clFlush(), clFinish() or clWaitForEvents() here? I am so confused, because the code gives the correct answer...

Hi linyufly.

Ok, maybe I have not explained the problem correctly. My complaint is not about the correctness of the result, but about concurrency: the call interrupts my CPU thread.
Let me explain in detail...

clEnqueueNDRangeKernel() <- enqueues a kernel to be executed
clEnqueueReadBuffer() <- enqueues a copy from GPU to CPU (device to host)
clFlush() <- issues all commands in the queue
clFinish() <- issues all commands in the queue AND blocks the CPU until they have completed (synchronization point)

It's important to mention that execution on the GPU of the enqueued operations will begin when:

a ) clFlush() is called... you force OpenCL to start executing the previously enqueued operations.
b ) Some operations have a <blocking> parameter, like clEnqueueReadBuffer(); if CL_TRUE is passed for that parameter, it is like calling clFlush() too.
c ) Other operations flush implicitly, like clWaitForEvents().
d ) clFinish() is called... you force OpenCL to start executing the previously enqueued operations AND it blocks your CPU thread until those executions are complete.
e ) The driver decides to flush/finish automatically when certain circumstances are met (enough kernels or operations enqueued, or other internal reasons, probably depending on your GPU specifications, number of cores, memory, etc.).

It's important to mention the following method too:

clGetEventInfo() returns one of the following execution statuses:

CL_QUEUED (command has been enqueued in the command-queue),
CL_SUBMITTED (enqueued command has been submitted by the host to the device associated with the command-queue),
CL_RUNNING (device is currently executing this command),
CL_COMPLETE (the command has completed)

( NOTE: printEventStatus() is a method I wrote myself that queries clGetEventInfo() and prints the status, for debugging purposes. )

================================================================

Explanation of the code:

I call clEnqueueNDRangeKernel(), which enqueues a kernel.
After that I query the generated event with clGetEventInfo() and it says CL_QUEUED, as expected.
It means the kernel has been queued, not yet issued nor executed.

Now I call clEnqueueReadBuffer() with the blocking parameter set to CL_FALSE.

According to http://www.khronos.org/registry/cl/sdk/ ... uffer.html
"If blocking_read is CL_TRUE i.e. the read command is blocking, clEnqueueReadBuffer does not return until the buffer data has been read and copied into memory pointed to by ptr."
"If blocking_read is CL_FALSE i.e. the read command is non-blocking, clEnqueueReadBuffer queues a non-blocking read command and returns."

What do I expect when I query the event generated by this call? I expect: CL_QUEUED

But what I find is CL_COMPLETE!!
It means clEnqueueReadBuffer() is ignoring my CL_FALSE.

So briefly, clEnqueueReadBuffer:

blocks my CPU thread,
flushes all previously queued operations,
starts issuing my kernel, in this case, and executes it on the GPU,
executes the read, transferring the data from GPU to CPU,
sets the events to CL_COMPLETE status,

and finally:

returns control to my CPU thread, which has been stopped all this time, breaking the concurrency.

That's my complaint. As you can see, I don't even need to call clFlush() to issue the commands, because clEnqueueReadBuffer does it for me. And according to the Khronos documentation it must not happen this way.
