Multiple Kernels

This guide was created for versions: v0.3.1 - Latest

As we've seen before, SYCL kernels are launched asynchronously. To retrieve
the results of a computation, we must either run the destructor of the buffer
that manages the data or create a host accessor. This raises a question: what
if we want to execute multiple kernels over the same data, one after another?
Surely we must then synchronise the accesses manually? Luckily, we barely have
to do anything. The SYCL runtime guarantees that dependencies are met: a
kernel that depends on the results of other kernels will not launch until
those kernels have finished.

All of this is managed under the hood and controlled through buffers and
accessors. It is deterministic enough for us to know exactly what will
happen. Consider an example with four kernels that operate on four buffers:
A, B, C and D.

Some buffers are reused between the kernels with different access modes,
while others are used independently. The order in which the SYCL runtime
schedules the kernels mirrors this usage.

The first two kernels can be scheduled concurrently, because they do not
depend on each other. Both of them read from the same buffer (A), but neither
writes to it. Since concurrent reading is not a data race, the reads are
independent. The two kernels also write to different buffers (B and C), so
their writes do not conflict. The runtime is aware of all this and can
exploit it for maximum parallelism.

The third kernel is not independent: it reads from the buffers B and C,
into which the first two kernels write. Hence, it waits for both of them to
finish and is scheduled immediately after that.

Finally, the fourth kernel does not read anything that a previous kernel
wrote, but it does write to the D buffer, which the third kernel also writes.
Since mutating shared state in parallel is a data race, the fourth kernel has
to wait for the third one to finish and executes only then.
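
The scheduling rules described above can be modelled in a few lines of
standard C++. This is only a sketch of the reasoning, not how any SYCL runtime
is implemented: `Access`, `Kernel`, `dependencies` and `example_kernels` are
names made up for illustration, and the model assumes that two accesses
conflict exactly when they target the same buffer and at least one of them is
a write.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Mode { read, write };

// One accessor request: which buffer a kernel touches and how.
struct Access {
  std::string buffer;
  Mode mode;
};

// In this model a kernel is described purely by the accesses it requests.
using Kernel = std::vector<Access>;

// Two accesses conflict when they target the same buffer and at least one
// of them writes. Two reads never conflict.
bool conflicts(const Access& a, const Access& b) {
  return a.buffer == b.buffer &&
         (a.mode == Mode::write || b.mode == Mode::write);
}

// For each kernel, in submission order, collect the indices of the earlier
// kernels it has to wait for.
std::vector<std::set<std::size_t>> dependencies(
    const std::vector<Kernel>& kernels) {
  std::vector<std::set<std::size_t>> deps(kernels.size());
  for (std::size_t later = 0; later < kernels.size(); ++later)
    for (std::size_t earlier = 0; earlier < later; ++earlier)
      for (const Access& a : kernels[later])
        for (const Access& b : kernels[earlier])
          if (conflicts(a, b)) deps[later].insert(earlier);
  return deps;
}

// The four kernels from the text, numbered 0 to 3.
std::vector<Kernel> example_kernels() {
  return {
      Kernel{Access{"A", Mode::read}, Access{"B", Mode::write}},
      Kernel{Access{"A", Mode::read}, Access{"C", Mode::write}},
      Kernel{Access{"B", Mode::read}, Access{"C", Mode::read},
             Access{"D", Mode::write}},
      Kernel{Access{"D", Mode::write}},
  };
}

// Usage: checks that the model reproduces the schedule described above.
void check_example() {
  const auto deps = dependencies(example_kernels());
  assert(deps[0].empty() && deps[1].empty());        // free to run together
  assert((deps[2] == std::set<std::size_t>{0, 1}));  // reads B and C
  assert((deps[3] == std::set<std::size_t>{2}));     // both write D
}
```

Kernels with an empty dependency set may run concurrently; every other kernel
has to wait for the kernels listed in its set.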

Our program outputs the correct results:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

In this case we have a well-defined execution order, since all kernels are
submitted from the same thread. What if we have a multithreaded application,
with submit calls being made on several threads? The queue is thread-safe,
and the order in which kernels are executed is decided by the order of
submission. If you want to guarantee a specific order between kernels
submitted from different threads, you have to synchronise the threads
manually and make the submit calls in the right order. Otherwise the order is
unpredictable: it depends on which thread happens to execute its operation
on the queue first.
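
One way to force a submission order across threads is to hand out turns with
a mutex and a condition variable. The sketch below uses only the C++ standard
library. `SubmissionLog` is a hypothetical stand-in that merely records the
order of calls (a real application would call `submit` on its SYCL queue
there), and `TurnGate` and `submit_in_fixed_order` are likewise names invented
for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread-safe queue: it only records the order
// in which "kernels" were submitted.
class SubmissionLog {
 public:
  void submit(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    order_.push_back(name);
  }
  std::vector<std::string> order() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return order_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> order_;
};

// Hands out turns so that actions on different threads run in a fixed order.
class TurnGate {
 public:
  // Blocks until it is `turn`'s go, runs `action`, then passes the turn on.
  template <typename Action>
  void run_in_turn(int turn, Action action) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return next_ == turn; });
    action();
    ++next_;
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int next_ = 0;
};

// Submits "producer" before "consumer", whichever thread starts first.
std::vector<std::string> submit_in_fixed_order() {
  SubmissionLog log;
  TurnGate gate;
  // The thread that must submit second is deliberately started first.
  std::thread second(
      [&] { gate.run_in_turn(1, [&] { log.submit("consumer"); }); });
  std::thread first(
      [&] { gate.run_in_turn(0, [&] { log.submit("producer"); }); });
  first.join();
  second.join();
  return log.order();
}

// Usage: the recorded order is the same on every run.
void check_fixed_order() {
  const std::vector<std::string> expected{"producer", "consumer"};
  assert(submit_in_fixed_order() == expected);
}
```

Without the gate, either thread could reach `submit` first, so the recorded
order could differ from run to run.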