8.3. Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined. For some shader types (vertex, tessellation evaluation,
and in some cases, fragment), even the number of shader invocations that
may perform loads and stores is undefined.

Fragment shaders will be invoked zero or more times, as defined in the
section on fragment shaders.

The relative order of invocations of the same shader type is undefined.
A store issued by a shader when working on primitive B might complete
prior to a store for primitive A, even if primitive A is specified prior
to primitive B. This applies even to fragment shaders; while fragment
shader outputs are always written to the framebuffer in primitive order,
stores executed by fragment shader invocations are not.
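
The lack of ordering between stores from different invocations can be
illustrated with a CPU analogy. The following is a minimal sketch, not shader
code: two C++ threads stand in for the invocations working on primitives A and
B, and the array log stands in for buffer memory. The function name
run_unordered_stores and every identifier in it are hypothetical, introduced
only for this illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical CPU analogy: each thread stands in for a shader invocation
// working on one primitive, and `log` stands in for buffer memory.
std::string run_unordered_stores() {
    std::array<char, 2> log{};   // slots are filled in completion order
    std::atomic<int> next{0};    // index of the next free slot

    auto invocation = [&](char primitive) {
        // fetch_add hands each thread a distinct slot, so the two stores
        // never target the same element.
        int slot = next.fetch_add(1, std::memory_order_relaxed);
        assert(slot >= 0 && slot < 2);
        log[slot] = primitive;
    };

    std::thread a(invocation, 'A');  // "primitive A" is started first...
    std::thread b(invocation, 'B');  // ...but nothing orders its store first
    a.join();
    b.join();
    return std::string(log.begin(), log.end());
}
```

Either "AB" or "BA" is a valid result of this sketch, so code consuming the
log cannot rely on submission order.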

The relative order of invocations of different shader types is largely
undefined.

Note

The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable. For example, having one invocation poll memory written by
another invocation assumes that the other invocation has been launched and
will complete its writes in finite time, neither of which is guaranteed.

Stores issued to different memory locations within a single shader
invocation may not be visible to other invocations in the order they were
performed. The OpMemoryBarrier instruction can be used to provide
stronger ordering of reads and writes performed by a single invocation.
OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction. Memory barriers are needed for
algorithms that require multiple invocations to access the same memory and
require the operations to be performed in a partially-defined relative
order. For example, if one shader invocation does a series of writes,
followed by an OpMemoryBarrier instruction, followed by another write,
then the results of the series of writes before the barrier become visible to
other shader invocations no later than the results of the final write. In
practice, this means that another invocation that sees the result of the
final write also sees the previous writes. Without the memory barrier, the
final write may become visible before the previous writes.
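
The write, barrier, write pattern described above can be sketched with a CPU
analogy. This is not SPIR-V: in this sketch std::atomic_thread_fence is assumed
to play the role of OpMemoryBarrier, and the names Shared, writer, reader and
run_barrier_demo are hypothetical. The reader spins on the flag, which is safe
here only because CPU threads are guaranteed to make progress; shader
invocations carry no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical CPU analogy; std::atomic_thread_fence stands in for
// OpMemoryBarrier.
struct Shared {
    int payload[3] = {0, 0, 0};   // the "series of writes"
    std::atomic<int> ready{0};    // the "final write"
};

void writer(Shared& s) {
    s.payload[0] = 1;
    s.payload[1] = 2;
    s.payload[2] = 3;
    // Barrier: the payload writes become visible no later than the flag.
    std::atomic_thread_fence(std::memory_order_release);
    s.ready.store(1, std::memory_order_relaxed);
}

int reader(Shared& s) {
    // Wait until the final write is observed (CPU threads only).
    while (s.ready.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    // Matching fence on the observing side.
    std::atomic_thread_fence(std::memory_order_acquire);
    // Having seen the final write, the earlier writes must be visible too.
    assert(s.payload[0] == 1);
    return s.payload[0] + s.payload[1] + s.payload[2];
}

int run_barrier_demo() {
    Shared s;
    int sum = 0;
    std::thread r([&] { sum = reader(s); });
    std::thread w([&] { writer(s); });
    w.join();
    r.join();
    return sum;
}
```

If both fences were removed, the C++ memory model would no longer guarantee
that the reader observes the payload, which mirrors the final sentence of the
paragraph above.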

The built-in atomic memory transaction instructions can be used to read and
write a given memory address atomically. While built-in atomic instructions
issued by multiple shader invocations are executed in undefined order
relative to each other, each instruction performs both a read and a write of
a memory address and guarantees that no other memory transaction will write
to the underlying memory between the read and the write.
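
The read-modify-write guarantee can be sketched with a CPU analogy. The
following is an illustration only, assuming C++ threads in place of shader
invocations and std::atomic fetch_add in place of an atomic add instruction;
the function name run_atomic_counter is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU analogy: threads stand in for shader invocations and
// fetch_add stands in for an atomic add on buffer memory.
int run_atomic_counter(int invocations, int adds_per_invocation) {
    assert(invocations >= 0 && adds_per_invocation >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < invocations; ++i) {
        pool.emplace_back([&counter, adds_per_invocation] {
            for (int j = 0; j < adds_per_invocation; ++j) {
                // Read and write happen as one indivisible step, so no
                // other write can land between them.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return counter.load();
}
```

The order in which the increments execute is unspecified, but none is lost:
the final value always equals invocations times adds_per_invocation.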

Note

Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
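
Mutual exclusion built on an atomic can be sketched as a single non-blocking
attempt to claim a lock word. This is a CPU analogy under stated assumptions:
C++ threads stand in for shader invocations, compare_exchange_strong stands in
for an atomic compare-and-exchange instruction, and the name
count_lock_winners is hypothetical. Each thread tries once and never waits, so
the sketch does not rely on another invocation making progress.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU analogy: returns how many threads managed to claim the
// lock word when each one makes a single attempt.
int count_lock_winners(int invocations) {
    assert(invocations >= 0);
    std::atomic<int> lock{0};      // 0 = free, 1 = held
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < invocations; ++i) {
        pool.emplace_back([&lock, &winners] {
            int expected = 0;
            // Atomically: if lock == 0, set it to 1. The read and the write
            // cannot be separated, so only one thread can succeed.
            if (lock.compare_exchange_strong(expected, 1)) {
                winners.fetch_add(1);
                // The lock is deliberately never released in this sketch.
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return winners.load();
}
```

With any positive number of threads, exactly one claims the lock; which one
is unspecified.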