The answer is covered under the async-compute sections.
It's hyperthreading for GPUs: when you've got a thousand cores, you want them to always be able to find something useful to do.
For example, some draw calls are rasterization-bound, leaving compute cores idle, and some are texture-fetch-bound, also leaving compute cores idle. More queued-up work gives those cores a better chance of finding something to keep themselves busy with.
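
A toy model of that effect, in Python. It is a sketch under simplifying assumptions, not how any real GPU schedules work: each task occupies one hypothetical functional unit ("raster" or "alu") for a fixed number of cycles, tasks within one queue run strictly in order, and each unit serves one task at a time. The function `simulate` and the unit names are made up for illustration.

```python
from collections import deque

def simulate(queues):
    """Return the number of cycles needed to drain every queue.

    Each task is (unit, cycles) with cycles >= 1. Tasks within a queue run in
    order, one at a time; each functional unit serves one task at a time, so two
    queues can only overlap when their head tasks need different units.
    """
    queues = [deque(q) for q in queues]
    busy = {}                         # unit -> [queue index, cycles remaining]
    running = [False] * len(queues)   # does this queue have a task in flight?
    cycles = 0
    while any(queues) or busy:
        # Each idle queue tries to issue its head task onto a free unit.
        for i, q in enumerate(queues):
            if not running[i] and q and q[0][0] not in busy:
                unit, length = q.popleft()
                busy[unit] = [i, length]
                running[i] = True
        # Advance time by one cycle and retire finished tasks.
        cycles += 1
        for unit in list(busy):
            busy[unit][1] -= 1
            if busy[unit][1] == 0:
                running[busy[unit][0]] = False
                del busy[unit]
    return cycles

work = [("raster", 4), ("alu", 4), ("raster", 4), ("alu", 4)]
one_queue = simulate([work])
two_queues = simulate([[t for t in work if t[0] == "raster"],
                       [t for t in work if t[0] == "alu"]])
print(one_queue, two_queues)  # 16 8
```

In this model the second queue only helps because the two streams want different units; if both queues were full of "raster" tasks, the total would stay at 16 cycles.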

In D3D11 and GL there is only one queue visible through the API, so the driver must take care of spreading work across the hardware queues. It does a poor job of it, because it lacks much of the information needed to deduce dependencies accurately while ensuring the output always looks correct.

In D3D12/Vulkan the developer must take care of it by inserting fences and barriers, explicitly managing the multiple queues, and making queues wait on one another.
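
A rough Python model of a cross-queue fence, using two threads to stand in for a graphics queue and a compute queue. It only mimics the semantics (a monotonically increasing value that one queue signals and another waits on, roughly what `ID3D12CommandQueue::Signal` and `ID3D12CommandQueue::Wait` express in D3D12); the `Fence` class and the workload names are hypothetical, and no real graphics API is called.

```python
import threading

class Fence:
    """Toy fence: a counter that one queue signals and another waits on."""
    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._value = max(self._value, value)   # values only move forward
            self._cond.notify_all()

    def wait(self, value):
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)

log = []
log_lock = threading.Lock()

def record(event):
    with log_lock:
        log.append(event)

fence = Fence()

def graphics_queue():
    record("graphics: render shadow map")
    fence.signal(1)                  # shadow map is now safe to read
    record("graphics: render UI")    # unrelated work continues in parallel

def compute_queue():
    fence.wait(1)                    # must not read the shadow map before it exists
    record("compute: filter shadow map")

# Start the compute "queue" first to show that the wait, not launch order, orders the work.
threads = [threading.Thread(target=compute_queue), threading.Thread(target=graphics_queue)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)
```

Without the `wait`, nothing would stop the compute thread from filtering a shadow map that has not been rendered yet, which is exactly the dependency the D3D11/GL driver has to guess at on its own.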

Edited September 21, 2015 by Matias Goldberg

Does XB1 really support multiple default/graphics command queues, or just one graphics queue plus compute queues running concurrently, like all GCN GPUs on PC? I was aware that XB1 comes with a GCN 1.1/Gen 2 Bonaire (aka HD 7790/R7 260 series).

I guess this is the kind of info we can find in the XDK (which I do not have .-. )

I don't believe it has two render queues. What it has are two asynchronous compute engines (ACEs), each of which manages several GPU tasks, switching amongst them as the current one stalls (e.g. awaiting memory).

In a CPU, hyperthreading adds a "logical thread" (a full copy of the CPU state, such as registers) that uses the functional units left unused by its companion thread(s) to make forward progress (e.g. when instruction sequences or data dependencies prevent a single thread from saturating the core's capacity to issue instructions each cycle). An ACE is like that, except that it uses idle compute units (shader lanes).
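
A toy Python sketch of why switching amongst tasks on a stall keeps the hardware busy. The assumptions are all hypothetical: one instruction can issue per cycle, an "alu" op never stalls, a "mem" op stalls its task for `mem_latency` cycles, and the scheduler sticks with the current task until it stalls, then moves to the next ready one. The function `run` is illustrative, not a model of real ACE or CU internals.

```python
def run(tasks, mem_latency):
    """Return (total_cycles, busy_cycles) for issuing every op in every task."""
    pcs = [0] * len(tasks)        # per-task "program counter"
    ready_at = [0] * len(tasks)   # first cycle at which each task may issue again
    cycle = busy = current = 0
    while any(pc < len(t) for pc, t in zip(pcs, tasks)):
        # Starting from the current task, issue from the first ready one.
        for k in range(len(tasks)):
            i = (current + k) % len(tasks)
            if pcs[i] < len(tasks[i]) and ready_at[i] <= cycle:
                if tasks[i][pcs[i]] == "mem":
                    ready_at[i] = cycle + mem_latency   # task stalls on memory
                pcs[i] += 1
                busy += 1
                current = i
                break
        cycle += 1                # if nothing was ready, this cycle was idle
    return cycle, busy

program = ["mem", "alu"] * 4
print(run([program], mem_latency=4))       # (20, 8): idle 60% of the time
print(run([program] * 4, mem_latency=4))   # (32, 32): never idle
```

With a single task, every memory stall leaves the issue slot empty; with four tasks to choose from, there is always another one ready to fill it.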

I could be wrong about Xbox One not having multiple render queues, but no PC hardware has them yet. I suspect, though, that we will see two true render queues soon, as they could help reduce latency in VR applications, which are very latency-sensitive.

An ACE is a higher-level manager that splits work across compute units (CUs). Internally, a CU can schedule and control up to 40 wavefronts of work (4 x 10 'program counters', if you will), dispatching instructions and switching between work as required. The details are covered in AMD presentations, but basically, from each group of 10 program counters it can dispatch up to 4 instructions to the SIMD, scalar, vector memory, scalar memory and program flow control units, which is the 'hyperthreading' part.

(Each CU can handle 40 programs of work, each consisting of 64 threads; multiply by the CU count and you get the amount of 'in flight' work the GPU can handle.)
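
The arithmetic, spelled out as a small Python sketch. The CU count of 12 is only an illustrative input, and the result is a theoretical ceiling: in practice a shader's register and local-memory usage can limit how many wavefronts actually fit on a CU.

```python
SIMDS_PER_CU = 4             # the "4 x 10 program counters" described above
WAVE_SLOTS_PER_SIMD = 10
THREADS_PER_WAVEFRONT = 64

def max_in_flight_threads(cu_count):
    """Theoretical upper bound on threads resident on the GPU at once."""
    wavefronts_per_cu = SIMDS_PER_CU * WAVE_SLOTS_PER_SIMD   # 40
    return cu_count * wavefronts_per_cu * THREADS_PER_WAVEFRONT

print(max_in_flight_threads(1))    # 2560
print(max_in_flight_threads(12))   # 30720
```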

The ACE, which feeds the CUs, handles work generation and dispatch, along with work dependency tracking. From a CPU point of view it is more like the kernel scheduler, working out what needs to be dispatched to each core. Instead of just assigning work, though, it's more a case of "I need these resources, can anyone handle it?" for each piece of work, with the ability to suspend work (and, iirc, pull the state back) when more important work needs to run on a CU.
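
A toy Python sketch of the "I need these resources, can anyone handle it?" idea, with suspension of less important work. Everything here is hypothetical (`ComputeUnit`, `dispatch`, slot counts, integer priorities); it ignores dependency tracking entirely and is not a description of how ACE hardware actually implements preemption.

```python
from dataclasses import dataclass, field

@dataclass
class ComputeUnit:
    free_slots: int                                 # free wavefront slots
    running: list = field(default_factory=list)     # (name, slots, priority)

def dispatch(cus, name, slots, priority, suspended):
    """Place a work item on some CU, suspending lower-priority work if needed."""
    # "I need these resources, can anyone handle it?"
    for cu in cus:
        if cu.free_slots >= slots:
            cu.free_slots -= slots
            cu.running.append((name, slots, priority))
            return True
    # Nobody has room: look for a CU where suspending lower-priority work helps.
    for cu in cus:
        victims = sorted((w for w in cu.running if w[2] < priority),
                         key=lambda w: w[2])
        if cu.free_slots + sum(w[1] for w in victims) < slots:
            continue
        for victim in victims:                      # lowest priority first
            if cu.free_slots >= slots:
                break
            cu.running.remove(victim)
            cu.free_slots += victim[1]
            suspended.append(victim)                # kept so it can resume later
        cu.free_slots -= slots
        cu.running.append((name, slots, priority))
        return True
    return False                                    # must wait for work to finish

cus = [ComputeUnit(free_slots=4), ComputeUnit(free_slots=4)]
suspended = []
dispatch(cus, "a", 4, 0, suspended)        # fills CU 0
dispatch(cus, "b", 4, 0, suspended)        # fills CU 1
dispatch(cus, "urgent", 2, 1, suspended)   # no room anywhere: suspends "a" on CU 0
print(suspended, cus[0].running)
```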

The number of ACEs varies across hardware: at least 2, currently a maximum of 8.

it can dispatch up to 4 instructions to the SIMD, scalar, vector memory and scalar memory and program flow control units, which is the 'hyper threading' part.

When getting into the nitty-gritty hardware details, that is indeed the closest analogue to actual hyperthreading.
But I also think of the whole multi-engine/multi-queue high-level system as a hyperthreading analogue. If you forget about intra-task parallelism (i.e. that Draw/Dispatch tasks are made up of thousands of pixels), each queue is just a linear sequence of Draw/Dispatch instructions. Multi-queue suddenly means that you've got 2+ sequences of instructions to pull work from. This is akin to going from having a single hardware thread to having 2+ hardware threads.