DX12 support has just been added to 3DMark, showing unbelievable results, with DX12 delivering up to 20 times the performance of DX11. AMD Radeon graphics cards are showing the most significant gains compared to their Nvidia GeForce GTX counterparts.

In fact, testing has shown that with AMD’s most recent driver update the R9 290X is not only on par with the $999 12GB GeForce GTX Titan X, but also maintains a minute edge over the Nvidia flagship. Compared against Nvidia’s $549 GTX 980, the R9 290X delivered a 33% performance lead.

PCWorld’s results show the difference in API draw call handling between the Nvidia GeForce GTX Titan X and the AMD Radeon R9 290X. With DX11 the Titan X manages a maximum of 740 thousand draw calls per second, rising to a whopping 13.419 million calls with DX12. In comparison, the 290X manages a maximum of 935 thousand draw calls with DX11, so more than the Titan X to begin with, and an equally unbelievable 13.474 million calls with DX12. So the 290X slightly edges out the Titan X in DX12 draw call handling. Interestingly enough, AMD’s latest drivers have improved DX12 performance to the point where it’s actually ahead of Mantle by about 8%.
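To put those draw-call figures in perspective, here is a quick sanity check of the DX11-to-DX12 uplift each card sees. The numbers are the ones quoted above from PCWorld’s testing; the calculation itself is just the ratio of the two figures.

```python
# Maximum draw calls per second as reported in PCWorld's 3DMark results.
results = {
    "Titan X DX11": 740_000,
    "Titan X DX12": 13_419_000,
    "R9 290X DX11": 935_000,
    "R9 290X DX12": 13_474_000,
}

# DX12 uplift factor over DX11 for each card.
titan_uplift = results["Titan X DX12"] / results["Titan X DX11"]
r290x_uplift = results["R9 290X DX12"] / results["R9 290X DX11"]

print(f"Titan X DX11->DX12 uplift: {titan_uplift:.1f}x")  # ~18.1x
print(f"R9 290X DX11->DX12 uplift: {r290x_uplift:.1f}x")  # ~14.4x
```

Note how the 290X needs a smaller multiplier to land slightly ahead, simply because its DX11 starting point was higher.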

PCPer’s testing is more traditional, reporting frames per second instead of draw calls per second. The GTX 980 delivers 15.67 FPS in DX12, a massive improvement over the 2.75 FPS it manages in DX11. AMD’s R9 290X, on the other hand, delivers 19.12 frames per second in DX12, a significant jump over the GTX 980. Perhaps more interesting still, the next iteration of 3DMark, which will debut with Windows 10, also supports Mantle, and the 290X delivers 20.88 FPS in the same benchmark running Mantle. That’s a 33% performance lead over the GTX 980 running DX12 and a 9% lead over the same 290X running DX12.
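Those percentage leads follow directly from the FPS figures PCPer reported; a minimal sketch reproducing the arithmetic:

```python
# FPS figures from PCPer's 3DMark API overhead testing, as quoted above.
gtx980_dx12 = 15.67
r290x_dx12 = 19.12
r290x_mantle = 20.88

def lead(a, b):
    """Percent lead of result a over result b."""
    return (a / b - 1) * 100

print(f"290X Mantle vs GTX 980 DX12: {lead(r290x_mantle, gtx980_dx12):.0f}%")  # ~33%
print(f"290X Mantle vs 290X DX12:    {lead(r290x_mantle, r290x_dx12):.0f}%")   # ~9%
```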

Last but not least, let’s take a quick look at performance scaling in relation to the number of CPU cores. DX12 shows remarkable scaling as more CPU cores are added, but hits a wall at six cores. Mantle, on the other hand, continues to scale beyond six cores, showing extended performance scaling over DX12 at eight cores. I should point out an interesting phenomenon that occurs when hyperthreading is enabled with eight cores, for a total of 16 threads: it seems that neither the GTX 980 from Nvidia nor the R9 290X from AMD will make use of Intel’s hyperthreading technology. In fact, the R9 290X shows a slight performance degradation with HT on versus off.

I should remind everyone that these are synthetic benchmarks and should not be taken as the be-all and end-all performance metric. Please also keep in mind that the race for faster, more efficient DX12 drivers is still in its infancy and very much follows a leapfrog pattern at this point. Last time we reported on the state of DX12 drivers and performance, Nvidia was ahead; today we see the roles reversed, with AMD taking the lead. However, I’m more interested in testing actual DX12 games when they come out, which is rumored to be around the end of the year. In the meantime, stay tuned, as we may have our own DX12 3DMark results to present soon!

UPDATE:
I thought I might as well delve into a peculiar phenomenon that PCPer.com ran into in their testing: once the DX12 benchmark was run on the lower-tier GPUs after they were overclocked, the results were actually far closer to those of their higher-end siblings than would’ve made sense. Futuremark did not provide an explanation, nor did PCPer attempt one.

What would explain this phenomenon, however, is that the number of draw calls a GPU can handle isn’t necessarily tied to the number of shaders available on the GPU. Remember, the DX12 3DMark test in question only involves polygons and textures, no shaders. What determines how many draw calls the chip can handle comes down to both software and hardware.

Note how the 960 and 980 show identical numbers in DX11, while the 290X and 285 also show identical numbers in DX11; this explains the software part. Once we remove the API software bottleneck, we expose another bottleneck, or in this case probably multiple other bottlenecks. It’s most likely the amount of bandwidth available to the GPU, in addition to the internal GPU hardware and the mechanics by which CPU draw calls are handled.

So what could explain the relatively small delta between the R9 285 and the 290X is that the 285 has eight asynchronous compute engines, exactly the same as the 290X. These engines, built directly into the GPU core itself, are responsible for handing out tasks to the various compute units / shaders inside the chip. However, the additional memory bandwidth the R9 290X has available compared to the 285 could be the reason it can still edge out the 285. In the case of the GTX 960, the result doesn’t make much sense either: the GPU was overclocked by approximately 30%, yet that resulted in it handling 57% more draw calls.
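The GTX 960 oddity can be made concrete with a back-of-the-envelope check. Assuming draw-call throughput scaled linearly with core clock (an assumption for illustration, not something the benchmark confirms), a 30% overclock should yield at most a 30% gain; the percentages below are the ones quoted above.

```python
# Gains reported in PCPer's overclocked GTX 960 run.
overclock_gain = 0.30  # ~30% higher GPU core clock
drawcall_gain = 0.57   # ~57% more draw calls observed

# If throughput scaled linearly with clock, the two gains would match.
# Any excess beyond linear scaling points at some other bottleneck lifting.
excess = (1 + drawcall_gain) / (1 + overclock_gain) - 1
print(f"Gain beyond linear clock scaling: {excess:.0%}")  # ~21%
```

That leftover ~21% is exactly the part of the result that, as noted above, doesn’t make sense if the core clock were the only limiter.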

All in all, as mentioned previously, synthetic benchmarks are very rarely the ideal metric for reflecting real-world performance. So the numbers you see above should only be taken as a rough estimate of how much superior lower-level APIs such as DX12 and Mantle are compared to the established traditional approach that DX11 represents today.