When will DirectX 9.1 arrive?

To be clear right from the start: DirectX 9.1 will most likely never be released. Yet there is a persistent rumour that DirectX 9.1 will boost the performance of Nvidia cards by up to 60%. We will now analyse this rumour.

First of all: would this be possible? The clear answer is no. Although the pixel shader of the FX series offers considerably more functionality than the hardware of the current Radeons, it can execute fewer operations per clock cycle. If a shader program is written in a Radeon-optimized form, the CineFX architecture is slowed down even further: the way instructions are ordered in favour of the Radeons causes execution stalls on Nvidia hardware. The Radeons are less vulnerable, admittedly because of their simpler shader implementation.

To make the CineFX feature set perform at all, Nvidia had to accept trade-offs in the design. If you comply with the complex optimization rules for FX shaders, you may in some cases get more performance than on Radeon hardware. Apart from such special cases, the Radeon stays on top in overall performance. Nvidia can only compensate with higher clock speeds.

A shader program handed to the graphics card is executed the way it is; DirectX 9.1 cannot do anything about that. Nowadays shader programs are usually written in a "high level shading language" (HLSL). A compiler from the DirectX SDK translates the program into the "pixel shader language", the low-level shader instruction code. In addition to the standard 2_0 shader profile, there is now a profile called 2_A, which is optimized for the 2_X shaders and thus for the GeForceFX.

This keeps the additional work for developers who support both architectures within limits. The same source code has to be compiled twice, and the game has to detect which graphics card is installed in order to select the best shader code. The current DirectX SDK provides functions to determine which profile suits the installed hardware best.
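The selection step described above can be sketched in a few lines. This is only an illustration in Python, not real Direct3D code: the `PixelShaderCaps` record, the `pick_profile` function and the placeholder bytecode strings are hypothetical. A real game would query the Direct3D device capabilities instead (the D3DX utility library offers `D3DXGetPixelShaderProfile` for this purpose).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelShaderCaps:
    """Hypothetical, simplified description of a card's pixel shader support."""
    major: int          # highest supported pixel shader major version
    minor: int          # highest supported pixel shader minor version
    extended_2x: bool   # True if the chip exposes the extended 2_X feature set


def pick_profile(caps: PixelShaderCaps) -> str:
    """Return the compile target whose precompiled shader should be loaded."""
    if (caps.major, caps.minor) < (2, 0):
        raise ValueError("no DirectX 9 class pixel shader support")
    # Cards with the extended 2_X features get the 2_A build,
    # everything else falls back to the plain 2_0 build.
    return "ps_2_a" if caps.extended_2x else "ps_2_0"


# Both variants are compiled offline from the same HLSL source and shipped
# as binaries; the byte strings here are placeholders, not real bytecode.
PRECOMPILED = {
    "ps_2_0": b"<2_0 bytecode placeholder>",
    "ps_2_a": b"<2_A bytecode placeholder>",
}


def shader_for(caps: PixelShaderCaps) -> bytes:
    """Select the shipped shader binary that matches the installed card."""
    return PRECOMPILED[pick_profile(caps)]
```

The point of the sketch is that the choice happens once at load time and only between binaries the developer has already shipped, which is why the extra effort stays small.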

Developers usually do not ship HLSL source code with their games. If they did, an updated DirectX could take new hardware into account when compiling and accelerate it. Unfortunately, that is not the case, because developers do not like to reveal their work to others. So DirectX has to deal with the finished, precompiled shader binaries.

Nvidia is aware of the problem that virtually all older shaders are optimized for Radeon hardware. This was not ill intent on the part of the developers. ATI was simply the first to deliver a DirectX 9 compliant hardware accelerator (Radeon 9700/Pro) and also published optimization recommendations for its chips, long before the GeForceFX 5800 Ultra.

Let's bear two things in mind. First, performance is better on Radeons. Second, the GeForce chips have less raw power and are also vulnerable to efficiency drops when the CineFX recommendations are violated. The point is that the Radeons are already nearly maxed out, because ATI's optimization recommendations are very general in nature (and because some things that were common practice with pixel shader 1.4 carry over to 2.0). If the GeForceFX shaders now became 60% faster, they would get ahead of the Radeons. But that cannot happen, because the base performance of the GeForce is lower.

To explain where the 60% came from: Nvidia has published a document on the "Unified Shader Compiler" (local copy):

"The NVIDIA unified compiler technology efficiently translates these operands into the order that maximizes execution on NVIDIA GPUs—texture, texture, math, math. This one compiler feature can deliver a 60 percent performance improvement for DirectX 9 applications, and points out how a minor programming difference can result in significant performance impact on programmable GPUs."

This refers to a new driver feature in Detonator 50 (now called ForceWare) and later versions. The Radeons' Catalyst driver also optimizes shader programs, but for GeForceFX cards this is considerably more important. And indeed, the new driver delivers about 10 to 20 percent more performance in DirectX 9 games without affecting image quality. In purely synthetic shader benchmarks you can achieve 100 percent more performance, depending on how well the "raw material" was optimized. That is 100% more performance compared to the old driver, not compared to the Radeons. Also keep in mind that this driver does not need DirectX 9.1. It is fully functional with DirectX 9.0.
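To illustrate what reordering into a "texture, texture, math, math" pattern can mean, here is a toy scheduler in Python. It is not Nvidia's algorithm, which is not public; it is a minimal sketch under stated assumptions: every register is written exactly once, only read-after-write dependencies matter, and the `Instr` type, the register names and the example program are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    kind: str     # "tex" (texture fetch) or "math" (arithmetic)
    dest: str     # register written by this instruction
    srcs: tuple   # registers read by this instruction


PATTERN = ("tex", "tex", "math", "math")


def reorder(program):
    """Greedily reorder instructions toward the tex, tex, math, math pattern.

    Assumes each register is written once and that the input order is valid,
    i.e. every register is written before it is read.
    """
    produced_here = {ins.dest for ins in program}
    written = set()
    remaining = list(program)
    out = []
    while remaining:
        wanted = PATTERN[len(out) % len(PATTERN)]
        # An instruction is ready when all sources are either shader inputs
        # (never written by the program) or have already been written.
        ready = [ins for ins in remaining
                 if all(s in written or s not in produced_here
                        for s in ins.srcs)]
        # Prefer the kind the pattern asks for; otherwise take the earliest
        # ready instruction so the schedule always makes progress.
        pick = next((ins for ins in ready if ins.kind == wanted), ready[0])
        remaining.remove(pick)
        written.add(pick.dest)
        out.append(pick)
    return out


# Hypothetical shader with interleaved fetches and arithmetic.
example = [
    Instr("tex", "r0", ("t0",)),
    Instr("math", "r1", ("r0", "c0")),
    Instr("tex", "r2", ("t1",)),
    Instr("math", "r3", ("r2", "c1")),
    Instr("math", "r4", ("r1", "r3")),
]

reordered = reorder(example)
```

For the example, the two texture fetches move to the front and the three arithmetic instructions follow. The result computes the same values as before; only the order changes, which is why such a driver feature needs no new DirectX version.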

Pixel shader 2.0 and 3.0 are both defined within DirectX 9.0. Shader 4.0 will be provided in "DirectX Next" (DirectX 10). There is simply no need for an in-between version. Extended features will be exposed through caps and not through new shader versions. Of course, the DirectX development tools are updated frequently, so developers may well see another build of DirectX. For the end consumer, however, nothing changes, even if there is a DirectX 9.1.

We fear that many manufacturers will come up with the idea of marketing shader 3.0 support as "DirectX 9.1 compliant", as has recently been seen on quite a few graphics chip roadmaps. Somebody associated DirectX 9.1 with a 60% performance gain and spread this confused message. Unfortunately, many online and print magazines copied this message without checking it, and still do. There is already some solid information about DirectX Next available, but Microsoft has never said a word about releasing a DirectX 9.1 runtime.

GeForceFX users have to bury their hopes for 60% more performance, because the performance has to be available in hardware first. There is a compiler profile for 2_X in the DirectX 9 development tools, and in addition the recompiler in the ForceWare 50 driver does whatever it can. In our opinion, both GeForce and Radeon can be regarded as virtually maxed out in terms of shader performance.

The only exception would be if Nvidia improved the recompiler considerably and dared to use the FX12 hardware for DirectX pixel shaders. FX12 computing power is available in abundance. Admittedly, it is not easy to use FX12 when FP24 precision is required, and DirectX does not permit it either.
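Why FX12 is hard to use where FP24 is expected can be shown with a small numerical sketch. It assumes the commonly described formats: FX12 as 12-bit signed fixed point with 10 fractional bits (range roughly -2 to +2), and FP24 as a floating-point format with 16 stored mantissa bits. The helper names are hypothetical, and the FP24 emulation ignores the exponent range and special values.

```python
import math

FX12_STEP = 1.0 / 1024.0       # assumed: 10 fractional bits
FX12_MIN = -2048 * FX12_STEP   # assumed lower bound, -2.0
FX12_MAX = 2047 * FX12_STEP    # assumed upper bound, just below +2.0


def to_fx12(x: float) -> float:
    """Clamp x to the assumed FX12 range and round to the nearest step."""
    clamped = min(max(x, FX12_MIN), FX12_MAX)
    return round(clamped / FX12_STEP) * FX12_STEP


def to_fp24(x: float) -> float:
    """Round x to 17 significant bits (16 stored plus the implicit one).

    Simplified emulation: exponent limits and denormals are ignored.
    """
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent
    return math.ldexp(round(mantissa * 2**17) / 2**17, exponent)


value = 0.3
fx12_error = abs(to_fx12(value) - value)
fp24_error = abs(to_fp24(value) - value)
```

In this sketch the rounding error of the fixed-point value is far larger than that of the FP24 value, and anything outside the narrow FX12 range is simply clamped. A recompiler could therefore only move an operation to FX12 if it could show that neither effect changes the visible result.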

All in all, you can assume that big performance gains for the current Radeon and GeForce architectures are possible neither through drivers nor through new DirectX releases. That is why both manufacturers support more and more individual game projects and game developers: more performance on the current hardware can only be achieved by optimizing individual games, which takes a lot of time and work.

Thanks to Demirug for providing us with background information about this subject.