BasicEffect optimizations in XNA Game Studio 4.0

The BasicEffect API and feature set did not change in Game Studio 4.0, but the implementation saw some aggressive optimizations.

In previous versions, BasicEffect was intended as a starting point for beginners. We expected expert programmers to soon move on to writing their own shaders, so as long as BasicEffect performed adequately on both Windows and Xbox, it wasn't worth spending time on further optimization.

Windows Phone changed this priority for two reasons:

Mobile GPUs are slower than those in Windows PCs or the Xbox, so every shader instruction makes a significant difference.

Because we do not support custom shaders on Windows Phone, BasicEffect needs to be as fast as humanly possible.

I thought it would be interesting to describe the details of how we sped things up.

Shader permutations

In spite of its name, BasicEffect is actually quite complex! If you look at the HLSL source code, you will see that the previous versions included 12 different vertex shaders to support all permutations of these options:

Lighting: none, per vertex, per pixel

Vertex color: off, on

Texture: off, on

Plus 4 different pixel shaders:

Per pixel lighting: off, on

Texture: off, on

BasicEffect has many other adjustable knobs, but we did not include specialized shaders for every possible combination. For instance, we always evaluated the fog equation, and implemented the FogEnabled property by setting parameter values that made the result come out zero when fog was not wanted.
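As a rough illustration of this parameter-only approach, here is a hypothetical C# sketch. It is not the Game Studio 3.1 shader (its exact fog formula is not shown here); it assumes linear fog written as saturate(depth * scale + offset) that always runs, and the names AlwaysOnFog, FogFactor, and FogParameters are made up for the example. Disabling fog just uploads zeros, so the shader does the same amount of work either way.

```csharp
using System;

// Hypothetical sketch, not the Game Studio 3.1 shader: the fog formula below
// always runs, and FogEnabled is implemented purely through its parameters.
public static class AlwaysOnFog
{
    // CPU stand-in for an HLSL-style saturate(depth * scale + offset).
    public static float FogFactor(float depth, float fogScale, float fogOffset)
        => Math.Clamp(depth * fogScale + fogOffset, 0f, 1f);

    // What a property setter might upload. Assumes fogEnd > fogStart when enabled.
    public static (float Scale, float Offset) FogParameters(bool fogEnabled, float fogStart, float fogEnd)
    {
        if (!fogEnabled)
            return (0f, 0f); // every vertex evaluates to exactly zero fog

        // Linear fog: 0 at fogStart, rising to 1 at fogEnd.
        float scale = 1f / (fogEnd - fogStart);
        return (scale, -fogStart * scale);
    }
}
```

The cost of this design is that unfogged geometry still pays for the fog instructions, which is exactly what the specialized fog=off shaders in Game Studio 4.0 avoid.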

There is a balance between providing many shader permutations (which minimizes GPU instruction counts), versus fewer shaders (which minimizes memory overhead and development/test cost). When we reevaluated this balance in the light of Windows Phone, we decided to add more specialized shaders. As of Game Studio 4.0, BasicEffect now has a total of 32 permutations. There are 20 vertex shaders:

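Lighting: none, one vertex light, three vertex lights, per pixel

Vertex color: off, on

Texture: off, on

Fog: off, on (a fog=off variant exists only for the shaders that do not include lighting)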

Plus 10 pixel shaders:

Lighting: none, per vertex, per pixel

Texture: off, on

Fog: off, on (a fog=off variant exists only for the shaders that do not include per-pixel lighting)
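One way to sanity-check these counts is to enumerate the options. The C# sketch below is hypothetical: it assumes the vertex shaders vary by lighting (none, one vertex light, three vertex lights, per pixel), vertex color, texture, and fog, and the pixel shaders by lighting, texture, and fog, with the fog=off restrictions stated above. The generated names are illustrative, not the real shader names.

```csharp
using System.Collections.Generic;

// Hypothetical enumeration of BasicEffect shader variants; names are illustrative only.
public static class ShaderPermutations
{
    static readonly string[] VertexLighting = { "NoLighting", "OneVertexLight", "ThreeVertexLights", "PixelLighting" };
    static readonly string[] PixelLighting = { "NoLighting", "VertexLighting", "PixelLighting" };
    static readonly bool[] OffOn = { false, true };

    public static List<string> VertexShaders()
    {
        var names = new List<string>();
        foreach (string lighting in VertexLighting)
            foreach (bool vertexColor in OffOn)
                foreach (bool texture in OffOn)
                    foreach (bool fog in OffOn)
                    {
                        // A fog=off variant exists only for unlit vertex shaders.
                        if (!fog && lighting != "NoLighting") continue;
                        names.Add($"VS_{lighting}{(vertexColor ? "_VertexColor" : "")}{(texture ? "_Texture" : "")}{(fog ? "" : "_NoFog")}");
                    }
        return names; // 8 unlit + 3 lit modes x 4 = 20
    }

    public static List<string> PixelShaders()
    {
        var names = new List<string>();
        foreach (string lighting in PixelLighting)
            foreach (bool texture in OffOn)
                foreach (bool fog in OffOn)
                {
                    // Per-pixel lighting shaders always include fog.
                    if (!fog && lighting == "PixelLighting") continue;
                    names.Add($"PS_{lighting}{(texture ? "_Texture" : "")}{(fog ? "" : "_NoFog")}");
                }
        return names; // 4 + 4 + 2 = 10
    }
}
```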

Implications:

Changing FogEnabled used to have no performance impact, but turning off fog is now a little faster.

Using just one vertex light (PreferPerPixelLighting=false, DirectionalLight0.Enabled=true, DirectionalLight1.Enabled=false, DirectionalLight2.Enabled=false) is faster than if you use all three lights.
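To make these implications concrete, here is a hypothetical sketch of how the effect properties could map onto 32 technique slots (2 fog × 2 vertex color × 2 texture × 4 lighting modes). The enum, the method names, and the bit layout are assumptions for illustration. Since only some shaders have a fog=off variant, a lit slot with fog disabled would presumably reuse its fog-on shaders with the fog parameters zeroed; that detail is also an assumption.

```csharp
// Hypothetical lighting modes; the numeric values are only used to build an index.
public enum LightingMode { None = 0, OneVertexLight = 1, ThreeVertexLights = 2, PerPixel = 3 }

// Hypothetical mapping from BasicEffect-style properties to one of 32 technique slots.
public static class TechniqueSelector
{
    public static LightingMode ChooseLighting(bool lightingEnabled, bool preferPerPixelLighting,
                                              bool light0Enabled, bool light1Enabled, bool light2Enabled)
    {
        if (!lightingEnabled) return LightingMode.None;
        if (preferPerPixelLighting) return LightingMode.PerPixel;
        // The cheap path from the text: only DirectionalLight0 enabled, lit per vertex.
        return (light0Enabled && !light1Enabled && !light2Enabled)
            ? LightingMode.OneVertexLight
            : LightingMode.ThreeVertexLights;
    }

    public static int TechniqueIndex(bool fogEnabled, bool vertexColorEnabled, bool textureEnabled, LightingMode lighting)
    {
        int index = 0;
        if (!fogEnabled) index += 1;
        if (vertexColorEnabled) index += 2;
        if (textureEnabled) index += 4;
        return index + 8 * (int)lighting; // 0..31
    }
}
```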

Preshaders

A common tension in shader programming is that when you design effect parameters to provide a nice clean API, the resulting parameter formats are not always the most efficient for HLSL optimization.

D3D tries to correct any such mismatches through a feature called "preshaders". The HLSL compiler looks for computations that are the same for all vertices or all pixels, and moves these out of the main shader into a special setup pass which runs on the CPU before drawing begins. This is a great feature, but it has a few fatal flaws:

The HLSL compiler does not always spot every optimization possibility

The virtual machine that evaluates preshaders is not especially efficient

Preshaders are not supported on Xbox or Windows Phone

Game Studio 4.0 adds the ability to implement preshader computations in C#, by overriding this new method, which is called immediately before EffectPass.Apply sets parameter values onto the graphics device:

protected virtual void Effect.OnApply();

This allows BasicEffect to expose whatever properties the API requires, without needing these to match the underlying HLSL shader parameters. When the programmer changes a managed property, we just set a dirty flag, then recompute derived HLSL parameter values during OnApply. We used this new ability to precompute many things:

Collapse the World, View, and Projection matrices into a single WorldViewProj matrix.

When lighting is enabled, compute the WorldInverseTranspose matrix. This is necessary for correct normal transforms when using non-uniform scales, but it is something we never bothered to do right in previous versions.

Extract the EyePosition vector from the View matrix.

Combine FogStart, FogEnd, World, and View, generating a vector that can compute fog amount with a single dot product.

Merge the DiffuseColor, EmissiveColor, AmbientLightColor, and Alpha properties into a more efficient set of combined parameters.
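The OnApply hook and the precomputations listed above can be sketched together in plain C#. This is a hypothetical illustration, not the Game Studio source: EffectSketch stands in for XNA's Effect (its Apply method plays the role of EffectPass.Apply), PreshaderEffectSketch and MaterialSketch are made-up names, System.Numerics replaces the XNA math types (it uses the same row-vector, right-handed conventions), and the lighting model in MaterialSketch, including premultiplied alpha, is an assumption.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for XNA's Effect: Apply() plays the role of
// EffectPass.Apply and runs the overridable OnApply hook first.
public class EffectSketch
{
    public void Apply() => OnApply(); // a real effect would then set device parameters
    protected virtual void OnApply() { }
}

// Hypothetical effect whose public properties differ from its shader parameters.
// Setters only mark derived values dirty; OnApply recomputes them when needed.
public class PreshaderEffectSketch : EffectSketch
{
    Matrix4x4 world = Matrix4x4.Identity, view = Matrix4x4.Identity, projection = Matrix4x4.Identity;
    float fogStart = 0f, fogEnd = 1f;
    bool dirty = true;

    public Matrix4x4 World { get => world; set { world = value; dirty = true; } }
    public Matrix4x4 View { get => view; set { view = value; dirty = true; } }
    public Matrix4x4 Projection { get => projection; set { projection = value; dirty = true; } }
    public float FogStart { get => fogStart; set { fogStart = value; dirty = true; } }
    public float FogEnd { get => fogEnd; set { fogEnd = value; dirty = true; } }

    // Stand-ins for the parameters the HLSL shader would actually read.
    public Matrix4x4 WorldViewProj { get; private set; }
    public Matrix4x4 WorldInverseTranspose { get; private set; }
    public Vector3 EyePosition { get; private set; }
    public Vector4 FogVector { get; private set; }
    public int Recomputations { get; private set; }

    protected override void OnApply()
    {
        if (!dirty) return;

        // One matrix per draw instead of three transforms per vertex.
        WorldViewProj = world * view * projection;

        // The inverse transpose keeps normals perpendicular under non-uniform scale.
        if (!Matrix4x4.Invert(world, out Matrix4x4 worldInverse))
            throw new InvalidOperationException("World matrix is not invertible.");
        WorldInverseTranspose = Matrix4x4.Transpose(worldInverse);

        // Inverting View gives the camera's world transform; its translation is the eye.
        if (!Matrix4x4.Invert(view, out Matrix4x4 cameraWorld))
            throw new InvalidOperationException("View matrix is not invertible.");
        EyePosition = cameraWorld.Translation;

        // Depth fog as one dot product: fog = saturate(dot(float4(pos, 1), FogVector)).
        // View-space z is x*M13 + y*M23 + z*M33 + M43, and depth in front of the
        // camera is -z, so fog = (-z - FogStart) / (FogEnd - FogStart).
        // Assumes FogStart != FogEnd.
        Matrix4x4 worldView = world * view;
        float scale = 1f / (fogStart - fogEnd);
        FogVector = new Vector4(worldView.M13, worldView.M23, worldView.M33, worldView.M43 + fogStart) * scale;

        dirty = false;
        Recomputations++;
    }
}

// Assumed lighting model with premultiplied alpha:
//   rgb = ((ambient + lightSum) * diffuse + emissive) * alpha
// Folding ambient into emissive, and alpha into both, leaves one multiply-add:
//   rgb = lightSum * mergedDiffuse.rgb + mergedEmissive
public static class MaterialSketch
{
    public static (Vector4 Diffuse, Vector3 Emissive) Merge(
        Vector3 diffuse, Vector3 emissive, Vector3 ambient, float alpha)
        => (new Vector4(diffuse * alpha, alpha), (emissive + ambient * diffuse) * alpha);
}
```

Because the setters only flip a flag, changing World several times before a draw costs a single recomputation, which happens inside Apply.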

Do less work

We also applied some good ol' algebraic optimizations, using math to find cheaper ways of getting the results we wanted.

We got a nice win from vectorizing the lighting computations, using matrix operations to evaluate all three lights at the same time. The new code is harder to read, but a couple of instructions shorter.
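The idea behind the vectorized version can be sketched on the CPU. This is a hypothetical C# illustration, not the Game Studio HLSL: VectorizedLighting and its methods are made-up names, light directions are assumed to point from the light toward the scene (hence the negation), and in a shader the two Vector4.Transform calls would correspond to mul() operations. Packing the three negated directions as matrix columns yields all three N·L terms from one vector-matrix multiply, a single max() clamps them together, and packing the colors as rows forms the weighted sum in a second multiply.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch comparing per-light evaluation with a matrix-based version.
public static class VectorizedLighting
{
    // Reference: one light at a time (three dot products, three clamps, three multiply-adds).
    public static Vector3 OneAtATime(Vector3 normal, Vector3[] directions, Vector3[] colors)
    {
        Vector3 sum = Vector3.Zero;
        for (int i = 0; i < 3; i++)
            sum += MathF.Max(0f, Vector3.Dot(normal, -directions[i])) * colors[i];
        return sum;
    }

    // Negated light directions as columns: one multiply yields all three N.L terms.
    public static Matrix4x4 PackDirections(Vector3[] d) => new Matrix4x4(
        -d[0].X, -d[1].X, -d[2].X, 0f,
        -d[0].Y, -d[1].Y, -d[2].Y, 0f,
        -d[0].Z, -d[1].Z, -d[2].Z, 0f,
        0f, 0f, 0f, 0f);

    // Light colors as rows: a second multiply forms the weighted color sum.
    public static Matrix4x4 PackColors(Vector3[] c) => new Matrix4x4(
        c[0].X, c[0].Y, c[0].Z, 0f,
        c[1].X, c[1].Y, c[1].Z, 0f,
        c[2].X, c[2].Y, c[2].Z, 0f,
        0f, 0f, 0f, 0f);

    public static Vector3 AllAtOnce(Vector3 normal, Matrix4x4 packedDirections, Matrix4x4 packedColors)
    {
        Vector4 dots = Vector4.Transform(new Vector4(normal, 0f), packedDirections); // (N.L0, N.L1, N.L2, 0)
        dots = Vector4.Max(dots, Vector4.Zero);                                       // one clamp for all lights
        Vector4 sum = Vector4.Transform(dots, packedColors);                          // sum of dots[i] * color[i]
        return new Vector3(sum.X, sum.Y, sum.Z);
    }
}
```

The packed matrices only change when the lights change, so they are natural candidates for the same OnApply-style precomputation.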

One place we slightly changed the final output is the fog equation. Previous versions used distance fog, which is computed from the distance between camera and vertex. We now use depth fog, which only considers how far in front of the camera each vertex is, ignoring any sideways offset. The visual difference is subtle, but depth fog is much cheaper to evaluate.
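The difference between the two fog modes shows up for points off to the side. The hypothetical C# sketch below assumes linear fog between fogStart and fogEnd and a normalized view direction; FogModes and its methods are illustrative names. For a point 30 units ahead of the camera and 40 units to the side, distance fog measures 50 units while depth fog measures only 30.

```csharp
using System;
using System.Numerics;

// Hypothetical comparison of distance fog and depth fog, assuming linear fog.
public static class FogModes
{
    // Distance fog: uses the full camera-to-point distance (a length, so a square root).
    public static float DistanceFog(Vector3 worldPos, Vector3 eye, float fogStart, float fogEnd)
        => Math.Clamp((Vector3.Distance(worldPos, eye) - fogStart) / (fogEnd - fogStart), 0f, 1f);

    // Depth fog: only the component along the (normalized) view direction counts.
    public static float DepthFog(Vector3 worldPos, Vector3 eye, Vector3 viewDirection, float fogStart, float fogEnd)
        => Math.Clamp((Vector3.Dot(worldPos - eye, viewDirection) - fogStart) / (fogEnd - fogStart), 0f, 1f);
}
```

With fogStart = 10 and fogEnd = 50, the point straight ahead at depth 30 is half fogged in both modes, but the sideways point is fully fogged by distance fog and still only half fogged by depth fog.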

Results

To take one example, here are the instruction counts for BasicEffect using vertex color and texture, but no lighting or fog:

Game Studio 3.1: 30 vertex shader instructions, 6 pixel shader instructions

Game Studio 4.0: 6 vertex shader instructions, 3 pixel shader instructions

Preemptive question: "can we get the source for these optimized shaders?"

I would certainly love to release this, but we haven't worked out the details yet. Stay tuned!

>>>Damn, 30 to 6. You guys are raising the bar to production level.<<<

Just how slow is the phone GPU compared to the Xbox GPU (a good benchmark for performance in general)? E.g. 5x, 10x, 100x?

On Windows/360, a shader of mostly ALU instructions doesn't even register for a GPU... In a realistic situation (i.e. not 1 million cat particles :-), the only things I have found that even tax the GPU shader-wise are post-processing effects.

However, the effect changes look very interesting: most of my draw loop is spent calling into the effect object, beginning the effect, committing changes, setting the technique, etc.

How does the change in effect code format improve things (I noticed this mentioned before)? Faster validation in those calls, perhaps?

Clayman

25 Apr 2010 2:32 PM

>Preemptive question: "can we get the source for these optimized shaders?"

As usual, thanks for the update. It sounds like the optimizations were well thought out, and should make for a nice improvement in performance.

The one area I am concerned about is the switch from distance fog to depth-based fog. It seems to me that if fog is now in its own specialized set of shaders anyways, you might as well leave it as distance based. Otherwise, in exchange for a bit of performance, people will experience objects popping in and out near the FogStart depth as they rotate their cameras.

>>We expected expert programmers to soon move on to writing their own shaders

>Out of curiosity, how close was this expectation to reality? Do you know?

I don't have the numbers, but I'd guess that this expectation didn't really hold. I'm pretty comfortable with writing shaders, but when BasicEffect just fits the bill why bother?

Michael Wilson

26 Apr 2010 4:24 AM

What prevents the inclusion of custom shaders on Windows phone? Are you using mobile GPU compilers that aren't authorised for public release?

Unrelated question: do the new XNA4 render states mean that setting render state from C# is effectively as fast as doing it in the HLSL technique definition? Or is the latter still faster due to fewer driver calls?

Christian Schinkoethe

26 Apr 2010 10:00 AM

Very nice, but it would still be nice to have custom effects or something similar for the phone to do post processing.

Artur Pais

26 Apr 2010 10:24 AM

Thank you for this information. Very nice work, I'm having a great time working with the Windows Phone 7 developer tools.

Since I'm still new to XNA, I have a question: is it better to have a single BasicEffect instance and change its properties dynamically, or would we get better performance with multiple instances, each with a unique set of properties?

Just as an example, I could have one BasicEffect with lighting on, texture off and fog off, and another with lighting off, texture on and fog on. Before rendering an object, I would only need to set the World, View and Projection matrices on the appropriate effect. Is this a better approach?

Mobile GPUs are an order of magnitude slower than game consoles. I'm hesitant to quote exact numbers because performance is too multi-dimensional a problem to be meaningfully boiled down to just a single ratio (the only way to get really accurate data is to measure your specific code once hardware is available) but you definitely shouldn't be expecting anywhere near Xbox console level perf. Also, the CPU/GPU balance is significantly different from Xbox (the ARM CPU in Windows Phone is really rather fast).

> > We expected expert programmers to soon move on to writing their own shaders

> Out of curiosity, how close was this expectation to reality? Do you know?

It depends on which specific expert programmer you look at: many write all their own shaders, while others get great mileage and never see the need to move beyond the built-in stuff. I've been surprised how many awesome games made by awesomely skilled teams (especially on the XBLA side) achieve amazing graphical quality just from having great game design and great artists, and do all their drawing with SpriteBatch!

However, even when we saw people making polished games with BasicEffect, I never saw a game that was bottlenecked by BasicEffect GPU performance on Windows or Xbox, so this was never a sensible place to spend optimization effort (until Windows Phone came along, anyway).

> The one area I am concerned about is the switch from distance fog to depth-based fog. It seems to me that if fog is now in its own specialized set of shaders anyways, you might as well leave it as distance based. Otherwise, in exchange for a bit of performance, people will experience objects popping in and out near the FogStart depth as they rotate their cameras.

There isn't a really clear right answer here. From our tests, the two fog modes are visually indistinguishable for the vast majority of games, although of course there are a handful where this can make a difference. The performance difference is pretty significant, though, which matters a lot for Windows Phone. In the end we came down on the side of the more efficient version, reasoning that Windows or Xbox developers who needed different visual results can always write their own shader to implement that, while Windows Phone developers who needed more perf would have no other options, so the Windows Phone requirement should take priority here (it didn't seem like an important enough issue to warrant supporting both options).

> do the new XNA4 render states mean that setting render state from C# is effectively as fast as doing it in the HLSL technique definition?

C# vs. FX state setting performs roughly the same in all Game Studio versions. Even prior to state objects, the cost of changing states is almost entirely in the driver, so it makes little difference whether you trigger this work from jitted C# code vs. interpreted FX data tables.