This demo shows a way to achieve higher-quality texture compression than DXT1 at a slightly higher bitrate by using 3Dc+. Don't complain about the lack of artwork or eye candy in this demo (it's just a single textured quad); that's not the point. The point is to illustrate the quality difference between this method and DXT1.

In JPEG compression the first step is to convert RGB to YCbCr, a color space based on luminance (Y) and chrominance (Cb and Cr). The rationale for this is that the eye is much more sensitive to luminance information than chrominance, and by converting it to this color space we can sample luminance at full rate and chrominance at a lower rate. Typically JPEG files sample luminance at 1x1 and chrominance at 2x2, which already cuts down the data to half the size at nearly no visible quality degradation.

The method I use here is similar to this first compression step in JPEG, but I take it one step further by storing the Y channel in an ATI1N texture and CbCr in a lower resolution ATI2N texture. This essentially gives you a 6bpp texture format with much better quality than DXT1. Now it's true that DXT1, which is 4bpp, looks good with most textures, but there are exceptions. DXT1 doesn't perform very well with photographic images, some textures with smooth gradients, some very detailed textures, non-uniformly colored textures, and textures with diagonal features or features that otherwise line up badly with the 4x4 block pattern. In these cases this method looks much better. Additionally, in many cases it's possible to sample CbCr at 4x4 without significantly reducing quality, resulting in 4.5bpp. This will almost always look better than DXT1, but may show more color bleeding than 6bpp.
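
The 6bpp and 4.5bpp figures follow directly from the block sizes of the formats involved. As a quick check, here is a small sketch of the arithmetic (the function name is made up for illustration, and mipmaps are ignored):

```python
# Block-compressed formats store each 4x4 texel block in a fixed number of bits.
ATI1N_BITS_PER_TEXEL = 64 / 16    # one channel, 8 bytes per block  -> 4 bpp
ATI2N_BITS_PER_TEXEL = 128 / 16   # two channels, 16 bytes per block -> 8 bpp
DXT1_BITS_PER_TEXEL = 64 / 16     # 8 bytes per block -> 4 bpp


def ycbcr_bits_per_pixel(chroma_downsample):
    """Bits per full-resolution pixel with Y at full rate in ATI1N and CbCr
    in ATI2N, downsampled by `chroma_downsample` along each axis."""
    chroma_texels_per_pixel = 1.0 / (chroma_downsample * chroma_downsample)
    return ATI1N_BITS_PER_TEXEL + ATI2N_BITS_PER_TEXEL * chroma_texels_per_pixel


print(ycbcr_bits_per_pixel(2))  # CbCr at 2x2
print(ycbcr_bits_per_pixel(4))  # CbCr at 4x4
```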

Decoding YCbCr into RGB in the shader is very cheap and takes only three instructions. Generally speaking, this method has quality close to RGB8 but performance close to DXT1. The default view is a bit zoomed in so you can judge quality. In that view the performance difference is small since the texture is magnified, but if you zoom out so that most of the texture is visible on screen you'll see a bigger performance difference.
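
To make the decode step concrete, here is a CPU-side sketch in Python. It assumes the full-range JPEG (JFIF) coefficients with all channels in [0, 1]; the demo's shader may use different constants. One plausible reading of "three instructions" is one four-component dot product per output channel, which is how the decode is written here.

```python
def rgb_to_ycbcr(r, g, b):
    """Encode with the JFIF coefficients (an assumption, not taken from the demo)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) / 1.772
    cr = 0.5 + (r - y) / 1.402
    return y, cb, cr


# One constant row per output channel, applied to (Y, Cb, Cr, 1).
DECODE_ROWS = (
    (1.0, 0.0, 1.402, -0.701),              # R
    (1.0, -0.344136, -0.714136, 0.529136),  # G
    (1.0, 1.772, 0.0, -0.886),              # B
)


def ycbcr_to_rgb(y, cb, cr):
    """Decode with three dot products, one per channel."""
    v = (y, cb, cr, 1.0)
    return tuple(sum(c * x for c, x in zip(row, v)) for row in DECODE_ROWS)
```

In a real shader the Y value would come from the ATI1N texture and CbCr from the lower resolution ATI2N texture, with the hardware filter doing the chroma upsampling.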

In addition to using ATI1N and ATI2N I've also added similar compression using DXT1 and DXT5. This gives the same performance as the 3Dc modes, but visibly worse quality. It's still better than DXT1 though.

Use the keys 1-6 to toggle between DXT1, YCbCr DXT/3Dc & 4.5bpp/6bpp and RGB.

This demo should run on Radeon 9500 and up and GeForce FX and up.

Dynamic branching 3
Tuesday, April 25, 2006

This demo illustrates the benefit of dynamic branching in a per pixel lighting scenario. On the F1 dialog you can toggle dynamic branching, as well as shadows and single/multi pass. Branching is used at several places in the shader. It skips past instructions if the pixel is outside the light range, is backfacing the light, or is in shadow.
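
The three early-outs can be sketched on the CPU like this. This is only an illustration of the control flow, not the demo's shader: the attenuation formula is an assumption, and `in_shadow` is a hypothetical callable standing in for a shadow map lookup.

```python
import math


def shade_pixel(pos, normal, light_pos, light_radius, in_shadow):
    """Diffuse light contribution for one pixel, bailing out as early as possible."""
    to_light = tuple(l - p for l, p in zip(light_pos, pos))
    dist_sq = sum(c * c for c in to_light)

    # Branch 1: outside the light range, nothing to compute.
    if dist_sq >= light_radius * light_radius:
        return 0.0

    # Branch 2: the surface faces away from the light.
    n_dot_l = sum(n * c for n, c in zip(normal, to_light))
    if n_dot_l <= 0.0:
        return 0.0

    # Branch 3: only pay for the shadow lookup when the pixel could be lit.
    if in_shadow(pos):
        return 0.0

    # The expensive part (here just attenuated diffuse) runs for the rest.
    attenuation = 1.0 - dist_sq / (light_radius * light_radius)
    return attenuation * n_dot_l / math.sqrt(dist_sq)
```

On hardware the gain depends on neighboring pixels taking the same branch, which is why large unlit or shadowed regions benefit the most.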

This demo should run on Radeon 9500 and up and GeForce FX and up. Only cards that support dynamic branching will see a performance benefit from enabling this path, that is, X1300 and up and possibly also the GeForce 6 series.

Hair
Monday, April 3, 2006

This demo illustrates a hair simulation using R2VB. The hair consists of a large number of strands. Each strand is a line in a render target, and each node in the strand is represented by a pixel in the render target. The simulation is similar to a typical cloth simulation, except that springs only connect in one direction, along the strands. The first node of every strand (the leftmost column of pixels) is locked to the moving balloon head. The rest of the hair follows the head wherever it bounces. Collision is computed against the head and the floor.
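
For readers who want to see the idea in code, here is a CPU-side sketch of one strand. It assumes Verlet integration and treats the springs as rigid length constraints, which is a common way to write such simulations but not necessarily what the demo's pixel shader does. All names and parameters are made up for illustration.

```python
import math

GRAVITY = (0.0, -9.8, 0.0)


def step_strand(pos, prev, root, head_center, head_radius, rest_len,
                dt, floor_y=0.0, iterations=4):
    """Advance one strand by one time step; returns (new_pos, new_prev)."""
    # Verlet integration: the velocity is implied by (pos - prev).
    new = [tuple(2.0 * p - q + g * dt * dt for p, q, g in zip(a, b, GRAVITY))
           for a, b in zip(pos, prev)]

    for _ in range(iterations):
        new[0] = root  # the first node is locked to the head
        for i in range(1, len(new)):
            # Collision with the head sphere: push the node out to the surface.
            d = tuple(a - c for a, c in zip(new[i], head_center))
            dist = math.sqrt(sum(c * c for c in d))
            if 0.0 < dist < head_radius:
                new[i] = tuple(c + x * head_radius / dist
                               for c, x in zip(head_center, d))
            # Collision with the floor.
            if new[i][1] < floor_y:
                new[i] = (new[i][0], floor_y, new[i][2])
            # Connection along the strand only, treated as a rigid length
            # constraint: keep node i at rest_len from node i - 1.
            d = tuple(a - b for a, b in zip(new[i], new[i - 1]))
            length = math.sqrt(sum(c * c for c in d))
            if length > 0.0:
                new[i] = tuple(b + c * rest_len / length
                               for b, c in zip(new[i - 1], d))
    return new, list(pos)
```

In the R2VB version each strand would be one row of the render target and each node one pixel, so the inner loop over nodes maps to pixels in that row.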

Currently the demo draws the hair as lines. As a result, the hair looks thinner at higher resolutions or when the head is close up. An easy fix would have been to scale the line width as appropriate. Unfortunately, D3D doesn't expose any way to draw wide lines. With additional work a shader could have expanded the lines into a triangle strip with adjustable width, but I was too lazy to do that.

This demo should run on Radeon 9500 and up.

Infinite Terrain II
Thursday, March 9, 2006

A bit over three years ago I made the Infinite Terrain demo, which generated new terrain on the CPU as it was needed and uploaded it to a vertex buffer. With R2VB it's now possible to generate terrain on the GPU, and that's what this demo does. First a pixel shader evaluates a noise function representing the height and writes it to a texture. This texture is then sampled in a second pass to generate the normals. The normal and height are written to an RGBA16F render target which is later used as a vertex buffer. If half floats are not supported in the vertex declaration RGBA32F is used instead, and if R2VB is not supported VTF is used instead.
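
The two passes can be sketched on the CPU as follows. The height function below is a simple stand-in (the demo uses a noise function), and deriving the normal with central differences is an assumption about how the second pass works. Each output tuple mirrors one RGBA texel: normal in the first three components and height in the fourth.

```python
import math


def stand_in_height(x, z):
    """Placeholder for the noise function; not the demo's noise."""
    return 0.5 * math.sin(0.3 * x) + 0.25 * math.sin(0.7 * z + 1.3)


def build_vertices(size, height_fn=stand_in_height, spacing=1.0):
    """Returns a size x size grid of (nx, ny, nz, height) tuples (size >= 2)."""
    # Pass 1: evaluate the height into a "texture".
    heights = [[height_fn(x * spacing, z * spacing) for x in range(size)]
               for z in range(size)]

    # Pass 2: sample neighboring heights to build the normal.
    grid = []
    for z in range(size):
        row = []
        for x in range(size):
            x0, x1 = max(x - 1, 0), min(x + 1, size - 1)
            z0, z1 = max(z - 1, 0), min(z + 1, size - 1)
            dhdx = (heights[z][x1] - heights[z][x0]) / ((x1 - x0) * spacing)
            dhdz = (heights[z1][x] - heights[z0][x]) / ((z1 - z0) * spacing)
            inv_len = 1.0 / math.sqrt(dhdx * dhdx + 1.0 + dhdz * dhdz)
            row.append((-dhdx * inv_len, inv_len, -dhdz * inv_len, heights[z][x]))
        grid.append(row)
    return grid
```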

This demo should run on Radeon 9500 and up and GeForce 6 series and up.

Selective supersampling
Monday, February 6, 2006

In the debate of MSAA vs SSAA most people have now accepted multisampling as the victor, at least in relation to the performance impact. Still, there are a few who love supersampling for its higher quality and its ability to reduce aliasing not just on the edges but also inside surfaces, whether that aliasing is caused by a badly applied texture filter or by high frequency components in shader output. Some have argued that in the era of shaders, and the aliasing that some of them bring, supersampling will make a comeback. I'll give them half a point for that; however, the future still belongs to multisampling when it comes to driver/hardware side antialiasing. Global supersampling is just too expensive to be an option in most cases. That doesn't mean supersampling will never be applied. In fact, I believe it will be used more frequently in the future, but it will be implemented on the application side, and probably directly in the shader. This is what this demo shows.

The advantage of supersampling in the shader is that it gives the developer fine-grained control over where to apply it, to what degree and what sample positions to use, rather than just providing a global switch. In this demo there's one somewhat aliasing-prone bumpmap on the floor. The aliasing shows up in the specular lighting, which is a fairly common scenario. So the app supersamples this particular material and nothing more. The walls are not supersampled, neither is the skybox, and certainly not the GUI. Furthermore, the shader only supersamples the specular lighting. The diffuse lighting does not have a problem with aliasing for this bumpmap, so it's not supersampled, nor is the lightmap, base material and so on. Additionally, dynamic branching can be used to shave off even more of the work.

This fine-grained selection of course means the performance loss is significantly smaller than with regular supersampling. A driver side 4 sample implementation would normally run at around 1/4 of the original speed or even less, and it also uses a good deal more memory for the framebuffer. This application side supersampling doesn't use any extra memory and is able to keep up with 77% of the original speed on my system.

How does it work? It's implemented using gradients. Remember that the gradient returns the difference between neighboring fragments in a fragment quad. The texture coordinate plus the gradient in x would land on the texture coordinate of the neighboring pixel to the right if the current pixel is in the left column of the quad. The math doesn't match perfectly in the other direction of course, but the discrepancy is normally not a concern. Multiplying the gradient with sample positions in the [-0.5, 0.5] range gives you texture coordinates for the samples, which you can then input to any function to compute whatever you want to supersample.
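
As a rough illustration of that last step, here is the sample coordinate math in Python. In a shader `duv_dx` and `duv_dy` would come from the gradient instructions; here they are plain parameters. The four sample positions are an assumed rotated-grid pattern, not necessarily the ones the demo uses.

```python
# Assumed 4 sample pattern, all offsets within [-0.5, 0.5].
OFFSETS_4 = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]


def supersample_coords(uv, duv_dx, duv_dy, offsets=OFFSETS_4):
    """Texture coordinates of the sub-pixel samples around `uv`."""
    return [(uv[0] + duv_dx[0] * sx + duv_dy[0] * sy,
             uv[1] + duv_dx[1] * sx + duv_dy[1] * sy)
            for sx, sy in offsets]


def supersample(func, uv, duv_dx, duv_dy, offsets=OFFSETS_4):
    """Average `func(u, v)` over the sub-pixel samples."""
    coords = supersample_coords(uv, duv_dx, duv_dy, offsets)
    return sum(func(u, v) for u, v in coords) / len(coords)
```

Because `func` can be anything, the same helper works for a specular term, a procedural pattern or any other quantity that aliases.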

For the dynamic branching I compute the gradient of the center sample's specular. If there's a big change in the specular value between neighboring pixels in the quad the supersampling will kick in, otherwise the shader makes do with just the center sample.
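
A CPU-side sketch of that decision could look like this. A shader gets the gradient of the specular value for free from its quad neighbors; the sketch emulates it by evaluating the specular function one pixel to the right and one pixel down. The threshold and the sample pattern are assumptions chosen for illustration.

```python
OFFSETS = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]


def adaptive_specular(specular, uv, duv_dx, duv_dy, threshold=0.05):
    """Returns (value, supersampled) for one pixel."""
    center = specular(uv[0], uv[1])

    # Emulated gradient of the center specular across neighboring pixels.
    change_x = abs(specular(uv[0] + duv_dx[0], uv[1] + duv_dx[1]) - center)
    change_y = abs(specular(uv[0] + duv_dy[0], uv[1] + duv_dy[1]) - center)

    # Small change: the center sample is good enough.
    if change_x + change_y <= threshold:
        return center, False

    # Large change: take the extra samples and average them.
    total = 0.0
    for sx, sy in OFFSETS:
        total += specular(uv[0] + duv_dx[0] * sx + duv_dy[0] * sy,
                          uv[1] + duv_dx[1] * sx + duv_dy[1] * sy)
    return total / len(OFFSETS), True
```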

This demo should run on Radeon X1300 and up and GeForce 6 series and up.

Use the 1-6 keys to select the amount of supersampling. The view wobbles a bit by default to show the aliasing a bit more clearly. To disable this, press 0. To toggle the use of dynamic branching, use the 9 key.

GameEngine2
Monday, December 12, 2005

This demo shows a fairly simple way to handle large game worlds. The main problem with large game worlds, as opposed to the regular tech demos I normally do, is that you have way too much data to throw it all on the GPU. Drawing everything is not an option.

There are loads of different techniques out there to handle this: portals, BSPs, PVS etc. In this demo I chose to go with an exceptionally simple technique, namely to split the scene into axis aligned cubes. Then in a preprocessing step I compute, for each cube, which other cubes it can see. Only points in open space are considered. The preprocessing step is quite expensive. It took about an hour and a half to preprocess the scene in this demo. On the other hand, it's extremely simple to evaluate at runtime and makes checking visibility for dynamic objects easy. Just check what cube(s) the object is in and check whether any of those cubes is visible from the cube of the camera. It's a conservative check which could include some hidden objects of course, but that's true for most techniques. I don't know if this technique generally performs better or worse than other techniques, but it works well enough, and it has the most important attribute: it's scalable. It's not the size of the scene that matters, but the amount of simultaneously visible lights and surfaces. The cubes also allow you to cut down on the amount of drawn surfaces for a certain light. Only the cubes that are within the light radius (or light frustum for spotlights) and are visible from both the light and the camera need to be drawn.
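
The runtime side of this is small enough to sketch. The class below assumes the preprocessing result is available as a dictionary from each cube to the set of cubes it can see (itself included). The class and method names are hypothetical, and the light test uses a bounding box around the light radius rather than an exact sphere or frustum test.

```python
import math
from itertools import product


class CubeGrid:
    def __init__(self, origin, cube_size, visible_from):
        self.origin = origin
        self.cube_size = cube_size
        self.visible_from = visible_from  # {cube: set of cubes it can see}

    def cube_of(self, point):
        """Integer (x, y, z) index of the cube containing `point`."""
        return tuple(math.floor((p - o) / self.cube_size)
                     for p, o in zip(point, self.origin))

    def cubes_of_box(self, box_min, box_max):
        """All cubes touched by an axis aligned bounding box."""
        lo, hi = self.cube_of(box_min), self.cube_of(box_max)
        return set(product(*(range(a, b + 1) for a, b in zip(lo, hi))))

    def object_visible(self, camera_pos, box_min, box_max):
        """Conservative check: is any cube of the object seen by the camera cube?"""
        seen = self.visible_from.get(self.cube_of(camera_pos), set())
        return not seen.isdisjoint(self.cubes_of_box(box_min, box_max))

    def cubes_to_draw_for_light(self, camera_pos, light_pos, light_radius):
        """Cubes in light range that both the camera and the light can see."""
        from_camera = self.visible_from.get(self.cube_of(camera_pos), set())
        from_light = self.visible_from.get(self.cube_of(light_pos), set())
        in_range = self.cubes_of_box(
            tuple(c - light_radius for c in light_pos),
            tuple(c + light_radius for c in light_pos))
        return from_camera & from_light & in_range
```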

Each cube also stores the maximum visible distance. This way the Z range can be packed as tightly as possible by using a far plane that is only as distant as needed. The advantage of this is that it improves Z precision, but perhaps more importantly, it improves the efficiency of HyperZ. Speaking of HyperZ, by drawing the cubes in order of increasing distance we get a very good front-to-back draw order in the initial Z-pass, which gives a healthy performance increase.
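
Both ideas are simple lookups and sorts. A minimal sketch, assuming cubes are identified by their center points for sorting and that the stored distances live in a dictionary (the function names and the fallback parameter are made up):

```python
import math


def front_to_back(camera_pos, cube_centers):
    """Visible cubes ordered from nearest to farthest from the camera."""
    return sorted(cube_centers, key=lambda c: math.dist(c, camera_pos))


def far_plane(camera_cube, max_visible_distance, fallback):
    """Far plane distance for the cube the camera is in."""
    return max_visible_distance.get(camera_cube, fallback)
```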

There are more optimizations possible using only these cube-to-cube visibility checks, and not everything is implemented. You may find a few TODOs in the source, which I didn't bother to finish for this release. Maybe I'll add them later on.

The disadvantage of this technique is that with conventional draw calls you either have to produce an index array dynamically or split things into a lot of draw calls, even for the same light/material combination. Fortunately, OpenGL has a glMultiDrawElementsEXT call that can render multiple subsets of the index buffer in a single call. This way the number of draw calls normally stays at 80-150.
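
A multi-draw call of this kind takes an array of index counts and an array of offsets into the index buffer. Here is a sketch of how those two arrays could be built from the visible cubes. It assumes each cube owns one contiguous index range for the given light/material combination. Merging adjacent ranges is an extra step added for illustration, not something the demo is known to do, and no actual GL call is made.

```python
def multi_draw_ranges(cube_ranges, visible_cubes, index_size=2):
    """Build (counts, byte_offsets) for a multi-draw call.

    cube_ranges maps cube -> (first_index, index_count) in the index buffer;
    index_size is the size of one index in bytes (2 for 16-bit indices).
    """
    ranges = sorted(cube_ranges[c] for c in visible_cubes if c in cube_ranges)
    counts, offsets = [], []
    for first, count in ranges:
        start = first * index_size
        if offsets and offsets[-1] + counts[-1] * index_size == start:
            counts[-1] += count  # contiguous with the previous range: merge
        else:
            counts.append(count)
            offsets.append(start)
    return counts, offsets
```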

This demo should run on Radeon 9500 and up, and GeForce FX 5200 and up. You need to have OpenAL installed to run this demo. You can get it here

Note: The first time you run this demo there will be a delay of about 10 seconds (depending on your CPU speed) as it precomputes some data (I didn't want to bloat the download). For all runs after that it will use cached results and start much quicker.

HDR
Monday, September 5, 2005

This is an HDR (high dynamic range) rendering demo, complete with the mandatory butter-on-my-glasses blur effect. The main scene is first rendered to an RGBA16F texture. For the blur effect it's downsampled and converted to a fixed point format. This texture is then blurred in several steps, with each step sampling at twice the size of the previous step. For the HDR assets, which are actually only the skybox, an RGBE format is used. The RGB components are stored as DXT1 and E as L16. This cuts down the storage space (and download size!) and memory bandwidth requirement to less than 1/3 of the cost of RGBA16F with no perceivable loss of quality. This also allows for filtering all the way back to R300.
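
The "less than 1/3" figure follows from the bit counts: DXT1 is 4 bits per texel and L16 is 16, against 64 for RGBA16F. The sketch below checks that and also shows one way a shared-exponent split could work. The exponent range and the mapping of E to [0, 1] are assumptions, the actual encoding in the demo may differ, and the DXT1 compression of the RGB part is not modelled.

```python
import math

RGBE_BITS = 4 + 16       # DXT1 for RGB plus L16 for E
RGBA16F_BITS = 4 * 16
STORAGE_RATIO = RGBE_BITS / RGBA16F_BITS

EXP_MIN, EXP_MAX = -16.0, 16.0  # hypothetical exponent range


def encode_rgbe(r, g, b):
    """Split an HDR color into RGB in [0, 1] and a 16-bit quantized exponent."""
    peak = max(r, g, b, 2.0 ** EXP_MIN)
    exponent = min(math.log2(peak), EXP_MAX)
    t = (exponent - EXP_MIN) / (EXP_MAX - EXP_MIN)
    e = math.ceil(t * 65535) / 65535  # round up so the RGB part stays <= 1
    scale = 2.0 ** (EXP_MIN + e * (EXP_MAX - EXP_MIN))
    return r / scale, g / scale, b / scale, e


def decode_rgbe(r, g, b, e):
    """What a shader would do after sampling the two textures."""
    scale = 2.0 ** (EXP_MIN + e * (EXP_MAX - EXP_MIN))
    return r * scale, g * scale, b * scale
```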

This demo should run on Radeon 9500 and up and GeForce 6800 and up.

Alpha to coverage
Thursday, June 23, 2005

One of the weaknesses of multisampling compared to supersampling is that it doesn't work too well with alpha testing, a technique that unfortunately many games still use as a replacement for real geometry. The effect is that the edges created by alpha testing aren't antialiased. The proper solution is of course to alpha blend, but that means the transparent or masked objects need to be sorted in back-to-front order, which can be costly and inconvenient.

But there's another solution that doesn't need depth sorting and properly antialiases alpha masked surfaces, namely alpha-to-coverage. This works by sampling the alpha and interpreting it as how much of the pixel it covers, and then the result is dithered and distributed to an appropriate number of multisample samples. So if you're using 6x multisampling and the incoming fragment's alpha is 0.5 it will be deemed to cover three samples, which will then receive the fragment data. When the multisample buffer is resolved, the fragment is therefore blended with the background, which occupies the remaining samples. It is a bit of a hack but actually works very well in practice. In fact, it often works better than supersampling, since it's using the alpha value directly rather than checking against a number of thresholded alpha values, and thus doesn't have the flicker and discontinuity problem that often occurs even with supersampling when the texture is minified a couple of mipmap levels.

When magnifying the texture it results in blurrier edges though, which is also the case with alpha blending. To solve that problem this demo also implements a technique that boosts the alpha contrast around 0.5 when the texture is magnified so that the [0, 1] range of alpha values spans the width of a pixel. To figure out how much the texture is magnified, another texture is looked up with a texture coordinate that's multiplied by the size of the base texture. Each mipmap level contains the size of that mipmap level. So if a texel of the base texture is 20 pixels wide in screen space, contrast is boosted 20x. This makes the edges as sharp as with alpha testing, but they look properly antialiased.
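
The numbers in this description can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the hardware's dithering is left out, the rounding rule is an assumption, and the magnification factor is passed in directly instead of being read from the mipmap-size lookup texture.

```python
def boost_alpha_contrast(alpha, magnification):
    """Scale alpha around 0.5 by the magnification factor and clamp to [0, 1]."""
    scale = max(magnification, 1.0)  # no boost when the texture is minified
    return min(max((alpha - 0.5) * scale + 0.5, 0.0), 1.0)


def covered_samples(alpha, num_samples):
    """Number of multisample samples the fragment is deemed to cover."""
    return int(alpha * num_samples + 0.5)


def resolve(fragment, background, covered, num_samples):
    """Average of the samples once the multisample buffer is resolved."""
    return (covered * fragment + (num_samples - covered) * background) / num_samples
```

With 6x multisampling and an alpha of 0.5, `covered_samples` gives three samples, and resolving those against the background gives an even mix of the two.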

To compare the results to alpha testing you can toggle between the two methods on the F1 menu.