The original technique is not well suited to GPUs with pixel shaders alone, so some adaptation was needed.
The reason is that the algorithm scans edges and patches pixels based on the edge length and the configuration at the edge extremities (to sum up).
Edge extremities can be far from the current pixel, so using a pixel shader (a purely parallel model) requires each pixel to recompute the distance from itself
to the edge extremities. For an edge of length N, the complexity becomes O(N²), which can lead to performance problems.
The obvious solution is to precompute a bilateral distance texture.
The algorithm in this work unfolds as follows:

- Detect edges of the image based on color differences (Z and normal deltas could also be used if 3D data is available, which can make
a huge difference in quality). Store in R a flag indicating a horizontal edge, and in G a flag indicating a vertical edge.
An RGB565 texture is well suited for this. During this pass, set the stencil to 1 wherever an edge was found (using a pixel discard).
- Scan edges in the cardinal directions until the edge ends or an orthogonal edge is found, up to 4 pixels away (only for pixels with
stencil = 1, for speedup). Store the distances in an RGBA8 texture (one component per cardinal direction).
- For each direction, propagate the distances. If D(x) is the current distance for pixel x (at most 4 at this point),
update it by doing D'(x) = D(x) + D(x + D(x)): jump to the farthest known pixel and add the distance stored there. Applied repeatedly, each pass doubles the maximum distance; after this step the maximum is 16.
- Repeat the previous step: the maximum distance is now 64, unless the initial distance is less than 16 (a no-op in that case).
- Repeat the previous step: the maximum distance is now 255 (the largest distance that can be stored in a byte), unless the initial distance is less than 64 (a no-op in that case).
- Perform the final blend (see mlaa.ps in the .zip for details)

As hinted at on www.iryokufx.com/mlaa , bilinear filtering can be used to speed things up. I used it during edge scanning (to test 2 edges in a single
texture fetch) and in the final blend.
I borrowed the idea of encoding distances in an RGBA8 texture (I was initially going to use a float16 texture) from http://igm.univ-mlv.fr/~biri/mlaa-gpu/MLAAGPU.pdf, though the idea
of using a bilateral texture does not come from this paper. I also tried to use a lookup table to encode the blend weights as they did, but in my case
it was slower (I guess the ALU/TEX ratio on my GPU must be higher).

On an nVidia 8700M GT, processing a 800x600 image takes 6.3 ms. I guess a desktop GPU would do much better.

Lately I tried to reproduce the lightmap technique that can be seen in the UDK.
This technique encodes the distance to occluder boundaries in a Luminance8 texture, instead of storing the shadow term directly. A simple MAD operation in the pixel shader then retrieves the shadow value from the interpolated distance (so it is essentially free compared to standard shadow masks). In my attempt I used ray tracing to compute
the distance to the occluder as seen from the receiver. As can be seen in the following screenshots, the accuracy is much
better than with the usual shadow masks. This is very similar to what Valve did in Half-Life 2 with their vector textures:
Improved Alpha-Tested Magnification for Vector Textures and Special Effects

Not a CG entry this time, but a small game. This is my first application for the iPhone (now available for Android). The game was inspired by both Sokoban and Pengy, an old Atari ST game ...
The levels alternate between action and puzzles to solve. Using the on-screen controller worked well in the end, because there are
no other controls to focus on ... but it is still not the best means of control for the iPhone :/
Most of the code is C++, with some parts in Objective-C.

With everyone doing SSAO these days, I decided to give it a try too. I developed an extension of the algorithm that shows how to add high-frequency details to the ambient occlusion term.
The technique uses 3 color components to store the occlusion over each third of the hemisphere for each sampling position (whereas standard SSAO samples over the 'whole' hemisphere for each pixel in screen space). Each component is then blurred with a 'bilateral filter' as usual.
In this work I did a separate edge-filter pass before the blur. In the end, the normal map is read, and the occlusion is computed by weighting each component by the normal with respect to its sampling direction.
This is very similar to Source's shading, but applied in screen space. The executable also has additional features such as diffuse bleeding.
The technique can somewhat enhance environments where baking the lighting is not possible (dynamic and/or huge worlds ...). The aim of the sample is to demonstrate the visual enhancement that normal maps bring, but speed-wise it can certainly be improved :).

This sample requires a recent DirectX 9.0c runtime and a Shader 3.0 capable GPU. It was tested on nVidia 8700M GT and 8800 GPUs.

A small demo that demonstrates "volumetric lighting". A shadow map
as well as a "gobo" texture are sampled along each view ray in the
pixel shader to produce the final color for each pixel. Two techniques are demonstrated.
In the first one, dynamic branching is used to know when to stop the sampling.
The second technique doesn't rely on dynamic branching, but uses occlusion queries
and multiple passes with a stencil test to know when to stop the rendering. When hardware
shadow maps are activated, percentage-closer filtering is used, which leads to
better quality.

You need a (fast...) GPU supporting Pixel Shader model 3.0 to run the demo
(I tested it on an nVidia 6800 and an 8700M GT; I don't know if it works with ATI
GPUs ...). The pixel shader is very costly, so it may be necessary to reduce the
window size to get something smooth on older GPUs ...

This sample demonstrates the "normal mapped radiosity" technique
(also known as "directional lightmaps") presented by Valve Software
for the Source engine. It is based on the following paper: "Half-Life 2
/ Source shading". I found it interesting, so here is my try at
implementing what is presented in the article.

The sample introduces a slight variation: the lightmaps contain only "indirect
lighting", instead of full lighting. This allows shadows and specular to be
retained for each light, instead of using probes and cubemaps. To achieve this
I used the baking system I developed earlier for lightmasks (per-light static
shadow masks) and extended it to include this new, better ambient lighting. The
result is of course slower, because it uses multipass light rendering instead
of a single-pass shader (with modulated shadows), but thanks to the new ambient
term, "fill lights" are less necessary, so it is possible to have
each surface hit by at most one light and still get good overall lighting,
and thus single-pass rendering everywhere if necessary (there are some overlapping
lights in the demo, so it is not a single-pass render). The program lets you
see how using "normal mapped indirect lighting" as the ambient component
of a scene can enhance the realism of the rendering. To demonstrate this, sliders
control the amount of ambient lighting on static geometry and dynamic
geometry. For dynamic geometry I implemented radiance sampling on a regular
grid, as described in the paper. Dynamic objects look up the "light flow"
at their center in this grid.

You need a GPU supporting Pixel Shader 2.0 to run the demo (ATI Radeon 9500
or better, nVidia GeForce FX or better).

This sample only implements the case of a spherical light source. While the
original papers used 2 visibility buffers (one additive and one subtractive),
this sample improves on the original algorithm by using a single v-buffer,
with subtractive blending to account for negative values.
As a result some video RAM is saved, as well as some bandwidth in the final
compositing shader. Moreover, the texture containing object coordinates is computed
in camera space, not in world space. This results in increased accuracy for FP16 buffers,
as coordinates are more likely to fall within the same small, bounded range.

You need a GPU supporting Pixel Shader 2.0 to run this program (ATI Radeon
9500 or better, nVidia GeForce FX or better)

A reasonably fast CPU (1 GHz should be OK; the development machine was a P4 1.6
GHz)

512 MB of RAM on Windows XP (may work with 256 MB, not tested)

Version history :

Version 1.1 (12/5/2007)

- Added a new GUI at start
- Added multisampling option
- Added 'no sound' option
- The initial engine used Lua for material animation scripting, but
this caused annoying freezes from time to time (due to the Lua garbage
collector). In this version all animated parts of materials have been rewritten
in C++, so the framerate is steadier.

Version 1.0 (11/5/2007)

- Initial party version

Downloads :
You can download the demo here
(version 1.1)
If you'd like to interactively fly through some scenes of the demo, some of
them are included here!
A 640x480 video in AVI format is available here