The main difference is that my implementation is a little simpler and a little faster, with a few of the parts removed - though I don't think quality has suffered much. It also does not require normal data in a texture; instead it reconstructs the normals from the depth buffer when needed.
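For the curious, the reconstruction looks roughly like this - a minimal sketch, where `reconstructPosition` (a helper that unprojects a depth sample into view space) and `texelSize` are assumed names, not code from my actual shader:

```glsl
// Sketch: approximate the view-space normal from the depth buffer by
// reconstructing the positions of the pixel and two of its neighbours.
// `reconstructPosition` and `texelSize` are assumed names.
vec3 reconstructNormal(vec2 uv)
{
    vec3 p  = reconstructPosition(uv);
    vec3 px = reconstructPosition(uv + vec2(texelSize.x, 0.0));
    vec3 py = reconstructPosition(uv + vec2(0.0, texelSize.y));

    // The cross product of the two screen-space tangent vectors
    // approximates the surface normal.
    return normalize(cross(px - p, py - p));
}
```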

I also needed to tweak a lot of the parameters quite significantly. In the end I got some great results that look very similar to the Crytek implementation. Here are a few thoughts and things I learnt:

Randomness is really important. The noise texture is key; without it the algorithm's result is barely recognizable as SSAO. When I disabled the randomness I expected a kind of 16-level banding (I was using 16 samples), but what I got was far from that. Only the darkest areas showed 16-layer banding. The rest, which was often occluded by only a couple of objects, got a kind of two- or three-tier banding. This banding was also far from regular - it was more a consequence of which particular sample directions were chosen. The main problem was that instead of the regular banding you might get with PCF shadows, I actually got bands which were silhouettes of recognizable objects, as well as all kinds of other very recognizable artifacts.

So the randomness is important. It isn't just a way of removing banding - you can think of it as playing a role similar to the random variable in Monte Carlo integration.
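In practice the randomness usually comes from a small tiling noise texture. Something like the following sketch, where `noiseTex`, `noiseScale`, `kernel` and `sampleOcclusion` are placeholder names rather than my exact code:

```glsl
// Sketch: fetch a per-pixel random vector and use it to reflect every
// kernel sample, so neighbouring pixels sample in different directions.
vec3 rand = normalize(texture2D(noiseTex, uv * noiseScale).xyz * 2.0 - 1.0);

float occlusion = 0.0;
for (int i = 0; i < 16; ++i)
{
    // Reflecting about a random plane decorrelates the sample pattern
    // between pixels, turning banding into high-frequency noise.
    vec3 sampleDir = reflect(kernel[i], rand);
    occlusion += sampleOcclusion(uv, sampleDir); // assumed helper
}
occlusion /= 16.0;
```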

Even once I realized how important the random factor was to the equation, it still took a while to tweak the parameters to a decent level. The first results were very noisy and I wasn't really sure how to fix them. In fact it took me a long time to work out how the different parameters affected the result - they can have odd ranges, and realistic values come in peculiar magnitudes. Also, don't assume you can blindly copy the parameters from someone else's implementation: they are heavily dependent on factors such as the screen resolution and the near and far clipping planes.
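To give a rough idea of what I mean, this is the kind of parameter set you typically end up with - the names and numbers below are purely illustrative, not values to copy:

```glsl
// Illustrative tuning parameters only - sensible values depend heavily
// on your resolution and near/far planes, so treat these as placeholders.
const float sampleRadius = 0.5;  // how far the sampling kernel reaches
const float depthBias    = 0.01; // rejects self-occlusion artifacts
const float intensity    = 1.0;  // overall strength of the darkening
const float falloffScale = 0.05; // how fast occlusion fades with depth difference
```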

Learning what they all do is key to getting the effect you want. Many different kinds of effects and looks are possible, and most of them are a trade-off. In my final implementation I aimed for a result which really highlights and accentuates smaller details - the occlusion range appears to be around half a meter. The trade-off for this look is that you tend to get haloing and a kind of rim lighting around many objects. If you don't care about the smaller details it's perfectly possible to get an occlusion that looks much more like global illumination, though this often halos in the opposite direction - putting shadows around objects which don't need them.

Unfortunately I was on a platform that didn't allow me to define uniforms I could tweak with in-game sliders or something similar, so I had to recompile the shaders with new constants every time. If you have the opportunity to tweak values in-game, take it - it will save you a whole lot of time.
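If your platform does allow it, it's as simple as declaring the tuning values as uniforms (hypothetical names below) and wiring them up to sliders:

```glsl
// Declared as uniforms, these can be changed at runtime without a
// shader recompile. Names are illustrative.
uniform float sampleRadius;
uniform float intensity;
uniform float falloffScale;
```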

Once you've tweaked the parameters to a good extent you'll probably be left with a somewhat noisy effect that generally looks like SSAO. At this point you have a couple of options. What I would recommend is rendering it to a texture, generating mipmaps, and then using it down-sampled in whichever shaders you wish to apply the SSAO factor to. Most sources also say it should be applied to the ambient term, but it can be interesting to play with it in other places too.
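Consuming the mipmapped SSAO texture in a lighting shader might look something like this sketch - `ssaoTex`, `ambientColor` and `albedo` are assumed names, and the LOD bias value is illustrative:

```glsl
// Sampling with a positive LOD bias reads a down-sampled mip level,
// which blurs away most of the remaining noise.
float ao = texture2D(ssaoTex, uv, 2.0).r;

// Applied to the ambient term, as is most common.
vec3 ambient = ambientColor * albedo * ao;
```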

Before I actually got into the nitty-gritty details I always had trouble imagining how SSAO algorithms worked. The trouble was that all the articles I had read talked about a "sampling sphere", and I began to assume it required a fully deferred pipeline with position data in a buffer, as well as all kinds of other machinery. In reality it is far simpler. There is a sampling sphere, but it is a purely screen-space construct, and you can imagine it in a flattened context most of the time. The reason for a sphere - rather than, say, a sampling circle - is to ensure the sampling density is correctly distributed in the space.

The basic idea is this. For each pixel, imagine this sampling sphere. Within it we generate 16 random vectors. We then work out the screen-space normal of the initial pixel; it helps to imagine this normal represented in the same sampling sphere. Any vector pointing in the opposite direction to the normal we flip, so that all of them lie in the same hemisphere as the normal.
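The flip itself is just a dot product test. A sketch, assuming the 16 vectors arrive in a `kernel` uniform array and `normal` is the pixel's screen-space normal:

```glsl
// Flip any sample that points away from the surface into the normal's
// hemisphere, so every sample looks "outwards" from the pixel.
vec3 dir = kernel[i];
if (dot(dir, normal) < 0.0)
{
    dir = -dir;
}
```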

These vectors act as our samples. We simply project each of them back onto the depth texture and look at the depth it points at. If that depth is closer to the viewer than the initial pixel's depth, we record the initial pixel as being occluded by some amount.
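As a sketch, the test for a single sample looks roughly like this - `projectToUV` (which maps an offset position back into texture coordinates), `sampleScale` and `occlusionAmount` are assumed names:

```glsl
// Project the sample back into screen space and compare depths.
vec2  sampleUV    = projectToUV(uv, dir * sampleScale);
float sampleDepth = texture2D(depthTex, sampleUV).r;

if (sampleDepth < pixelDepth)
{
    // Something sits in front of the sample point, so this pixel is
    // occluded from that direction by some amount.
    occlusion += occlusionAmount(pixelDepth - sampleDepth);
}
```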

To work out the amount of occlusion you can use various methods. Ideally you want some particular depth difference to represent full occlusion, with larger or smaller differences meaning less occlusion. We can represent this quite nicely with the smoothstep function, which also gives us a clean boundary so we know when a pixel is definitely not occluded by another.
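One way to express that with smoothstep (the `minDiff`, `fullDiff` and `maxDiff` constants are hypothetical tuning values):

```glsl
// Occlusion ramps up from zero at minDiff to full strength at fullDiff,
// then fades back out towards maxDiff so geometry far in front of the
// pixel doesn't darken it.
float diff = pixelDepth - sampleDepth;
float occ  = smoothstep(minDiff, fullDiff, diff)
           * (1.0 - smoothstep(fullDiff, maxDiff, diff));
```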

With this picture in your head it is easy to see how the technique works. A surface effectively looks in front of itself for any pixels which might overshadow it. Imagine a pixel on a ground surface almost perpendicular to the screen: it will generate a hemisphere which points upwards in screen space and sample pixels almost directly above it. This is almost precisely what we need for occlusion - and why such a simple algorithm is so effective.

Anyway, please use/borrow/steal the above code for whatever needs you have. If you have any questions, feel free to drop me an e-mail. Here are some pictures of the results: