The final part I’m going to cover for high dynamic range rendering is an implementation of lens flare. Lens flare is an effect that raises a lot of hate from some people, and it can certainly be overdone. However, when used subtly I think it can add a lot to an image.

Theory of lens flare

Lens flare is another artifact of camera lenses. Not all of the light that hits a lens passes through – some small amount is always reflected back. Most camera systems also use many lenses in series rather than just the one. The result is that light can bounce around inside the system between any combination of the lenses and end up landing elsewhere on the sensor.

If you’re not a graphics professional you could take a look at this paper which aims to simulate actual lenses to give very realistic lens flares. If there’s any possibility that you might find it at all useful then stay away because it’s patent pending (but this is not the place to discuss the broken state of software patents).

We don’t need to simulate lens flare accurately, we can get a good approximation with some simple tricks.

Ghosts

The main component of lens flare is the ‘ghost’ images – the light from the bright bits of the scene bouncing around between the lenses and landing where they shouldn’t. With spherical lenses the light will always land somewhere along the line from the original point through the centre of the image.

The lens flare effect is applied by drawing an additive pass over the whole screen. To simulate ghosts, for every pixel we need to check at various points along the line through the centre of the screen to see if any bright parts of the image will cause ghosts here. The HLSL code looks something like this:

The eight distance values control where the ghosts will appear along the line. A value of 1.0 will always sample from the centre of the screen, values greater than one will cause ghosts on the opposite side of the screen and values less than one will create ghosts on the same side as the original bright spot. Just pick a selection of values that give you the distribution you like. Real lens systems will give a certain pattern of ghosts (and lots more of them), but we’re not worrying about being accurate.

This is a simpler set of four ghosts from the sun, showing how they always lie along the line through the centre:

Four ghosts from the sun, projected from the centre of the screen

The ghost at the bottom right had a distance value of 1.62. You can see this by measuring the ratio of distance to the centre of the screen in the image above.

This next image is using eight ghosts with the code above. You can’t see the ghost for value 1.03 as this is currently off-screen (values very near 1.0 will produce very large ghosts that cover the entire screen when looking directly at a bright light, and are very useful for enhancing the ‘glare’ effect).

You can see the non-circular ghosts as well in this image, as some of the sun is occluded:

Full set of ghosts from an occluded sun

Chromatic aberration

Another property of lenses is that they don’t bend all wavelengths of light by the same amount. Chromatic aberration is the term used to describe this effect, and it leads to coloured “fringes” around bright parts of the image.

One reason that real camera systems have multiple lenses at all is to compensate for this, and refocus all the colours back onto the same point. The internal reflections that cause the ghosts will be affected by these fringes. To simulate this we can instead create a separate ghost for each of the red, green and blue channels, using a slight offset to the distance value for each channel. You’ll then end up with something like this:

Chromatic aberration on ghosts

Halos

Another type of lens flare is the ‘halo’ effect you get when pointing directly into a bright light. This code will sample a fixed distance towards the centre of the screen, which gives nice full and partial halos, including chromatic aberration again:

Put together the ghosts and halos and you get something like this (which looks a mess, but will look good later):

Eight ghosts plus halo

Blurring

The lens flares we have so far don’t look very realistic – they are far too defined and hard-edged. Luckily this is easily fixed. Instead of sampling from the original image we can instead use one of the blurred versions that were used to draw the bloom. If we use the 1/16th resolution Gaussian blurred version we instead get something which is starting to look passable:

Ghosts and halo sampling from a blurred texture

Lens dirt

It’s looking better but it still looks very CG and too “perfect”. There is one more trick we can do to make it look more natural, and that is to simulate lens dirt.

Dirt and smears on the lens will reflect stray light, for example from flares, and become visible. Instead of adding the ghosts and halos directly onto the image, we can instead modulate it with a lens dirt texture first. This is the texture I’m currently using, which was part of the original article I read about this technique and which I can unfortunately no longer find. If this is yours please let me know!

Lens dirt overlay

This texture is mostly black with some brighter features. This means that most of the flares will be removed, and just the brighter dirt patches will remain. You may recognise this effect from Battlefield 3, where it’s used all the time.

You can’t really see the halo when modulating with this lens dirt texture, so we can add a bit more halo on top. This is the final result, as used in my demos:

Final result

And that’s it for High Dynamic Range rendering, which I think is one of the most important new(-ish) techniques in game rendering in the last few years.

Last time I covered HDR render targets, tone mapping and automatic exposure control. Now it’s time to simulate some camera imperfections that give the illusion that something in an image is brighter than it actually is on the screen.

Bloom

The first effect to simulate is bloom. This is when the light from a really bright object appears to “bleed” over the rest of the image. This is an image with no bloom – the sun is just a white circle, and doesn’t look particularly bright:

No bloom

With a bloom effect the sun looks a lot brighter, even though the central pixels are actually the same white colour:

With bloom

Theory of bloom

Why does this happen? There is a good explanation on Wikipedia but this is the basic idea.

Camera lenses can never perfectly focus light from a point onto another point. My previous diagrams had straight lines through lens showing the path of the light through the lens. What actually happens is that light (being a wave) diffracts through the aperture creating diffraction patterns. This means the light from a single point lands on the sensor as a bright central spot surrounded by much fainter concentric rings, called the Airy pattern (the rings have been brightened in this picture so you can see them easier):

Airy disk

Usually this isn’t a problem – at normal light levels the central peak is the only thing bright enough to be picked up by the sensor, and it fits within one pixel. However, with very bright lights, the diffraction pattern is bright enough to be detected. For anything other than a really tiny light source the individual rings won’t be visible because they’ll all overlap and blur together, and what you get is the appearance of light leaking from bright areas to dark areas.

This effect is pretty useful for us. Because people are used to bright objects blooming, by doing the reverse and drawing the bloom we perceive the object as brighter than it really is on screen.

Implementation

The idea of rendering bloom is the same as Bokeh depth of field. Recall from the depth of field that each pixel is actually the shape of the aperture, drawn at varying sizes depending how in focus it is. So to draw Bokeh ‘properly’ each pixel should be draw as a larger texture. To draw bloom ‘properly’ you would instead draw each pixel with a texture of the Airy pattern. For dim pixels you would only see the bright centre spot, and for very bright pixels you would see the circles as well.

That’s not very practical though so we can take shortcuts which make it much quicker to draw at the expense of physical accuracy. The main optimisation is to do away with the Airy pattern completely and use a Gaussian blur instead. When you draw many Airy patterns in neighbouring pixels the rings average out and you are left with something very similar to a Gaussian blur:

Gaussian blur

The effect we are trying to simulate is bright pixels bleeding over darker neighbours, so what we’ll do is find the bright pixels in the image, blur them and then add them back onto the original image.

To find the bright pixels in the image we take the frame buffer, subtract a threshold value based on the exposure and copy the result into a new texture:

The extracted bloom – the original image with a threshold value subtracted

Then create more textures, each half the size of the previous one, scaling down the brightness slightly with each one (depending how far you want the bloom to spread). Here are two of the downsized textures from a total of eight:

The 1/8th size downsized extracted bloom

The 1/64th size downsized extracted bloom

Because we’re not simulating bloom completely accurately, there are a few magic numbers we can tweak (like the threshold and downscaling darkening) to control the overall size and brightness of the bloom effect. Ideally we would work it all out automatically from the Airy disk and camera properties, but this method looks good enough and is more controllable to give the type of image you want.

Now we have all the downsized textures we need to blur them all. I’m using an 11×11 Gaussian blur which is soft enough to give an almost completely smooth image when they’re all added up again. A larger blur would give smoother results but would take longer to draw. The reason for doing the downscaling into multiple textures is that it is much quicker to perform smaller blurs on multiple smaller textures than it is to perform a massive blur on the original sized image.

After blurring, the two textures above look like this (and similarly for all the others):

Blurred 1/8th size bloom

Blurred 1/64th size bloom

Then to get the final image we simply add up all of the blurred textures (simple bilinear filtering is enough to get rid of the blockiness), scale it by some overall brightness value and add it back on top of the tonemapped image from last time. The end result will then be something like this, with obvious bloom around the sun but also some subtle bleeding around other bright areas like around the bright floor:

The great thing about this is that you don’t need to do anything special to make the sun or other bright lights bloom – it’s all just handled automatically, even for ‘accidental’ bright pixeld like intense specular highlights.

That’s not quite everything that you can do when rendering bright things. Next time I’ll describe that scourge of late-90s games – lens flare. (It looks better these days…)

The screenshots in my graphics posts are from my raymarching renderer, and mainly consist of boxes and spheres. That’s a bit boring so I thought I’d have a look for something more interesting, and came across the Syntopia blog talking about raymarching 3D Mandelbulb fractals. I plugged the code into my renderer and it worked a treat. Here’s a video I rendered out from it:

It’s not exactly artistically shot (just a programmatically spinning camera and light) but it looks quite nice. I may set up some proper paths and things at some point and do something nicer. (Here is a much nicer animation I found, which was rendered with particles in Lightwave and apparently took nearly an hour per-frame to render. Mine took about 45 mins total.)

Running the code yourself

If you fancy having a play around (or just want to make your GPU overheat) you can download the program. I make no guarantee that the code runs on your system (I’ve only tried it on my Radeon HD7950, but it should work on most DX11 cards), and use this software at your own risk etc.

This has the .exe, a couple of textures and two shader files you can edit. The shader files are compiled on load, so you can modify the shaders (in the .fx files, you can just use Notepad or something to edit them) and re-run the .exe. Compile errors should pop up in a box.

Controls

WASD to move, hold the left mouse button to turn and hold the right mouse button to move the sun. Press H to toggle the full help text with the rest of the controls (sun size, depth of field etc).

Shaders

This demo is using a deferred renderer (for no particular reason) so there are two separate shaders you can play with. The fractal generation code is in sdf.fx and calculates and writes out the depth, colour, normals, sun shadow and ambient occlusion into buffers for every pixel. deferredshader.fx applies the lighting. Depth of field, antialiasing and lens flare are applied afterwards.

deferredshader.fx is the simpler of the two. The lighting setup is pretty much copied from this Iñigo Quilez article for the sun, sky and indirect lighting. A physically-based specular term is added (more complex but better looking than the simple specular I wrote about before) and then some fog is applied. Light colours and specular gloss can easily be edited.

sdf.fx controls the geometry generation. The SDF() function is the signed distance function, and takes a world position and returns the distance to the nearest surface. There are a few alternative functions you can try (the others are commented out). SDFMandelbulbAnimated() generates the animated Mandelbulb in the video and is the code from the Syntopia blog. SDFMandelbulbStatic() is an optimised function for generating the static power-8 Mandelbulb, using the optimised code from here. As well as that there are a couple of other functions for fun – some infinite wibbly spheres and the box and two spheres from my earlier videos.

After using the SDF to find the surface intersection point it calculates the soft shadowing from the sun (which I described here). This can be really slow so there is an option to turn it off completely at the top of the file if your frame rate is too low, or you could tweak the iterations and max radius down a bit. It also calculates an ambient occlusion term using five rays at 45° from each other. This can also be slow, and the same applies.

This is just a quick overview. Any specific queries or questions, leave a comment.

Finally, here are a few screenshots I took. Macro photography of fractals gives some quite nice results. Enjoy!

High Dynamic Range (HDR) rendering is a way of coping with the large possible range of brightness in a scene, and how to render that to a screen. There are a few parts to this which are all required to get good results. In this part I’ll cover the initial rendering, tone mapping and dynamic exposure control. Next time I’ll cover bloom and throw in some lens flare for good measure.

What is HDR rendering?

Do an image search for HDR photography and you’ll see loads of weird and unnatural looking pictures. This completely isn’t what HDR rendering is, although it’s trying to tackle the same issue.

The human eye has a dynamic range of around 1,000,000,000 : 1. That means we can see in starlight, and we can see in light a billion times brighter such as a sunny day. At any given time we can see contrast of up to 10,000 : 1 or so, but our eyes adjust so that this range is useful for whatever we’re looking at. In a bright scene we can’t perceive much detail in dark shadow, and at night a bright light will blind us.

LCD displays have contrast ratios of around 1000 : 1, so a white pixel is only a thousand or so times brighter than a black pixel (depending on the screen). This means we can’t display the billion-to-one contrast of the real world directly on a screen, but it does map pretty well to the range of the eye at any given time.

Rendering in HDR – floating point render targets

The final image that you output to the screen has 24 bits per pixel – 8 bits of accuracy for each of red, green and blue. Renderers traditionally draw to a render target with the same accuracy, so that the result can be displayed directly. This means that only 256 levels of brightness can be stored for each colour channel at any pixel. 256 levels are enough that you can’t really see any banding between the colours when looking at a smooth gradient on the screen, but it’s not enough accuracy to capture a scene with a lot of dynamic range.

Time for a contrived example. Here is a photo on a lamp on my window sill:

The exposure on this photo was 1/20 seconds, which let in enough light that you can see the detail in the trees. However, the lamp itself is pure white and you can’t see any detail. There is no way to tell if the lamp is fairly bright, or really really bright. Let’s see another photo with the exposure reduced to 1/500 seconds.

You can start to see some detail in the lamp now, but you can only just see the outline of the trees. The bulb is still white, even with this short exposure, so you can tell that it’s really bright. One more with the exposure set to 1/1300 seconds:

Now we can see more of the detail in the bulb, and the trees have almost completely disappeared. The camera won’t go any quicker and the bulb is still white, so it must be really really bright.

So you can see that with only 256 intensity levels, you have to lose information somewhere. Either you have a long exposure so you can see darker objects but lose all detail in the bright areas, or a short exposure where you can see detail in bright objects at the expense of darker areas.

To get around this we need to use a higher precision render target with a lot more than 256 intensity levels. The ideal format currently is 16-bit floating point for each colour channel (which if you’re interested can represent values from 0.00006 to 65000 with 11 bits of accuracy in the mantissa). This is more than enough precision for accurately drawing both moonlit nights and blazing days at the same time. The downsides are that 16-bit render targets require twice as much memory and they’re a bit slower to render into (more data to push around), but on modern hardware, and certainly the next-gen consoles, they’re completely viable to use.

Tone mapping – getting it on the screen

So you’ve rendered your scene into a 16-bit render target. Given that your monitor still wants 8-bit colour values, you need to do some conversion. While your HDR source image is an accurate representation of the world (including all possible light intensities), we need to attempt to simulate what your eyes would actually see so that we can draw it on a screen. This stage is called tone mapping.

Eyes and cameras can adjust to let in more or less light, or be more or less sensitive. On a camera this is controlled by the aperture size, shutter speed and ISO settings, and in the eye you have pupil size and chemical changes in the photoreceptors. This means you can see intensity variation in some small part of the entire dynamic range. Everything darker than this will just look black, and everything brighter will just look white. A tone mapping function is one that can map the entire infinite range of light into a zero-to-one range (and then you multiply by 255 to get values that can be displayed on a screen), while preserving the contrast in the part of the range you’re interested in.

One simple tone mapping operator is the Reinhard operator:

where n is a number that controls the exposure and x is your rendered value. You can see that zero will map to zero, and large values will converge on 1. A larger value chosen for n will make the final image darker (in fact n is the input value that will map to half brightness on your screen).

Let’s try it on a rendered image. I’ll use one of my demo scenes from before with no extra post-processing so you can see what’s going on. I need to pick a value for n so I’ll try 0.2:

This is over-exposed, so now lets try n = 1.0:

That’s pretty good, but the sun is still completely white so let’s see what happens with n = 10:

Like with the lamp photos, the ‘exposure’ is now short enough that the sun isn’t just a white blob (it’s still a blob, but you can now make out the slightly yellow colour).

The Reinhard operator isn’t the only option, and in fact it’s not that good. It desaturates your blacks, making them all look grey. Lots of people have tried to come up with something better, and a good one (which I’m using in my own code) is John Hable’s Filmic Tone Mapping which debuted in Uncharted 2 (and which you can read all about here if you want, including the issues with Reinhard).

Swapping to Filmic Tone Mapping we get this, where you can see a lot more contrast and saturation in the colours:

Automatic Exposure Control

We can now render our HDR image to a screen, but we’re relying on this magic exposure value. It would be really nice if we could handle this automatically, in the same way as our eyes. In fact we can do something quite similar.

When your eyes see a really bright scene they automatically adjust by closing the pupil to let in less light. In a dark environment the pupil reopens, but more slowly. To simulate this we first need to know how bright our rendered scene is. This is easy to do – we can just add up all the pixels in the image and divide by the number of pixels, which will get you the average brightness.

[To be technical, you should actually use the log-average luminance – take the log of the luminance of each pixel, average those, and then exponentiate it again. When doing a straight average a few really bright pixels can skew the result noticeably, but this has much less effect on a log-average. Also, for performance reasons you don’t actually add up all the pixels directly – instead write the log-average values into a half-resolution texture and then generate mipmaps to get down to a single pixel, which you then exponentiate to give the same result.]

Then you need to pick a target luminance, which is how bright you want your final image to be. Applying your tone mapping (with your current exposure value) to you average input luminance will give you your average output luminance. Now we can set up a feedback loop between frames – if your scene is coming out too bright, reduce the exposure for the next frame, scaling the reduction by how far off you are. Similarly if it’s too dark, increase the exposure for next frame. You can go further and increase the exposure slower than you decrease it, to simulate the eye adjusting quicker to bright scenes.

And now you never need to worry about your image being too light or dark ever again – it’s all taken care of automatically! This is another major advantage of using HDR rendering – it doesn’t matter how bright any of the lights in your scene are (in absolute terms) as long as they’re correct relative to one another.

There’s more…

The image is starting to look good, but the sun is still just a white circle. Next time I’ll talk about using bloom to ‘go brighter than white’…

The main difference is the shape of the blur – traditionally, a Gaussian blur is performed (a Gaussian blur is a bell-shaped blur curve), whereas real Bokeh requires a blur into the shape of the camera aperture:

Bokeh blur on the left, Gaussian on the right

The first question you might be asking is why are Gaussian blurs used instead of more realistic shapes? It comes down to rendering efficiency, and things called separable filters. But first you need to know what a normal filter is.

Filters

You’re probably familiar with image filters from Photoshop and similar – when you perform a blur, sharpen, edge detect or any of a number of others, you’re running a filter on the image. A filter consists of a grid of numbers. Here is a simple blur filter:

For every pixel in the image, this grid is overlaid so that the centre number is over the current pixel and the other numbers are over the neighbouring pixels. To get the filtered result for the current pixel, the colour under each of the grid element is multiplied by the number over it and then they’re all added up. So for this particular filter you can see that the result for each pixel will be 4 times the original colour, plus twice each neighbouring pixel, plus one of each diagonally neighbouring pixel, and divide by 16 so it all adds up to one again. Or more simply, blend some of the surrounding eight pixels into the centre one.

As another example, here is a very basic edge detection filter:

On flat areas of the image the +8 of the centre pixel will cancel with the eight surrounding -1 values and give a black pixel. However, along the brighter side of an edge, the values won’t cancel and you’ll get bright output pixels in your filtered image.

You can find a bunch more examples, and pictures of what they do, over here.

Separable filters

These example filters are only 3×3 pixels in size, but they need to sample from the original image nine times for each pixel. A 3×3 filter can only be affected by the eight neighbouring pixels, so will only give a very small blur radius. To get a nice big blur you need a much larger filter, maybe 15×15 for a nice Gaussian. This would require 225 texture fetches for each pixel in the image, which is very slow!

Luckily some filters have the property that they are separable. That means that you can get the same end result by applying a one-dimensional filter twice, first horizontally and then vertically. So first a 15×1 filter is used to blur horizontally, and then the filter is rotated 90 degrees and the result is blurred vertically as well. This only requires 15 texture lookups per pass (as the filter only has 15 elements), giving a total of 30 texture lookups. This will give exactly the same result as performing the full 15×15 filter in one pass, except that one required 225 texture lookups.

Original image / horizontal pass / both passes

Unfortunately only a few special filters are separable – there is no way to produce the hard-edged circular filter at the top of the page with a separable filter, for example. A size n blur would require the full n-squared texture lookups, which is far too slow for large n (and you need a large blur to create a noticeable effect).

Bokeh filters

So what we need to do is find a way to use separable filters to create a plausible Bokeh shape (e.g. circle, pentagon, hexagon etc). Another type of separable filter is the box filter. Here is a 5×1 box filter:

Apply this in both directions and you’ll see that this just turns a pixel into a 5×5 square (and we’ll actually use a lot bigger than 5×5 in the real thing). Unfortunately you don’t get square Bokeh (well you might, but it doesn’t look nice), so we’ll have to go further.

One thing to note is that you can skew your square filter and keep it separable:

Then you could perhaps do this three times in different directions and add the results together:

And here we have a hexagonal blur, which is a much nicer Bokeh shape! Unfortunately doing all these individual blurs and adding them up is still pretty slow, but we can do some tricks to combine them together. Here is how it works.

First pass

Start with the unblurred image.

Original image

Perform a blur directly upwards, and another down and left (at 120°). You use two output textures – into one write just the upwards blur:

Output 1 – blurred upwards

Into the other write both blurs added together:

Output 2 – blurred upwards plus blurred down and left

Second pass

The second pass uses the two output images from above and combines them into the final hexagonal blur. Blur the first texture (the vertical blur) down and left at 120° to make a rhombus. This is the upper left third of the hexagon:

Intermediate 1 – first texture blurred down and left

At the same time, blur the second texture (vertical plus diagonal blur) down and right at 120° to make the other two thirds of the hexagon:

Intermediate 2 – second texture blurred down and right

Finally, add both of these blurs together and divide by three (each individual blur preserves the total brightness of the image, but the final stage adds together three lots of these – one in the first input texture and two in the second input texture). This gives you your final hexagonal blur:

Final combined output

Controlling the blur size

So far in this example, every pixel has been blurred into the same sized large hexagon. However, depth of field effects require different sized blurs for each pixel. Ideally, each pixel would scatter colour onto surrounding pixels depending on how blurred it is (and this is how the draw-a-sprite-for-each-pixel techniques work). Unfortunately we can’t do that in this case – the shader is applied by drawing one large polygon over the whole screen so each pixel is only written to once, and can therefore only gather colour data from surrounding pixels in the input textures. Thus for each pixel the shader outputs, it has to know which surrounding pixels are going to blur into it. This requires a bit of extra work.

The alpha channel of the original image is unused so far. In a previous pass we can use the depth of that pixel to calculate the blur size, and write it into the alpha channel. The size of the blur (i.e. the size of the circle of confusion) for each pixel is determined by the physical properties of the camera: the focal distance, the aperture size and the distance from the camera to the object. You can work out the CoC size by using a bit of geometry which I won’t go into. The calculation looks like this if you’re interested (taken from the talk again):

[A is aperture size, focal length is a property of the lens, focal plane is the distance from the camera that is in focus. zFar and zNear are from the projection matrix, and all that stuff is required to convert post-projection Z values back into real-world units. CoCScale and CoCBias are constant across the whole frame, so the only calculation done per-pixel is a multiply and add, which is quick. Edit – thanks to Vincent for pointing out the previous error in CoCBias!]

In the images above, every pixel is blurred by the largest amount. Now we can have different blur sizes per-pixel. Because for any pixel there could be another pixel blurring over it, a full sized blur must always be performed. When sampling each pixel from the input texture, the CoCSize of that pixel is compared with how far it is from the pixel being shaded, and if it’s bigger then it’s added in. This means that in scenes with little blurring there are a lot of wasted texture lookups, but this is the only way to simulate pixel ‘scatter’ in a ‘gather’ shader.

Per-pixel blur size – near blur, in focus and far blur

Another little issue is that blur sizes can only grow by a whole pixel at a time, which introduces some ugly popping at the CoCSize changes (e.g. when the camera moves). To reduce this you can soften the edge – for example if sampling a pixel 5 pixels away, blend in the contribution as the CoCSize goes from 5 to 4 pixels.

Near and far depth of field

There are a couple of subtleties with near and far depth of field. Objects behind the focal plane don’t blur over things that are in focus, but objects in front do (do an image search for “depth of field” to see examples of this). Therefore when sampling to see if other pixels are going to blur over the one you’re currently shading, make sure it’s either in front of the focal plane (CoCSize is negative) or the currently shaded pixel and the sampled pixel are both behind the focal plane and the sampled pixel isn’t too far behind (in my implementation ‘too far’ is more than twice the CoCSize).

[Edit: tweaked the implementation of when to use the sampled pixel]

This isn’t perfect because objects at different depths don’t properly occlude each others’ blurs, but it still looks pretty good and catches the main cases.

So far I’ve covered the basics of getting objects on the screen with textures, lighting, reflections and shadows. The results look reasonable but to increase the realism you need to more accurately simulate the behaviour of light when a real image is seen. An important part of this is the characteristics of the camera itself.

One important property of cameras is depth of field. Depth of field effects have been in games for a good few years, but it’s only recently that we’re starting to do it ‘properly’ in real-time graphics.

What is depth of field?

Cameras can’t keep everything in focus at once – they have a focal depth which is the distance from the camera where objects are perfectly in focus. The further away from this distance you get, the more out of focus an object becomes. Depth of field is the size of the region in which the image looks sharp. Because every real image we see is seen through a lens (whether in a camera or in the eye), to make a believable image we need to simulate this effect.

The quick method

Until the last couple of years, most depth of field effects were done using a ‘hack it and hope’ approach – do something that looks vaguely right and is quick to render. In this case, we just need to make objects outside of a certain depth range look blurry.

So first you need a blurry version of the screen. To do this you draw everything in the scene as normal, and then create a blurred version as a separate texture. There are a few methods of blurring the screen, depending on how much processing time you want to spend. The quickest and simplest is to scale down the texture four times and then scale it back up again, where the texture filtering will fill in the extra pixels. Or, if you’re really posh, you can use a 5×5 Gaussian blur (or something similar) which gives a smoother blur (especially noticeable when the camera moves). You should be able to see that the upscaled version looks more pixelated:

Blurring using reduce and upscale, and a Gaussian blur

Then you make up four distance values: near blur minimum and maximum distances, and far blur minimum and maximum distances. The original image and the blurry version are then blended together to give the final image – further away than the ‘far minimum’ distance you blend in more and more of the blurry image, up until you’re showing the fully blurred image at the ‘far maximum’ distance (and the same for the near blur).

In the end you get something that looks a bit like this (view the full size image to see more pronounced blurring in the distance):

Depth of field in Viva Piñata

This looks fairly OK, but it’s nothing like how a real camera works. To get better results we need to go back to the theory and understand the reasons you get depth of field in real cameras.

Understanding it properly

Light bounces off objects in the world in all directions. Cameras and eyes have fairly large openings to let in lots of light, which means that they will capture a cone of light bounced off from the object (light can bounce off the object anywhere within a cone of angles and still enter the camera). Therefore, cameras need lenses to focus all this light back onto a single point.

Light cones from objects at different distances to a lens

A lens bends all incoming light the same. This means that light bouncing off objects at different distances from the lens will converge at different points on the other side of it. In the diagram the central object is in focus, because the red lines converge on the projection plane. The green light converges too early because the object is too far away. The blue light converges too late because the object is too close.

What the focus control on your camera does is move the lens backwards and forwards. You can see that moving the lens away from the projection plane would mean that the blue lines converge on the plane, so closer objects would be in focus.

There is a technical term, circle of confusion (CoC), which is the circular area over which the light from an object is focussed over on the projection plane. The red lines show a very tiny CoC, while the blue lines show a larger one. The green lines show the largest CoC of the three objects, as the light is spread out over a large area. This is what causes the blur on out of focus objects, as their light is spread over the image. This picture is a great example of this effect, where the light from each individual bulb on the Christmas tree is spread into a perfect circle:

Bokeh

The circle of confusion doesn’t always appear circular. It is circular in some cases because the aperture of the camera is circular, letting in light from a full cone. When the aperture is partly closed it becomes more pentagonal/hexagonal/octagonal, depending on how many blades make up the aperture. Light is blocked by the blades, so the CoC will actually take the shape of the aperture.

This lens has an aperture with six blades, so will give a hexagonal circle of confusion:

So why is simulating Bokeh important? It can be used for artistic effect because it gives a nice quality to the blur, and also it will give you a more believable image because it will simulate how a camera actually works. Applying a Gaussian blur to the Christmas tree picture would give an indistinct blurry mess, but the Bokeh makes the individual bright lights stand out even though they are out of focus.

Here is the difference between applying a Bokeh blur to a bright pixel, compared to a Gaussian blur. As you can see, the Gaussian smears out a pixel without giving those distinct edges:

Bokeh blur on the left, Gaussian on the right

Using Bokeh in real-time graphics

In principle, Bokeh depth of field isn’t complicated to implement in a game engine. For any pixel you can work out the size of the CoC from the depth, focal length and aperture size. If the CoC is smaller than one pixel then it’s completely in focus, otherwise the light from that pixel will be spread over a number of pixels, depending on the CoC size. The use of real camera controls such as aperture size and focal length means that your game camera now functions much more like a real camera with the same settings, and setting up cameras will be easier for anyone who is familiar with real cameras.

In practice, Bokeh depth of field isn’t trivial to implement in real-time. Gaussian blurs are relatively fast (and downsize/upscaling is even faster) which is why these types of blurs were used for years. There aren’t any similarly quick methods of blurring an image with an arbitrary shaped blur (i.e. to get a blur like the left image above, rather than the right).

However, GPUs are getting powerful enough to use a brute force approach, which is the approach that was introduced in Unreal Engine 3. You draw a texture of your Bohek shape (anything you like), and then for each pixel in your image you work out the CoC size (from the depth and camera settings). Then to make your final image, instead of drawing a single pixel for each pixel in the original image, you draw a sprite using the Bokeh texture. Draw the sprite the same colour as the original pixel, and the same size as the CoC. This will accurately simulate the light from a pixel being spread over a wide area. Here it is in action, courtesy of Unreal Engine:

Depth of field with different Bokeh shapes

The downside of this technique is that it’s very slow. If your maximum Bokeh sprite size is, say, 8 pixels wide, then in the worst case each pixel in the final image will be made up of 64 composited textures. Doubling the width of the blur increases the fill cost by four times. This approach looks really nice, but you need to use some tricks to get it performing well on anything but the most powerful hardware (for example, draw one sprite for every 2×2 block of pixels to reduce the fill cost).

An alternative method

There is a good alternative method that I like which is much quicker to draw, and I shall go through that soon in a more technical post. It was presented in this talk from EA at Siggraph 2011, but it takes a bit of thought to decipher the slides into a full implementation so I’ll try to make it clearer. This is actually the technique I use in my Purple Space demo.

The last basic rendering technique to talk about is shadowing. Shadows are really important for depth perception, and are vital for ‘anchoring’ your objects to your world (so they don’t look like they’re just floating in space).

Shadowing is a very popular topic for research, so there are loads of variations on how to do it. Years ago you could just draw a round dark patch at a character’s feet and call it done, but that doesn’t really cut it these days.

Blob shadows

Shadows are usually implemented using a technique called shadow mapping. To go right back to basics, you get shadows when the light from a light source (e.g. the sun) is blocked by something else. So if you were stood at the sun (and assuming no other light sources) you wouldn’t see any shadows, because you would only see the closest point to the sun in any direction. This fact is the basis of shadow mapping.

Shadow map texture

What we’re going to do is draw the scene from the point of view of the sun, into a texture. We don’t care about the colour of the pixels, but we do care about how far away from the sun they are. We already do this when drawing a depth map, as I spoke about here. Because we need to use the shadow map when rendering the final image, we need to render the shadow map first (at the beginning of the frame).

There are considerations when rendering your shadow map that I’ll get to later, but first I’ll talk about how we use it. Here is the final scene with shadows that we’re aiming for:

Final scene with shadows

And this is the shadow map, i.e. the scene as seen from the light (depth only, darker is closer to the camera):

Shadow map rendered from the light source

When drawing the final image we use ambient/diffuse/specular shading as normal. On top of that we need to use the shadow map to work out if each pixel is in shadow, and if so we remove the diffuse and specular part of the lighting (as this is the light coming directly from the sun). To work this out we need to go back into the world of transforms and matrices.

Rendering with the shadow map

When I spoke about view matrices I introduced this, which is how the basis, view and projection matrices are used to get the screen position of each vertex in a model:

If you remember, the basis matrix will position an object in the world, and the view and projection matrices control how the camera ‘sees’ the world. In the case of shadowing, we have two cameras (the usual camera we’re rendering with, and the one positioned at the light source). The other thing we have is the shadow map, which is the scene as viewed with the second camera at the light.

To perform shadowing, we need to find exactly where each pixel we’re drawing would have been drawn in the shadow map. So, when transforming our vertices, we need to do two separate transforms:

The first tells us where on the screen the vertex will be drawn (X and Y positions, and depth), and the second tells us where in the shadow map it would be drawn (again X and Y positions, and depth). Then in the pixel shader, we can perform a texture lookup into the shadow map to get the depth of the closest pixel to the light. If our calculated depth for that pixel is further away then the pixel is in shadow!

One problem you will have is shadow acne. When you’re rendering a surface that’s not in shadow, you’re effectively testing the depth of that pixel against itself (as it would have been rendere into both the shadow map and the final image). Due to unavoidable accuracy issues, sometimes the pixel will be very slightly closer and sometimes it’ll be slightly further away, which leads to this kind of ugly effect:

Shadow acne

Because a surface should never shadow itself we use a depth bias, where a small offset is added to the shadow map depths to push them back a bit. Therefore a surface will always be slightly in front of its shadow, which cures this.

Rendering the shadow map

You need to take some care when deciding how to draw your shadow map – exactly as when you’re drawing your scene normally, you could point the camera towards any point in the world and your frustum could be any size. Also, you could use any sized texture to render into. All of these things affect the shadowing.

Let’s start with the easy one, the resolution of the texture. A small texture won’t have enough resolution to capture the small details in the shadow, but a large one will take a long time to draw, affecting your framerate. You might go for a 1024×1024 pixel shadow map, or double that if you want high quality.

The effective resolution of the shadow map is also affected by how wide a field of view you use when rendering it – if it’s very zoomed in then you’ll get a lot of pixels per area in the scene, but you won’t have any information at all outside of that area (so you won’t be able to draw shadows there). Therefore you need to pick a happy medium between high detail and a large shadowed area.

Cascades Shadow Maps

There is a way to get around the problem of having either detailed shadows or a large shadowed area, and that is by using a technique called Cascaded Shadow Maps. This just means that you use multiple shadow maps of different sizes. Close to the camera you’ll want detailed shadows, but further away it doesn’t matter so much. Therefore you can draw a second, much bigger shadow map (that therefore covers a much larger area) and when rendering you check whether the pixel is within the high detailed map. If not, sample from the lower detailed map instead.

This scene is showing how a cascade of two shadow maps can be used. You can see the blockiness caused by the lower detail map on the blue part of the box, but it enables the shadows to be rendered right into the distance:

Cascaded Shadow Maps – red is in the high detail map, blue is in the low detail map, green is outside both maps and not shadowed

You don’t need to stop at two shadow maps – the more you have, the more detailed the shadows can be in the distance, but the more time you have to spend drawing the maps. Three maps is a common choice for games with a long draw distance.

Filtering

One of the biggest problems with shadowing is filtering, or how to get soft shadows (which I talked a bit about here). The shadows we’re drawing here are hard shadows, in that each pixel is either fully in or out of shadow, with a hard edge in between. In the real world, all shadows have some amount of ‘softness’ (or penumbra) around the edge.

With standard texturing, you can avoid hard-edged pixels by blending between all the neighbouring pixels. This lets you scale up textures and keep everything looking smooth. Or you can not, and you get Minecraft:

With and without texture filtering

This doesn’t work with shadow maps. Using a shadow map always gives a yes/no answer: is the pixel further from the light than the shadow caster. Using texture filtering on the depth map between two pixels at different depths give a depth somewhere in between. It will still only give you a yes/no answer, but it’ll be wrong because it’s using some unrelated depth value. Instead, you have to use a more complicated method.

There are vast numbers of ways to do nice soft shadowing, so I won’t go into them here apart from to mention the simplest, which is called Percentage Closer Filtering (PCF) [one thing I find is that most graphics techniques have long and complicated sounding names, but they’re usually really simple]. With PCF, instead of doing a single shadow test, you do a few tests but offset the lookup into the shadow map slightly for each one. For example, you could do four tests – one slightly left, one right, one up and one down from where you would normally sample from the shadow map. If, for example, three of them were in shadow but one wasn’t, then the shadow would be 75% dark. This gives you some amount of soft shadowing.

Basic 4-sample Percentage Closer Filtering

As you can see it doesn’t look great, but more advanced sampling and filtering can be used to give decent results, and PCF is exactly what the soft shadowing in a lot of modern games is based on.

End of part 1

That sums up my introduction to basic graphics techniques. Hopefully it made sense and/or you learnt something… I will be carrying on with a bunch of more advanced techniques that I find interesting, and hopefully manage to present those in a way that makes sense too!

The next lighting technique I want to cover is environment mapping with cube maps.

Environment mapping

Environment mapping is another form of specular lighting. When I spoke about specular lighting here, I was talking about simulating the light reflected from one bright light source. You can repeat the calculations for multiple light sources but it quickly gets expensive. For a real scene, an object will be reflecting light from every other point in the scene that is visible from it. Using the standard specular approach is obviously not feasible for this infinite number of light sources, so we can take a different approach to modelling it by using a texture. We call this environment mapping.

Cube maps

We need to use a texture to store information about all of the light hitting a point in the scene. However, textures are rectangular and can’t obviously be mapped around a sphere (which is needed to represent all the light from the front, back, sides, top, bottom etc). What we do is use six textures instead, one for each side of a cube, and we call this a cube map. When arranged in a cross shape you can see how they would fold together into a cube:

Uffizi Gallery environment map

This is a famous cube map of the Uffizi Gallery in Florence, and is a bit like a panoramic photo with six images stitched together. These six textures are actually stored separately (they’re not actually combined into one big texture), and they are labelled front, back, left, right, top and bottom as in the image.

Lighting with cube maps

Remember how the reflection vector is calculated by reflecting the view vector around the normal. This reflection vector points to what you would see if the surface was a mirror (and is therefore the direction where any specular lighting comes from). The problem is to know what light would be reflected from that direction, and that is where the cube map comes in.

An environment map is another name for a cube map that contains a full panoramic view of the world (or environment) in all directions, such as the one above. The reflection vector can be used directly to look up into the environment map. This then gives you the colour of the reflection from that point.

In a pixel shader, it’s as simple as doing a normal texture lookup apart from you use the reflection direction as the texture coordinate:

Here is an example of my test scene where the only lighting on the objects is from the nebula cube map I used here:

Only lit using an environment map

How does the GPU actually look up into the texture though? The first thing it needs to do is find which face of the cube the reflection ray is going to hit. To find this, we just need to find the longest component in the reflection vector. If the X component is longest then the vector is mainly pointing left or right (depending if it’s positive or negative) so we will use the left or right face. Similarly, if the Y component is longest then we use the top or bottom face, and if Z is longest then we use the front or back face. Let’s start with an example:

reflectionVector = { -0.84, 0.23, 0.49 }

The X component is the largest, so we want either the left or right face. It’s negative, so we want the left face.

Now the face has been determined, the vector can be used to find the actual texture coordinates in that face. In exactly the same way as a vertex is projected onto the screen, the vector is projected onto the face we’ve just found by dividing by the longest component:

{ -0.84, 0.23, 0.49 } / -0.84 = { 1.0, -0.27, -0.58 }

Take the two components that don’t point towards the face (Y and Z in this case) and map them from (-1, 1) range to (0, 1) range, as this is the range that texture coordinates are specified in:

And that is what texture coordinate will be sampled from the left face in this example.

Blurry reflections

Using a cube map like this will always give very sharp, mirror-like reflections, as sharp as the texture in the cube map. Surfaces that aren’t completely smooth should have blurred reflections instead, due to the variety of surface orientations on a small area of rough material (like I talked about here). One way of doing this would be to sample the cube map multiple times in a cone around the reflection vector and average them, which would simulate the light reflected in from different parts of the world. However, this would be very slow. Instead we can make use of mipmapping.

Mipmapping

Mipmapping is an important technique when doing any kind of graphics using textures. It involves storing a texture at multiple different sizes, so that the most appropriate size can be used when rendering. Here are the mipmaps for the left face of the texture above:

Mipmap levels for one cube face

Each successive mipmap level is half the resolution of the previous one. To make the next mipmap level, you can just average each 2×2 block of pixels from the previous level, and that gives the colour of the single pixel in the lower resolution level. What this means is that each pixel in a smaller mipmap level contains all of the colour information of all of the pixels it represents in the original image. As shown in the picture, if you blow up a smaller mipmap you get a blurry version of the original image.

Funnily enough, this is exactly what we need for blurry reflections. By changing what resolution mipmap level we sample from (and we are free to choose this in the pixel shader), we can sample from a sharp or a blurry version of the environment map. We could change this level of blurriness per-object, or even per-pixel, to get a variety of reflective surface materials.

Mipmaps in standard texturing

To finish off, I’ll give a quick explanation of why mipmapping is used with standard texturing, as I didn’t cover it earlier.

There are two main reasons why mipmaps are useful, and these are to do with aliasing and memory performance. Both of these problems are only seen if you’re drawing a textured polygon on the screen, and it’s really small. The problems in this case is that you’re only drawing a few pixels to the screen, but you’re sampling from a much larger texture.

The first problem of aliasing occurs when a sample of a texture for a pixel doesn’t include all of the colour information that should be included. For example, imagine a large black and white striped texture drawn only a few pixel wide. When sampling from the full-resolution texture each pixel must be either black or white, as those are the only colours in the texture. Each pixel should actually be drawn grey, because each pixel covers multiple texels (a texel is a pixel in a texture), which would average out to grey. Using a smaller, blurrier mipmap level would give the correct grey colour. If mipmapping isn’t used, the image will shimmer as the polygon moves (due to the where in the texture the sample happens to land).

Aliasing when a highly detailed texture is scaled down and rotated clockwise a bit

The second problem is performance. Due to the way that texture memory works, it is much more efficient to use smaller textures. Using a texture that is far too big not only looks worse due to aliasing, it’s actually a lot slower because much more memory has to be accessed to draw the polygon. Using the correct mipmap level means less memory has to be accessed so it will draw faster.

Next time I’ll conclude this part of the series by talking about basic shadowing techniques.

Last time I spoke about specular lighting, which combined with diffuse and ambient from the previous article means we now have a good enough representation of real-world lighting to get some nice images.

However, this lighting isn’t very detailed. Lighting calculations are based on the orientation of the surface, and the only surface orientation information we have is the normals specified at each vertex. This means that the lighting will always blend smoothly between the vertices, because the normals are interpolated. Drawing very detailed surfaces in this manner would require very many vertices, which are slow to draw. What would be better would be to vary the lighting across a polygon, and for this we can use normal maps.

Normal maps

This is an example of what a normal map looks like:

Example of a normal map

A normal map looks like a bluey mess, but it makes sense when you understand what the colours mean. What we’re doing with a normal map is storing the orientation of the surface (the surface normal) at each pixel. Remember that the normal is a vector that points directly away from the surface. A 3D vector is made up of three coordinates (X, Y and Z), and coincidentally our textures also have three channels (red, green and blue). What this means is that we can use each colour channel in the texture to store one of the components of the normal at each pixel.

We need the normal map to work no matter the orientation of the polygon that is using it. So if the normal map is mapped onto the floor, or a ceiling, or wrapped around a complex shape, it still has to provide useful information. Therefore it’s no use encoding the direction of the normal directly in world-space (otherwise you couldn’t reuse the map on a different bit of geometry). Instead, it is encoded in yet another ‘space’ called tangent space. This is a 3D coordinate system where two of the axes are the U and V axes that texture coordinates are specified in. The third axis is the surface normal.

How tangent space related to UV coordinates

Encoding a normal in this space is straightforward. The red channel in the texture corresponds to the distance along the U axis, the green channel is the same for the V axis, and the blue channel is the distance along the normal. The distances along U and V can go from -1 to 1 (as we’re encoding a unit vector), so a texture value of 0 represents -1, and 255 (the maximum texture value if we’re using an 8-bit texture) represents +1. Because a surface normal can never face backwards from the surface, the blue channel only needs to encode distances from 0 to 1.

Now we can understand what the colours in the normal map mean. A pixel with lots of red is facing right, and with little red is facing left. A pixel with lots of green is facing up, and with little green is facing down. Most pixels have a lot of blue, which means they’re mainly facing out along the normal (as you’d expect, as this is the average surface orientation).

Shading with normal maps

So now we have a normal map, and it’s mapped across our object in world space. We can read the texture at each pixel to give us a tangent space normal, but the lighting and view directions are specified in world space. We need to get all of these vectors into the same space, and for this we need a matrix that converts between tangent and world space. Luckily, that’s fairly easy to get.

Matrices

First a quick diversion into rotation matrices. I’ve talked about 4×4 transform matrices for transforming from one 3D space to another, but the top left 3×3 part of the matrix is all you need to perform just the rotation. Because we only want to rotate the normal we don’t need to apply any translation, so we just need a rotation matrix.

Green is the rotation part of a transform matrix. Red is the rotation part, made up of the X, Y and Z axes in the new space.

Rotation matrices between coordinate systems with three perpendicular axes (i.e. the usual ones we use in graphics) have a couple of nice properties. The first is that the columns are just the original axes but transformed by the rotation we’re trying to represent, i.e. the first column is where the X axis would be after the rotation, the second column where the Y axis would be, and the third column where the Z axis would be.

The second nice property is that the inverse of a rotation matrix is its transpose. This means that the rows represent the three axes with the inverse rotation applied. If you’re interested, this is a more in depth explanation of rotation matrices here.

Tangent to world space

So how does this help us? We need to build a rotation matrix to convert between tangent space and world space. The first thing to do is to add a couple more vertex attributes – these are the tangent and the binormal vectors. These are similar to the normal, but they define the other two axes in tangent space. Remember that these are defined by how the UV texture coordinates are mapped onto the geometry. Your modelling package should be able to export these vertex attributes for you.

Now, we need to use these to get the light, view and normal vectors into the same space. In this case we’ll transform the view and light directions into tangent space in the vertex shader (although you could instead transform the normals into world space in the pixel shader, if that makes your shader simpler).

As shown above, the tangent-to-world matrix is just the 3×3 matrix where the columns are the Tangent (X axis in the normal map), Binormal (Y axis) and Normal (Z axis), in that order. To get the world-to-tangent matrix, just transpose it so the rows are Tangent, Binormal and Normal instead:

World to tangent space matrix, made up of the tangent, binormal and normal vectors in world space

Then you can use this to transform your light and view vectors! In case it helps, here’s some vertex shader HLSL code to do all this:

In the pixel shader you read the normal map the same as any other texture. Remap the X and Y components from (0, 1) range to (-1, 1) range, and then perform lighting calculations as usual using this normal and the transformed view and light vectors.

Here’s my test scene with the normal map on the floor:

And here’s a screenshot with the nicer shading and shadows turned on:

A caveat

One last technical point – the properties of rotation matrices I talked about only hold for purely rotational matrices, between coordinate spaces where the axes are all at right angles. Due to unavoidable texture distortions when they’re mapped to models, this usually won’t be the case for your tangent space. However, it’ll be near enough that it shouldn’t cause a problem except in the most extreme cases. If you renormalise all your vectors in the vertex and pixel shaders then it should all work out alright…

Next time should be a bit simpler when I’ll be talking about environment mapping!

This is just an experiment in coming up with a more interesting ray-marched scene to test out graphics techniques. It may be developed into a proper demo at some point, but that probably requires more artistic input than I’m able to give, so we’ll see.

What we have here are two infinite length boxes, with X and Y coordinates rotated around the Z axis depending on Z. Another box is intersected at intervals to break it up a bit. Finally a texture is used to add some small-scale surface detail. So there’s nothing particularly interesting about the base shape.

Last time I talked about basic ambient and diffuse lighting. This is good for perfectly matt surfaces, but every real surface has some degree of specularity, which is when light is reflected off a surface to give a shiny appearance. You may not think of most objects as having a shiny surface, but it’s a significant part of the look of almost any surface. If you want to see for yourself, take a look at this article which shows you how to split up the light from any object into specular and diffuse components.

Specular from perfect mirrors

Light reflects off specular surfaces in exactly the same way as a mirror (because mirrors are in fact just very flat surfaces with pure specular reflection, and no diffuse lighting). If you have a point light source, you can work out the reflection direction by reflecting around the surface normal.

Mirror reflections

When looking in a mirror, you see the reflection of the light at the point where the reflected light direction goes directly into your eye. If you work backwards you can see that each point on the surface of the mirror will reflect light from a different point in the world. Mirrors are the simplest case because they are completely flat, meaning that light from a point in the world will only be reflected towards your eye from one point on the mirror.

Imperfect reflections

Most surfaces aren’t perfectly flat, and so don’t have perfect mirror reflections. If a surface isn’t perfectly flat it means that the normals around a given point will be pointing in lots of different directions. On average the normals will all point directly away from the surface, but if you look close enough at a rough surface (e.g. tarmac) you’ll see lots of surface facets pointing in different directions.

When we’re far enough away from the surface, all of these surface facets will look small enough that they occupy just one pixel on the screen, or one receptor in our eye. However, the distribution of the normals still affects how it appears to us.

Reflections from rough surfaces

What we see in the diagram is that one pixel on a rough surface actually reflects light from lots of different directions. The strongest reflections are from the ‘mirror’ reflection direction, with the reflection strength tailing off the further you get from this (because on average more of the surface facets will be pointing in the normal direction). What this means is that if the light source is at A then you’ll get a very bright reflection because lots of the light is reflected towards the eye. If the light is at B then there will still be some reflection but it will be weaker. This is why shiny surfaces have sharp, bright highlights and rough surface have blurry, dim highlights.

Implementation

To calculate the specular highlights from a light source at a given pixel, we first need to find out how much the angles all line up. We need three pieces of information at each pixel:

View vector. This is the vector from the pixel to the eye, in world space. It is calculated in the vertex shader and interpolated for each pixel.

Normal vector. This is part of the data for each vertex in the mesh, and interpolated for each pixel. Normal maps can also be used to provide more detail (I’ll talk about those another time).

Light vector. This is the vector from the light to the pixel, and for distant light sources is constant for all pixels. For near light sources this is calculated in the vertex shader, the same as for the view vector.

Now we need to find the halfway vector, which is half way between the view and the light vectors. To find this, simply work out (lightVec – viewVec) and renormalise.

Then take the dot product of the halfway vector and the normal, which will give you a value for how aligned the two are. This value will be 1 if the vectors are perfectly aligned, and zero if they are at 90 degrees. It should never be negative (as this would mean you are viewing a surface from behind it) but it’s possible due to the way things are interpolated sometimes.

The next part is to come up with some simple way of simulating the rough surfaces, in particularly how all the normals reflect light from different directions. Because the dot product is between zero and one, the simplest way is to just raise it to a power – higher powers cause the value to drop off quicker the further the halfway vector is from the normal, leading to a sharper highlight. The full specular equation is then:

Varying the specular power will change the sharpness of the highlight. Here are examples of just the specular lighting with powers of 2 and 20:

Specular power of 2

Specular power of 20

Add this to the other lighting and you start to get a more believable image:

Specular, diffuse and ambient lighting

Better specular

The problem with this method of doing specular highlights is with the simplicity of just taking a dot product and raising it to a power. This method is chosen because it looks “about right” and is very cheap to calculate. It isn’t based on anything fundamental about the way light behaves, and because of this it will never produce photorealistic images.

In a future post I’ll talk about Physically Based Rendering (PBR), which boils down to a more complicated method of doing specular highlights, but a method that is based in the real-world behaviour of lights and surfaces. It is an advanced technique and has only been widespread in games for the last few years, but it given much nicer results (and the cost of being a lot more expensive to calculate). Anyway, I’ll come back to this in a future post.

Next time I’ll talk about normal mapping, which allows much more detail to be put into lighting models.

A while back I had a quick search for soft shadowing techniques for ray-marching rendering, but somehow I didn’t manage to find anything very useful. So I just had a play around and came up with something. Here’s the video from last time again:

Then I did another search and found almost exactly the same technique described here by Iñigo Quilez, apart from his method was a bit simpler and more elegant so I’ve just gone with that. I’ll give a bit of background and explain the method anyway.

Why shadows are soft

If we want to model lighting in the real world we have to understand how it works and why, and then come up with an approximation in code. In the case of shadows, they are caused when the path from a point in the world to a light source is blocked by something else so the light can’t reach it. Which is obvious, but what the shadow looks like depends on the shape of the light source.

The simplest case is when the light is really tiny, so that it’s effectively just a point in space. In this case, each position in the world can either ‘see’ the light source, or it can’t. Each position is either fully lit or fully in shadow, so you get very sharp shadows. Real lights aren’t points though, and have a size. Therefore positions in the world can see either none of the light, all of it or just some of it. This gives ‘soft’ shadows, with a smooth transition from fully lit to fully shadowed. The larger the light, the softer the shadow.

This is a pretty bad diagram but hopefully it shows what I’m talking about. The region in full shadow is called the umbra, and the region in partial shadow is called the penumbra.

Shadows from point lights and area lights

One final thing to note from the diagram is that the size of the penumbra is wider the further away you are from the occluding object. This means you get sharper shadows close to the shadowing object, and blurrier shadows further away.

Larger penumbra further away

Quick intro to ray marching

I’d better give a brief description of how ray-marching renderers work, or the rest of this won’t make a lot of sense.

When using a ray marching renderer, you define your scene by using an equation. What this equation does is it takes a 3D point and tells you how far that point is from a surface in the scene. To render the scene you start at the eye position and for every pixel on the screen, get a direction vector that goes from the start position through that pixel. Then you iteratively step through the scene along this vector.

Marching along a ray until it hits the floor

At each step, you plug the current position into the equation and it tells you how far you are from a surface. You then know that you can safely move that distance along the vector and not hit anything. As you approach a surface the distance of each step will get smaller and smaller, and at some threshold you stop and say that you’ve hit it. There is a more detailed explanation in this great presentation.

Simulating soft shadows

We now know where soft shadows come from and how the ray-marching renderer works, so we can combine them.

The simplest shadowing method is hard shadows. For this we simply need to fire a ray from the position of the pixel we’re rendering, to the sun. If it hits anything, it’s in shadow. If it doesn’t (i.e. the distance function gets sufficiently large) then there is nothing in the way and it’s lit.

But, at each step we know how far the ray is from something in the scene. The closer the ray comes to an object, the more shadowed it’s going to be. Also, the further away from the rendered pixel the object is, the more shadowed it’s going to be (as objects further away case larger penumbra). Therefore, the amount of light at each step is proportional to:

distanceFromObject / distanceFromPixel

Furthermore, the penumbra size is proportional to the light source size, so we can extend it to:

distanceFromObject / (distanceFromPixel * sunSize)

Take the minimum of this at every step and clamp the amount of light to 1 (fully lit), and you’ll get nice soft shadows.

This shadow method doesn’t give fully correct shadows though. The reason is that it only gives the outer half of the penumbra, meaning that there is actually more shadow than there should be (i.e. light energy is lost). Hopefully this illustrates the reason:

Inner half of penumbra is fully shadowed

Rays that intersect with an object never make it to the light source, they just converge on the object itself. Therefore it is impossible for a single ray to measure the inner half of the penumbra. The central red line is from a pixel that will be drawn as fully shadowed, instead of half shadowed as it should be. However, it still looks nice and it’s a really simple shadowing method, so we can live with it being a little wrong.

In my last post I mentioned that even the simplest scene can look photo-realistic if lit well, before going on to talk about really simple lighting methods that really don’t. Obviously you have to start with the basics, but I was playing around with the code to see what could be done and I got it looking fairly nice.

Confession time – I was talking about polygonal rendering in my last post but the screenshots were actually from a ray-marching renderer (I’ll talk about it more on here one day), which is much easier to do realistic lighting with. But for the simple lighting examples the results are exactly the same.

The video show a method of doing soft shadows, where the softness correctly increases the further the pixel is from the occluder. The sun size can be adjusted to change the degree of blurring. After that there is some basic ambient occlusion (nothing fancy, just brute force to get it looking nice). Finally there is some postprocessing – tone mapping, bloom, antialiasing and lens flare.

The video looks pretty terrible. I suspect that the almost flat colours don’t compress well, hence all the banding. When it’s running there aren’t any banding issues, as you can hopefully see in the screenshot.

Lighting is what makes a 3D scene look real. You used to hear talk of the number of polyons or the resolution of the textures in a game, but these days GPUs are powerful enough that we’re rarely limited by these things in any meaningful sense. The focus now is on the lighting. The most detailed 3D model will look terrible if it’s unlit, and the simplest scene can look photorealistic if rendered with good lighting.

I’ll first cover the most basic lighting solutions that have been used in games for years, and later talk about more advanced and realistic methods. This will cover simple ambient and diffuse lighting. But first of all I’ll explain a bit of the theory about lighting in general and how real objects are lit.

A bit of light physics

Eyes and cameras capture images by sensing photons of light. Photons originate from light sources, for example the sun or a light bulb. They then travel in a straight line until they are bounced, scattered, or absorbed and re-emitted by objects in the world. Finally some of these photons end up inside the eye or the camera, where they are sensed.

Not all photons are the same – they come in different wavelengths which correspond to different colours. Our eyes have three types of colour-sensitive cells which are sensitive to photons of different wavelengths. We can see many colours because our brains combine the responses from each of the cell types, for example if both the ‘green’ and ‘red’ cells are stimulated equally we’ll see yellow.

Responses of the three types of cones in the eye to different wavelengths of light

Televisions and monitors emit light at just three wavelengths, corresponding to the red, green and blue wavelengths that our eyes are sensitive to. By emitting different ratios of each colour, almost any colour can be reproduced in the eye. This is a simplification of the real world because sunlight is made up of photons with wavelengths all across the visible spectrum, and our cones are sensitive across a range of wavelengths.

Objects look coloured because they absorb different amounts of different frequencies of light. For example, if a material absorbs most of the green and red light that hits it, but reflects or re-emits most of the blue light, it will look blue. Black materials absorbs most of the light across all frequencies. White materials absorb very little.

The rendering equation

Feel free to skip this section if you don’t get it. The rendering equation is a big complicated equation that states exactly how much light will reach your eye from any point in the scene. Accurately solving the rendering equation will give a photorealistic image. Unfortunately it’s too complicated to solve it exactly, so lighting in computer graphics is about finding better and better approximations to it. The full equation looks something like this:

Don’t worry about understanding it. What it says is that the light leaving a point towards your eye is made up of the light directly emitted by that point, added to all of the other light that hits that point and is subsequently deflected towards your eye.

If you zoom in closer and closer to any material (right down to the microscopic level), at some point it will be effectively ‘flat’. Hence we only need to consider light over a hemisphere, because light coming from underneath a surface would never reach it. Here is a slightly easier to understand picture of what the rendering equation represents:

The reason that the rendering equation is so hard to solve is because you have to consider all of the incoming light onto each point. However, all of that light is being reflected from an infinite number of other points. And all the light reflected from those points is coming from an infinite number of other points again. So it’s basically impossible to solve the rendering equation except in very special cases.

Ambient light

Hopefully you’re still with me! Let’s leave the complicated equations behind and look at how this relates to simple lighting in games.

As the rendering equation suggests, there is loads of light bouncing all over the place in the world that doesn’t appear to come from any specific direction. We call this the ambient lighting. At its simplest, lighting can be represented as a single colour. Here is my simple test scene (a box and a sphere on a flat floor plane) lit with white ambient light:

White ambient light only

To work out what the final colour of a surface on the screen is, we take the lighting colour for each channel (red, green, blue) and multiply it with the surface colour for that channel. So under white light, represented in (Red, Green, Blue) as (1, 1, 1), the final colour is just the original surface colour. So this is what my test world would look like if an equal amount of light hit every point. As you can see, it’s not very realistic. Constant ambient light on its own is a very bad approximation to world lighting, but we have to start somewhere.

As a quick aside, I mentioned earlier that we only use three wavelengths of light when simulating lighting, as a simplification of the real world where all wavelengths are present. In some cases this can cause problems. The colour of the floor plane in my world is (0, 0.5, 0), meaning the material is purely green and has no interaction with red or blue light. Now we can try lighting the scene with pure magenta light, which is (1, 0, 1), thus it contains no green light. This is what we get:

Lit with pure magenta ambient light

As you can see, the ground plane has gone completely black. This is because when multiplying the light by the surface colour, all components come out to zero. Usually this isn’t a problem as scenes are rarely lit with such lights, but it’s something to be aware of.

Directional light and normals

The first thing we can do to improve the lighting in our scene is to add a directional light. If we are outside on a sunny day then the scene is directly lit by one very bright light – the sun. In this case the ambient light is made up of sunlight that has bounced off other objects, or the sky. We know how to apply basic ambient, so we can now look at adding the directional light.

It’s time to introduce another concept in rendering, which is the normal vector. The normal vector is a direction that is perpendicular to a surface, i.e. it points directly away from it. For example, the normal vector for the floor plane is directly upwards. For the sphere, the normal is different at every position on the surface and points directly away from the centre.

So where do we get this normal vector from? Last time, I introduced the concept of vertex attributes for polygon rendering, where each vertex contains information on the position, colour and texture coordinates. Well, the normal is just another vertex attribute, and is usually calculated by whichever tool you used to make your model. Normals consist of three numbers, (x, y, z), representing a direction in space (think of it as moving from the origin to that coordinate in space). The length of a normal should always be 1, and we call such vectors unit vectors). Hence the normal vector for the floor plane is (0, 1, 0), which is length 1 and pointing directly up. Normals can then be interpolated across polygons in the same way as texture coordinates or colours, and this is needed to get smooth lighting across surface made up of multiple polygons.

We also need another vector, the light direction. This is another unit vector that points towards the light source. In the case of distant light sources such as the sun, the light vector is effectively constant across the whole scene. This keeps things simple.

Diffuse lighting

There are two ways that light can interact with an object and reach your eye. We call these diffuse and specular lighting. Diffuse lighting is when a photon is absorbed by a material, and then re-emitted in a random direction. Specular lighting is when photons are reflected off the surface of a material like a mirror. A completely matt material would only have diffuse lighting, while shiny material look shiny because they have specular lighting.

Diffuse and specular lighting

For now we will only use diffuse lighting because it is simpler. Because diffuse lighting is re-emitted is all directions it doesn’t matter where you look at a surface from, it will always look the same. With specular lighting, materials look different when viewed from different angles.

The amount of light received by a surface is affected by where the light is relative to a surface – things that face the sun are lighter than things that face away from it. This is because when an object is at an angle, the same amount of light is spread over a greater surface area (the same reason as why the earth is cold at the poles). We now have the surface normal and the light angle, so we can use these to work out the light intensity. Intuitively, a surface will be brighter the closer together the normal and the light vectors are.

The exact relation is that the intensity of the light is proportional to the cosine of the angle between them. There is a really easy and quick way to work this out, which is to use the dot product. Conveniently enough the dot product is a simple mathematical function of two vectors that happens to give you the cosine of the angle between them, which is what we want. So, give two vectors A and B, the dot product is just this:

(A.x * B.x) + (A.y * B.y) + (A.z * B.z)

To get the diffuse lighting at a pixel, take the dot product of the normal vector and the light vector. This will give a negative value if the surface is pointing away from the light, but because you don’t get negative light we clamp it to zero. Then you just add on your ambient light and multiply with the surface colour, and you can now make out the shapes of the sphere and the box:

Constant ambient and single directional light

Better ambient light

At this point I’ll bring in a really simple improvement to the ambient light. The ambient represents the light that has bounced around the scene, but it’s not all the same colour. In my test scene, the sky is blue, and therefore the light coming from the sky is also blue. Similarly, the light bouncing off the green floor is going to be green. When you can make generalisations like this (and you often can, especially with outdoor scenes), we may as well make use of this information to improve the accuracy of the ambient light.

In this case we can simply have two ambient light colours – a light blue for the sky and a green for the ground. Then we can use the surface normal to blend between these colours. A normal pointing up will receive mostly blue light from the sky, and a normal pointing down will receive mostly green light from the floor. We can then apply a blend based on the Y component of the normal. This is the scene using just the coloured ambient:

Hemispherical ambient light

And then add the diffuse lighting back on:

Hemispherical ambient and diffuse

Now just combine the lighting with some texture mapping and you’re up to the technology levels of the first generation of 3D polygonal games!

That’s more than enough for this post, so next time I’ll talk about specular lighting.

We saw in part 3 how to move the camera around a wireframe world. Now it’s time to move onto proper solid 3D rendering. For this we need to introduce a few new concepts: basic shading, vertex attributes, textures and the depth buffer.

Solid rendering

Moving from wireframe to solid rendering can be as simple as filling in between the lines. There are a load of algorithms for efficient scanline rasterisation of triangles, but these days you don’t have to worry about it because your graphics hardware deals with it – simply give it three points and tell it to draw a triangle. This is a screenshot of a solid-rendered triangle, which is pretty much the most basic thing you can draw in D3D or OpenGL:

That’s not very exciting, but that’s about all you can draw with only positional data. In case you wonder why I choose a triangle, it’s because a triangle is the simplest type of polygon and all rendering eventually breaks down to drawing triangles (for example a square will be cut in half diagonally to give two triangles).

Vertex attributes and interpolation

I’ll just clarify a bit of terminology. A vertex is a single point in space, usually used to define the corners of a shape, so a triangle has three vertices. An edge is a line between two vertices, so in wireframe drawing we just draw the edges. A polygon is the whole filled in shape, such as the triangle above.

Vertices don’t just have to contain positional information, they can have other attributes. One example of a simple attribute is a colour. If all the vertices are the same colour then the whole polygon could just be drawn the same colour, but if the vertices are different colours then the values can be interpolated between the vertices. Interpolation simply means to blend smoothly from one value to a different value – for example, exactly half way between the two vertices the colour will be an even mix of each. Because triangles have three vertices, a slightly more complex interpolation is actually used that blends smoothly between three values. Here is an example of the same triangle with coloured vertices:

Texture mapping

The other common vertex attributes are texture coordinates. A texture is just a picture, usually stored as an image file (although they can be generated algorithmically at runtime). Textures can be applied to polygons by ‘stretching the picture’ across the polygon. You can think of a 2D picture as having coordinates like on a graph – an X coordinate running horizontally, and a Y coordinate running vertically. These coordinates range from 0 to 1 across the image, and the X and Y coordinates are usually called U and V in the case of textures.

Textures are applied to polygons by specifying a U and V coordinate at each vertex. These coordinates (together referred to as UVs) are interpolated across the polygon when it is drawn, and instead of directly drawing a colour for each pixel, the coordinates are used to specify a point in the texture to read from. The colour of the texture at the point is drawn instead. This has the effect of stretching some part of the texture across the polygon that is being drawn.

As an example, here is a texture and a screenshot of part of it mapped onto our triangle:

This is just a really quick introduction to the basics of polygon rendering. There are a lot more clever and interesting things that can be done which I will talk about in later sections.

A question of depth

The problem with rendering solid geometry is that often two pieces of geometry will overlap on the screen, and one will be in front of the other. The two options for dealing with this are either to make sure that you draw everything in back-to-front order, or to keep track of the depth of each pixel that you’ve rendered so that you don’t draw more distant objects in front of closer ones.

Rendering back to front will give the correct screen output but is more expensive – a lot of rendering work will be done to draw objects that will later be drawn over (called overdraw), and additional work is required to sort the geometry in the first place. For this reason it is more efficient in almost all cases to use a depth buffer.

Screen buffers

First lets talk about how a computer stores data during rendering. It has a number of buffers (areas of memory) which store information for each pixel on the screen. It renders into these buffers before copying the contents to the display. The size of the buffers depends on the rendering resolution and the quality. The resolution is how many pixels you want to draw, and as an example if you want to ouput a 720p picture you need to render 1280×720 pixels.

The colour buffer stores the colour of each pixel. For a colour image you need at least one byte of storage (which can represent 256 intensity levels) for each of the three colour channels, red, green and blue. This gives a total of 256x256x256 = 16.7 million colours, and so each pixel required three bytes of storage (but due to the way memory is organised there is a fourth spare byte for each pixel, which can be used for other things). These days a lot of rendering techniques require higher precision but I’ll be writing more about this later on.

The second type of buffer is the depth buffer. This is the same resolution as the colour buffer but instead stores the distance of the pixel along the camera’s Z axis. This is often stored as a 24-bit floating point number, meaning it can represent basically any number (to varying degrees of accuracy). Here is an example courtesy of Wikipedia of the colour and depth buffers for a scene, where you can see that pixels in the distance have a lighter colour in the depth buffer, meaning that they are a larger value:

Depth buffer rendering

Using a depth buffer to render is conceptually very simple. At the beginning of the frame, every pixel in the buffer is reset to the most distant value. For every pixel of every polygon that is about to be rendered to the screen, the depth of that pixel is compared with the depth value stored for that pixel in the depth buffer. If it’s closer, the pixel is drawn and the depth value updated. Otherwise, the pixel is skipped. If a pixel is skipped then no calculations have to be done for that pixel, for example doing lighting calculations or reading textures from memory. Therefore it’s more efficient to render object from front to back so that objects behind closer ones don’t have to be drawn.

That’s all that needs to be said about depth buffers for now. Next time I’ll be talking about lighting.

[Warning: this is a slightly more technical article containing talk of matrices, transforms and spaces. If it doesn’t make sense just carry on to the next article, you don’t need to understand the details!]

We saw in the previous article how to draw a point in 3D when the camera is at the origin and looking down the Z axis. Most of the time though this won’t be the case as you’ll want to move the camera around. It’s really hard to directly work out the projection of a point onto an arbitrarily positioned camera, but we can solve this by using transforms to get us back to the easy case.

Transform matrices

A transform in computer graphics is represented by a 4×4 matrix. If you’re not familiar with matrix maths don’t worry – the important things to know are that you can multiply a 4×4 matrix with a position to get a different position, and you can multiply two 4×4 matrices together to get a new 4×4 matrix. A transform matrix can perform translations, rotations, scales, projections, shears, or any combination of those.

Here is a simple example of why this is useful. This is the scene we want to draw:

The camera is at position Z = 2, X = 1, and is rotated 40 degrees (approximately, again exact numbers don’t matter for the example) around the Y axis compared to where it was in the last example. What we can do here is find a transform that will move the camera from its original position (at the origin, facing down the Z axis), to where it is now.

First we look at the rotation, so we need a matrix describing a 40° rotation around the Y axis. After that we need to apply a translation of (2, 0, 1) in (x, y, z) to move the camera from the origin to its final location. Then we can multiply the matrices together to get the camera transform.

If you’re interested, this is what a Y rotation matrix and and a translation matrix looks like:

This camera transform describes how the camera has been moved and rotated away from the origin, but on its own this doesn’t help us. But think about if you were to move a camera ten metres to the right – this is exactly the same as if the camera had stayed where it was and the whole world was moved ten metres to the left. The same goes for rotations, scales and any other transform.

We can take the inverse of the camera matrix – this gives us a transform that will do the opposite of the original matrix. More precisely, if you transform a point by a matrix and then transform the result by the inverse transform the point will end up back where it started. So while the camera transform can be used to move the camera from the origin to its current position, applying the inverse (called the view transform) to every position in the world will move those points to be relative to the camera. The ‘view’ of the world through the camera will look the same, but the camera will still be at the origin, looking down the Z axis:

This diagram may look familiar, and in fact is the exact same setup that was used for rendering in the previous article. So we can now use the same rendering projection method!

World, camera, object and screen spaces

Using transforms like this enables the concept of spaces. Each ‘space’ is a frame of reference with a different thing at the origin. We met world space last time, which is where the origin is some arbitrary position in the world, the Y axis is up and everything is positioned in the world relative to this.

We’ve just met camera space – this is the frame of reference where the origin is defined as the position of the camera, the Z axis is straight forwards from the camera and the X and Y axes are left/right and up/down from the camera’s point of view. The view transform will convert a point from world space to camera space.

You may also come across object space. Imagine you’re building a model of a tree in a modelling package – you’ll work in object space, probably with the origin on ground level and at the bottom. When you’ve build your model you’ll want to position it in the world, and for this we use the basis matrix, which is a transform (exactly the same as how the camera was positioned above) that will convert the points you’ve modelled in object space into world space, thus defining where the tree is in the world. To render the tree, the points will then be further converted to camera space. This means that the same object can be rendered multiple times in different places just by using a different basis matrix.

Finally, as we saw in the previous article, we can convert to screen space by applying the projection matrix.

Using this in a renderer

All of these transforms are represented exactly the same, as 4×4 matrices. Because matrices can be multiplied together to make new matrices, all of these transforms can be combined into one. Indeed, a rendering pipeline will often just use one transform to take each point in a model and project it directly to the final position on the screen. In general, each point in eac object is rendered using this sequence of transforms:

These days I’m mainly working with tablet devices including the various flavours of iPad. A year ago there was just the iPad 1 and the iPad 2, but now the 3 and 4 also exist, as well as the mini. Here is a summary:

iPad 1 – Very limited performance and memory. The GPU appears to be around 5-6 times slower than the iPad 2, with the same resolution. Getting a decent frame rate with any kind of non-trivial graphics is challenging, and you can most likely forget about 60Hz.

iPad 2 – This is a very balanced system. Much quicker than the iPad 1, it’s pretty easy to achieve silky-smooth 60Hz with a decent amount of overdraw and alpha blending.

iPad Mini – Almost exactly the same performance as the iPad 2 and the same resolution, just a bit smaller. This device adds no more complications which is nice.

iPad 3 – This is a pr0blematic device. Double the GPU power of the iPad 2, but four times the pixels on to the retina display. Due to everything being alpha blending in our app, the profiler shows the GPU cost to be double that of the iPad 2. This is our limiting device.

iPad 4 – Double the CPU and GPU again. This brings the GPU performance back up to the level of the iPad 2.

The iPad 3 is a bit of a blip. The tiled rendering architecture means that an app predominantly rendered with layers of alpha-blended geomety can expect a 2X performance penalty on the 3. The 3 is actually the limiting system, more so that the 2.

[Aside on the tiled deferred shading if you’re not familiar – The screen is internally divided into a grid, with each grid block made up of some number of actual pixels. For each block, during rendering a list is kept of all geometry that intersects it. At the end of the frame, visibility calculations are done on each pixel, and then only the geometry that contributes to the pixel colour is shaded. Deferring all pixel shading until the end of the frame means that when drawing solid geometry, each pixel only has to be shaded once at the end of the frame, instead of being shaded and then overwritten as more geometry is drawn. However, with alpha blending, all layers contribute to the final pixel colour so all geometry has to be shaded. This negates all performance gains from the tiled deferred shading, hence why alpha blending is bad on iPads and iPhones. Unfortunately, to make nice looking UIs you can’t really get away from it.]

When we want to draw something in 3D on a screen, what we’re really doing is trying to draw a flat picture of it as it would look on a film, or projected onto the retina in your eye. So we have an “eye” position, which is the point from which the scene is viewed, and we have the projection plane, which is the “film” of the camera.

When you take an image with a camera the image is projected reversed onto the film, because the projection plane is behind the lens (you can see this by looking at the path the light takes through the lens, in red). When rendering, it’s conceptually simpler to think of the projection plane as being in front of the eye. It’s also easier to see what we mean by a projection – we can think of the projection plane as being a window in front of the eye, through which we see the world. Each pixel wants to be drawn the colour of the light that passes through the corresponding part of the “window”, to reach the eye.

A basic property of a camera is the Field Of View (FOV). Cameras don’t capture the scene in a full 360 degrees, and the FOV is the angle which it can see, which is the angle between the red lines in the diagram. Continuing the window analogy, there are two ways to change the field of view: you can make the window bigger or smaller, or you can stand closer or further away from the window. Both of these will alter how much of the world you can see on the other side.

The most basic concept in 3D is perspective. It’s so simple that it’s been explained by Father Ted. Perspective just means that the further away things are, the smaller they look. Further to that, the size reduction is proportional to the distance. What this means is that something twice as far away will look half as big (specifically, half as big when you measure a length, the area will be a quarter of the size). So if you want to work out how big something will be on the screen, you divide the size by the distance from the eye position.

To start rendering in 3D we just need to know a few numbers that define the “camera view” that will be used to draw the scene. These are the size of the projection plane, and the distance it is from the eye (the projection plane is always some small distance in front of the eye to avoid nastiness later on with divide-by-zero and things).

In the diagram, take one grid square to be 1 unit in size. It makes no difference what the units are, as long as you’re consistent. For simplicity let’s work in metres. So in this diagram we can see the two pieces of information we need. The distance from the camera to the projection plane (called the camera near distance) is 1 metre, and the size of the projection plane is around 1.5 metres (specific numbers don’t matter at this point). You can see the field of view that this arrangement gives in red. In this diagram we want to draw the blue triangle, so we need to know where the three corner vertices will projected to on the projection plane.

Positions in 3D space are given using three coordinates, x, y, and z. These specify the distance along the x axis, y axis and z axis respectively, where the three axes are perpendicular to each other. There are various different coordinate spaces used in rendering, where coordinate space means the orientation of these three axes. For example, world space is where things are in you ‘world’, i.e. the scene that you are rendering, so there is the origin (0, 0, 0) at some fixed point in the world and everything is positioned relative to that. In this case x and z specify the horizontal position and y specifies the height.

The coordinate space we’re interested in at the moment though is camera space. In camera space, X is the distance left or right in your window, y is the distance up or down, and z is the distance forwards and backwards, i.e. into or out of the window. The origin is at the eye position and the camera traditionally looks along the negative z axis, so in the diagram the z axis will point to the right. The diagram is 2D so only shows one of the other axes, so we’ll ignore the third one for now.

We can now do a bit of simple maths to work out where to draw one of the vertices, the one marked with a dot. The approximate position of the vertex is (1.0, -5.2), by counting the squares in each axis (yes, this is the other way around from your traditional axes on a graph, but that just reinforces the point about different coordinate spaces). So to project this on the screen we simply divide by Z to find the point that the green line intersects the line where Z=-1. This give X = 1.0/-5.2 = -0.192.

Now we need to convert this to screen space, which is as shown is this diagram:

This is where we use the size of the projection plane, and the distance it is from the eye, to find a scaling factor. We said that the projection plane was 1.5m is total, so is 0.75 metres from the centre to each side, and is at -1.0 metres from the eye along the z axis. So the scaling factor is -1.0/0.75 = -1.333.

Now we can combine these to find where on the screen the vertex should be drawn:

X = -0.192*-1.333 = 0.256

There is one final transform that needs to be done, to work out the actual pixel coordinates on the screen. To do this we simply map the -1.0 to 1.0 range of the screen space into the viewport, which is defined in pixels. So if you’re drawing to a widescreen TV the viewport would be 1280×768 pixels in size, so the actual x pixel coordinate of the example would be:

((1.0 + 0.256) * 0.5) * 1280 = 804

Then simply do the same again with the Y axis and you drawn a 3D point! Do this with the other two points as well, and then draw straight lines between them all, and you’ve got a 3D triangle!

Or more precisely – “Anatomy of a modern realtime photorealistic 3D DX11 renderer, in layman’s terms”.

Modern 3D graphics and rendering techniques tend to be viewed as really complicated, specialist and difficult by those not involved in it. There is an aura of “magic” around how computers can produce the images shown on the screen, and practically zero understanding of how this works. I’m not even referring to just the general public (although it is certainly “magic” in this case) – even among programmers in other areas, and even a lot of games artists, there is a perception that the renderer is too complicated to understand.

So, in this series I aim to change that. I will try to explain a bit about all of the processes going on behind the scenes, and show in rough terms how they work, in non-technical language. If you’re reading this series and it’s too hard to understand, let me know and I’ll see if I can improve it!

The first part of the series will give a general overview of basic 3D graphics, of how to get anything drawing so it looks 3D on the screen. This requires some knowledge of perspective and camera transforms but I’ll keep it simple! That will take you up to the state of the art of realtime computer graphics circa 1984, which a few of you may remember:

The next jump up was full polygon-based rendering, enabled by these new-fangled graphics card things. This approach is still what almost all game engines are based on, so the second part will give an overview of basic polygon rendering. This is the state of the art in 1996:

After that we have all the really interesting stuff! There are loads of cool and interesting techniques involved in taking us from Quake in 1996 to Battlefield 3, which is a pretty good representation of the state of the art in 2011:

These cool techniques include things like high dynamic range, Bokeh depth of field, physically-based lighting models, antialiasing, tone mapping, bloom, and a whole host of other things, all designed to simulate a real camera in the real world, thus giving us a believable image. This will be the bulk of the series as it’s where all the interesting things are happening these days.

So that’s my intent. This may be a fairly long-term project but I want to show that modern computer graphics doesn’t have to be hard or obscure, and really anyone can understand it! Until next time…