These days I’m mainly working with tablet devices including the various flavours of iPad. A year ago there was just the iPad 1 and the iPad 2, but now the 3 and 4 also exist, as well as the mini. Here is a summary:

iPad 1 – Very limited performance and memory. The GPU appears to be around 5-6 times slower than the iPad 2, with the same resolution. Getting a decent frame rate with any kind of non-trivial graphics is challenging, and you can most likely forget about 60Hz.

iPad 2 – This is a very balanced system. Much quicker than the iPad 1, it’s pretty easy to achieve silky-smooth 60Hz with a decent amount of overdraw and alpha blending.

iPad Mini – Almost exactly the same performance as the iPad 2 and the same resolution, just a bit smaller. This device adds no extra complications, which is nice.

iPad 3 – This is a problematic device. Double the GPU power of the iPad 2, but four times the pixels on the retina display. Because everything in our app is alpha blended, the profiler shows the GPU cost to be double that of the iPad 2. This is our limiting device.

iPad 4 – Double the CPU and GPU again. This brings the GPU performance per pixel back up to the level of the iPad 2.

The iPad 3 is a bit of a blip. The tiled rendering architecture means that an app predominantly rendered with layers of alpha-blended geometry can expect a 2X performance penalty on the 3. The 3 is actually the limiting system, more so than the 2.
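To put rough numbers on that, here is a small Python sketch of the arithmetic. It assumes the app is entirely fill-rate bound, so the cost of a frame scales with the number of pixels divided by the GPU power. The ratios are the approximate ones from the summary above (iPad 2 = 1.0), not benchmark results, and `relative_frame_cost` is just a name I made up for the example.

```python
# Rough ratios relative to the iPad 2, taken from the device summary.
# These are approximations for illustration, not measured benchmarks.
devices = {
    "iPad 2": {"gpu_power": 1.0, "pixels": 1.0},
    "iPad 3": {"gpu_power": 2.0, "pixels": 4.0},  # 2x GPU, 4x pixels
    "iPad 4": {"gpu_power": 4.0, "pixels": 4.0},  # GPU doubled again
}


def relative_frame_cost(gpu_power, pixels):
    """Cost of a fill-bound frame relative to the iPad 2 (assumed linear)."""
    return pixels / gpu_power


for name, ratios in devices.items():
    print(name, relative_frame_cost(**ratios))
# iPad 2 1.0
# iPad 3 2.0
# iPad 4 1.0
```

An app that spends most of its GPU time on something other than filling pixels would not follow this simple ratio.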

[Aside on the tiled deferred shading if you’re not familiar – The screen is internally divided into a grid, with each grid block made up of some number of actual pixels. For each block, during rendering a list is kept of all geometry that intersects it. At the end of the frame, visibility calculations are done on each pixel, and then only the geometry that contributes to the pixel colour is shaded. Deferring all pixel shading until the end of the frame means that when drawing solid geometry, each pixel only has to be shaded once at the end of the frame, instead of being shaded and then overwritten as more geometry is drawn. However, with alpha blending, all layers contribute to the final pixel colour so all geometry has to be shaded. This negates all performance gains from the tiled deferred shading, hence why alpha blending is bad on iPads and iPhones. Unfortunately, to make nice looking UIs you can’t really get away from it.]
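If it helps, the effect described in the aside can be shown with a toy model of a single pixel. This is only a sketch of the idea, not how the actual hardware works: each layer covering the pixel is either opaque or alpha blended, and we count how many layers have to be shaded once visibility is known. The function name and the layer lists are made up for the example.

```python
def fragments_shaded(layers_front_to_back):
    """Count how many layers must be shaded for a single pixel.

    Each entry is True for an opaque layer and False for an alpha-blended
    one, ordered from the front of the scene to the back. Toy model: every
    layer contributes to the pixel until the first opaque layer is reached,
    which hides everything behind it.
    """
    shaded = 0
    for is_opaque in layers_front_to_back:
        shaded += 1
        if is_opaque:
            break  # nothing behind an opaque layer is visible
    return shaded


solid_layers = [True, True, True, True]
blended_layers = [False, False, False, False]

print(fragments_shaded(solid_layers))    # 1
print(fragments_shaded(blended_layers))  # 4
```

With four solid layers the pixel is shaded once; with four blended layers it is shaded four times, which is the overdraw cost the aside describes.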

When we want to draw something in 3D on a screen, what we’re really doing is trying to draw a flat picture of it as it would look on a film, or projected onto the retina in your eye. So we have an “eye” position, which is the point from which the scene is viewed, and we have the projection plane, which is the “film” of the camera.

When you take an image with a camera the image is projected reversed onto the film, because the projection plane is behind the lens (you can see this by looking at the path the light takes through the lens, in red). When rendering, it’s conceptually simpler to think of the projection plane as being in front of the eye. It’s also easier to see what we mean by a projection – we can think of the projection plane as being a window in front of the eye, through which we see the world. Each pixel wants to be drawn the colour of the light that passes through the corresponding part of the “window”, to reach the eye.

A basic property of a camera is the Field Of View (FOV). Cameras don’t capture the scene in a full 360 degrees, and the FOV is the angle that the camera can see, which is the angle between the red lines in the diagram. Continuing the window analogy, there are two ways to change the field of view: you can make the window bigger or smaller, or you can stand closer or further away from the window. Both of these will alter how much of the world you can see on the other side.
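The window analogy translates directly into a bit of trigonometry. As a minimal sketch, assuming the eye is centred on the window, the field of view is twice the angle whose tangent is half the window size divided by the distance to the window. The function name and the sizes used here are illustrative values of my own.

```python
import math


def field_of_view_degrees(window_size, distance):
    """Angle seen through a window of the given size from the given distance.

    Assumes the eye is centred on the window, so each half of the window
    subtends the same angle.
    """
    half_angle = math.atan((window_size / 2.0) / distance)
    return math.degrees(2.0 * half_angle)


base = field_of_view_degrees(1.5, 1.0)           # about 73.7 degrees
bigger_window = field_of_view_degrees(3.0, 1.0)  # window twice as big
stand_closer = field_of_view_degrees(1.5, 0.5)   # half the distance

print(round(base, 1), round(bigger_window, 1), round(stand_closer, 1))
```

Doubling the window size and halving the distance give the same wider angle, because only the ratio of the two matters.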

The most basic concept in 3D is perspective. It’s so simple that it’s been explained by Father Ted. Perspective just means that the further away things are, the smaller they look. Further to that, the apparent size is inversely proportional to the distance. What this means is that something twice as far away will look half as big (specifically, half as big when you measure a length; the area will be a quarter of the size). So if you want to work out how big something will be on the screen, you divide the size by the distance from the eye position.
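As a quick sanity check of the "twice as far, half as big" rule, here is a tiny sketch. The object size and distances are arbitrary numbers picked for the example.

```python
def apparent_size(size, distance):
    """Perspective rule from the text: divide the size by the distance."""
    return size / distance


near_length = apparent_size(2.0, 5.0)   # a 2 unit object, 5 units away
far_length = apparent_size(2.0, 10.0)   # the same object, twice as far away

length_ratio = far_length / near_length  # lengths halve
area_ratio = length_ratio ** 2           # areas scale with length squared

print(length_ratio, area_ratio)
```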

To start rendering in 3D we just need to know a few numbers that define the “camera view” that will be used to draw the scene. These are the size of the projection plane, and the distance it is from the eye (the projection plane is always some small distance in front of the eye to avoid nastiness later on with divide-by-zero and things).

In the diagram, take one grid square to be 1 unit in size. It makes no difference what the units are, as long as you’re consistent. For simplicity let’s work in metres. So in this diagram we can see the two pieces of information we need. The distance from the camera to the projection plane (called the camera near distance) is 1 metre, and the size of the projection plane is around 1.5 metres (specific numbers don’t matter at this point). You can see the field of view that this arrangement gives in red. In this diagram we want to draw the blue triangle, so we need to know where the three corner vertices will be projected to on the projection plane.

Positions in 3D space are given using three coordinates, x, y, and z. These specify the distance along the x axis, y axis and z axis respectively, where the three axes are perpendicular to each other. There are various different coordinate spaces used in rendering, where coordinate space means the orientation of these three axes. For example, world space is where things are in your ‘world’, i.e. the scene that you are rendering, so there is the origin (0, 0, 0) at some fixed point in the world and everything is positioned relative to that. In this case x and z specify the horizontal position and y specifies the height.

The coordinate space we’re interested in at the moment though is camera space. In camera space, X is the distance left or right in your window, y is the distance up or down, and z is the distance forwards and backwards, i.e. into or out of the window. The origin is at the eye position and the camera traditionally looks along the negative z axis, so in the diagram the z axis will point to the right. The diagram is 2D so only shows one of the other axes, so we’ll ignore the third one for now.

We can now do a bit of simple maths to work out where to draw one of the vertices, the one marked with a dot. The approximate position of the vertex is (1.0, -5.2), by counting the squares in each axis (yes, this is the other way around from your traditional axes on a graph, but that just reinforces the point about different coordinate spaces). So to project this on the screen we simply divide by Z to find the point that the green line intersects the line where Z=-1. This gives X = 1.0/-5.2 = -0.192 (the sign comes out negative because Z is negative in front of the camera; the negative scaling factor used in the next step flips it back).

Now we need to convert this to screen space, which is as shown in this diagram:

This is where we use the size of the projection plane, and the distance it is from the eye, to find a scaling factor. We said that the projection plane was 1.5 metres in total, so it is 0.75 metres from the centre to each side, and it is at -1.0 metres from the eye along the z axis. So the scaling factor is -1.0/0.75 = -1.333.

Now we can combine these to find where on the screen the vertex should be drawn:

X = -0.192*-1.333 = 0.256
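The two steps so far (divide by Z, then scale by the projection plane) can be written as a short function. This is just a sketch of the arithmetic above using the same numbers; the function and parameter names are my own.

```python
def project_x(x, z, near_z, half_width):
    """Project a camera-space x coordinate into the -1.0 to 1.0 screen range.

    near_z is the z position of the projection plane (negative, because the
    camera looks along the negative z axis) and half_width is the distance
    from the centre of the projection plane to its edge.
    """
    divided = x / z              # perspective divide: 1.0 / -5.2
    scale = near_z / half_width  # scaling factor: -1.0 / 0.75
    return divided * scale


screen_x = project_x(1.0, -5.2, near_z=-1.0, half_width=0.75)
print(round(screen_x, 3))  # 0.256
```

Both the divide and the scaling factor are negative, so the two signs cancel and a vertex with positive x ends up with a positive screen coordinate.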

There is one final transform that needs to be done, to work out the actual pixel coordinates on the screen. To do this we simply map the -1.0 to 1.0 range of the screen space into the viewport, which is defined in pixels. So if you’re drawing to a 720p widescreen TV the viewport would be 1280×720 pixels in size, so the actual x pixel coordinate of the example would be:

((1.0 + 0.256) * 0.5) * 1280 = 804
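The same mapping as a small function, in case it’s easier to read that way. The function name is made up for the example, and the 1280 pixel width is the one used in the calculation above.

```python
def screen_to_pixel(screen_coord, viewport_size):
    """Map a screen-space coordinate in the range -1.0 to 1.0 onto pixels."""
    return (1.0 + screen_coord) * 0.5 * viewport_size


print(round(screen_to_pixel(0.256, 1280)))  # 804
print(screen_to_pixel(-1.0, 1280))          # 0.0, the left edge
print(screen_to_pixel(0.0, 1280))           # 640.0, the centre
print(screen_to_pixel(1.0, 1280))           # 1280.0, the right edge
```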

Then simply do the same again with the Y axis and you’ve drawn a 3D point! Do this with the other two points as well, and then draw straight lines between them all, and you’ve got a 3D triangle!
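Putting it all together, here is a rough sketch of the whole process for a triangle. Several things here are my own assumptions for the example rather than anything fixed by the text: the `Camera` class, the 1280×720 viewport, the projection plane height (chosen to match the viewport’s aspect ratio), and the y and z values of the triangle’s corners. The y axis is handled exactly like the x axis, as described above; many real APIs put pixel row 0 at the top of the screen, in which case y would need flipping.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    near_z: float         # z of the projection plane (negative: looks down -z)
    half_width: float     # centre of the projection plane to its side edge
    half_height: float    # centre of the projection plane to its top edge
    viewport_width: int   # in pixels
    viewport_height: int  # in pixels


def project_to_pixels(camera, vertex):
    """Project a camera-space (x, y, z) vertex to (pixel_x, pixel_y)."""
    x, y, z = vertex
    # Perspective divide, then scale into the -1.0 to 1.0 screen range.
    screen_x = (x / z) * (camera.near_z / camera.half_width)
    screen_y = (y / z) * (camera.near_z / camera.half_height)
    # Map the screen range onto the viewport.
    pixel_x = (1.0 + screen_x) * 0.5 * camera.viewport_width
    pixel_y = (1.0 + screen_y) * 0.5 * camera.viewport_height
    return pixel_x, pixel_y


# Assumed camera: near distance and width from the text, the rest invented.
camera = Camera(
    near_z=-1.0,
    half_width=0.75,
    half_height=0.75 * 720 / 1280,
    viewport_width=1280,
    viewport_height=720,
)

# Hypothetical triangle; only the x and z of the first corner come from the text.
triangle = [(1.0, 0.5, -5.2), (2.0, -0.5, -6.0), (0.0, 1.0, -7.0)]
corners = [project_to_pixels(camera, v) for v in triangle]

# The three edges to draw, each as a pair of pixel positions.
edges = [(corners[i], corners[(i + 1) % 3]) for i in range(3)]
for start, end in edges:
    print(start, "->", end)
```

Drawing the lines themselves is left to whatever graphics library you have to hand; the sketch only works out where the end points go.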

Or more precisely – “Anatomy of a modern realtime photorealistic 3D DX11 renderer, in layman’s terms”.

Modern 3D graphics and rendering techniques tend to be viewed as really complicated, specialist and difficult by those not involved in it. There is an aura of “magic” around how computers can produce the images shown on the screen, and practically zero understanding of how this works. I’m not even referring to just the general public (although it is certainly “magic” in this case) – even among programmers in other areas, and even a lot of games artists, there is a perception that the renderer is too complicated to understand.

So, in this series I aim to change that. I will try to explain a bit about all of the processes going on behind the scenes, and show in rough terms how they work, in non-technical language. If you’re reading this series and it’s too hard to understand, let me know and I’ll see if I can improve it!

The first part of the series will give a general overview of basic 3D graphics, of how to get anything drawing so it looks 3D on the screen. This requires some knowledge of perspective and camera transforms but I’ll keep it simple! That will take you up to the state of the art of realtime computer graphics circa 1984, which a few of you may remember:

The next jump up was full polygon-based rendering, enabled by these new-fangled graphics card things. This approach is still what almost all game engines are based on, so the second part will give an overview of basic polygon rendering. This is the state of the art in 1996:

After that we have all the really interesting stuff! There are loads of cool and interesting techniques involved in taking us from Quake in 1996 to Battlefield 3, which is a pretty good representation of the state of the art in 2011:

These cool techniques include things like high dynamic range, Bokeh depth of field, physically-based lighting models, antialiasing, tone mapping, bloom, and a whole host of other things, all designed to simulate a real camera in the real world, thus giving us a believable image. This will be the bulk of the series as it’s where all the interesting things are happening these days.

So that’s my intent. This may be a fairly long-term project but I want to show that modern computer graphics doesn’t have to be hard or obscure, and really anyone can understand it! Until next time…