Not just beautiful, though - the stars are like the trees in the forest, alive and breathing. And they're watching me.

--- Haruki Murakami, Kafka on the Shore

It is time to add some more colours to our game. In this tutorial we will learn how to add a colour to each vertex and how to change the shaders to work with colours as well. To illustrate the new concepts, a colourful starfield will be rendered.

Input Element Description

So far the vertex structure only specified the position of the vertices. To add a colour to each vertex, three more floats must be added to represent the red, green and blue amount of the desired vertex colour:

There is one important little detail that has to be addressed here: the crux is the fifth member, the AlignedByteOffset, the offset (in bytes) from the beginning of the vertex structure to this element. DirectX needs to know how many bytes into the structure, from the beginning, a specific element begins. The COLOR element does not start at the beginning, but $12 = 3\cdot4$ bytes (a float takes 4 bytes) after the beginning.

We could use the D3D11_APPEND_ALIGNED_ELEMENT flag for convenience to define the current element directly after the previous one, including any packing if necessary. But for now, calculating three times four isn't too difficult yet.
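Assuming a vertex structure of six floats as described above, the input element description might be sketched as follows:

```cpp
D3D11_INPUT_ELEMENT_DESC layoutDesc[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    // the COLOR element starts 12 bytes after the beginning of the structure
    { "COLOR",    0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
```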

High Level Shading Language

Now that the vertices have a position and a colour attributed to them, the shaders Visual Studio created no longer do the job. The problem is that the vertex shader is completely oblivious to the colour data and only outputs the position of the vertex. So we have to make three changes:

Tell the vertex shader about the position and the colour of each vertex.

Output both the position and the colour.

Tell the pixel shader to return the desired colour of the pixel.

None of this is really difficult, only step 2 needs some explanation.

Colour Input

To tell the vertex shader about the colour of a vertex, a minor change is sufficient:
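A sketch of the change, assuming the shader previously took only a POSITION input:

```hlsl
float4 main(float3 pos : POSITION, float3 col : COLOR) : SV_POSITION
{
    // the colour is now available to the shader, but not yet used
    return float4(pos, 1.0f);
}
```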

Adding a simple parameter with the COLOR semantic is enough to let the vertex shader know the colour of each vertex.

Multiple Values Output

We still have to deal with the fact that the vertex shader outputs only the position of each vertex, but not its colour. To render each pixel with the appropriate colour, however, the vertex shader must pass the colour data to the pixel shader.

To return multiple values, a structure can be used. Thankfully structures in HLSL work just the same as in C:
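A sketch of what such a structure, and the new vertex shader using it, might look like (the names are illustrative):

```hlsl
struct VertexOut
{
    float4 position : SV_POSITION;
    float3 colour   : COLOR;
};

VertexOut main(float3 pos : POSITION, float3 col : COLOR)
{
    VertexOut vertexOut;
    vertexOut.position = float4(pos, 1.0f);
    vertexOut.colour = col;
    return vertexOut;
}
```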

The new vertex shader still does not do very much, but it does pass the position and colour of each vertex to the pixel shader. If you have been paying close attention, you should have noticed that the semantic of the function has been removed; in return, each member of the VertexOut struct has a semantic attached to it.

Pixel Shader
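The pixel shader might then look as follows; note that the parameters appear in the same order as the vertex shader output:

```hlsl
float4 main(float4 pos : SV_POSITION, float3 col : COLOR) : SV_TARGET
{
    return float4(col, 1.0f);
}
```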

The new pixel shader simply outputs the colour of each pixel. It is important to note that when passing variables between shaders, they always have to be in the correct order: in our example, the output of the vertex shader must be in the same order as the input of the pixel shader, i.e. first the position, then the colour.

A Colourful Starfield

The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.

--- Carl Sagan, Cosmos

Let us create an array of vertices to generate a beautiful coloured starfield. The code to do that is quite easy:

I wrote little helper functions to create a random float between 0.0f and 1.0f for the colour and a random float between -1.0f and 1.0f for the position. And yes, I know that those numbers aren't truly random, but this will do for now.

The astute reader may have noticed that two flags of the buffer description have been changed: Usage and CPUAccessFlags. To witness the birth and death of stars and galaxies, the data inside the vertex buffer must be changed from time to time.

Dynamic Buffers

To tell Direct3D that the data inside a buffer will eventually be changed, the vertex buffer must be created with the Usage flag set to D3D11_USAGE_DYNAMIC. This flag makes the buffer accessible by both the GPU (read only) and the CPU (write only). To reflect that change, we also set the CPUAccessFlags member to D3D11_CPU_ACCESS_WRITE, which tells DirectX that the buffer is to be mappable such that the CPU can change its contents.
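A sketch of the buffer description with those two flags changed, assuming a vertex structure VERTEX and a star count nStars:

```cpp
D3D11_BUFFER_DESC bd = {};
bd.ByteWidth = sizeof(VERTEX) * nStars;         // size of the buffer in bytes
bd.Usage = D3D11_USAGE_DYNAMIC;                 // GPU: read only --- CPU: write only
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;     // the CPU may change the contents
```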

UINT Subresource

The second parameter is the index of the subresource to map. A buffer has only one subresource, thus we simply set this to zero.

D3D11_MAP MapType

The third parameter specifies the CPU's read and write permissions for the resource. In this tutorial we will set this to D3D11_MAP_WRITE_DISCARD, which means that the resource will be mapped for writing and all of the previous contents of the resource will be undefined, i.e. the hardware will discard the entire buffer and return a pointer to a newly allocated buffer.

The fourth parameter is a flag that specifies what the CPU does when the GPU is busy. This flag is optional and we will set it to zero for now.

D3D11_MAPPED_SUBRESOURCE *pMappedResource

When the function returns, the fifth parameter contains the pointer to the newly mapped D3D11_MAPPED_SUBRESOURCE structure for the mapped subresource. From this moment on, any changes to the subresource will be reflected in the vertex buffer.

To unmap a subresource, a call to ID3D11DeviceContext::Unmap, which invalidates the pointer to a resource and re-enables the GPU's access to that resource, is sufficient.
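Put together, updating the starfield might be sketched like this (devCon, vertexBuffer and starField are illustrative names for the device context, the dynamic vertex buffer and the vertex array):

```cpp
D3D11_MAPPED_SUBRESOURCE mappedResource;
// discard the old contents and get a pointer to freshly allocated memory
devCon->Map(vertexBuffer.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
memcpy(mappedResource.pData, starField, sizeof(VERTEX) * nStars);
// invalidate the pointer and re-enable GPU access
devCon->Unmap(vertexBuffer.Get(), 0);
```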

In this tutorial we will finally learn how to render triangles. We won't go into every detail yet, but I thought it was about time that we saw our hard work pay off.

Rendering a triangle is a six-step process:

Create three vertices to define a triangle.

Store these vertices in video memory.

Tell the GPU how to read these vertices.

Tell the GPU how to translate those vertices into a flat image.

Tell the GPU where on the back buffer we want the image to appear.

Render the triangle.

None of these steps are very difficult, but a substantial amount of theory is required to really understand what is happening. Don't worry though, we won't delve too deeply into the theoretical stuff just yet. In this tutorial we simply want to see something rendered on the screen.

Creating a triangle

Obviously, to define a triangle, three points, or vectors $v_1, v_2, v_3 \in \mathbb{R}^3$ are needed. Those points are often called the vertices of a triangle. In DirectX, a vertex is a simple C++-structure containing all the data needed to create whatever 3D image we might want to create.

Here is a trivial example for a vertex structure, holding only information about the position of the vertex:

GPU Memory

Obviously the newly created C++-structure is stored in system memory. To avoid an extreme slowdown, it is of the utmost importance, however, that the data be stored in video memory once needed by the GPU.

In order to get access to the video memory, Direct3D provides a specific COM object to maintain buffers both in system and video memory. This COM object is called a vertex buffer, the COM interface is ID3D11Buffer. When rendering calls for specific data, Direct3D will automatically copy it over to the GPU and take care of all the details for us. If the video card becomes low on memory, Direct3D will delete buffers that haven't been used in a while, or are considered low priority, in order to make room for newer resources. Once again, thank you, Microsoft!

To create the vertex buffer, a description structure, the D3D11_BUFFER_DESC structure, must be filled out:

UINT ByteWidth

This member specifies the size of the buffer in bytes; in our case, the size of the vertex structure times the number of vertices to store.

D3D11_USAGE Usage

This parameter identifies how the buffer is expected to be read from and written to. The most common value is typically D3D11_USAGE_DEFAULT, which tells Direct3D that the GPU, and only the GPU, will read from and write to the buffer.

UINT BindFlags

This member identifies how the buffer will be bound to the graphics pipeline (there will be a chapter about the theory behind the graphics or rendering pipeline later on). Obviously, to create a vertex buffer, we use the D3D11_BIND_VERTEX_BUFFER flag.

UINT StructureByteStride

This parameter defines the size of each element in the buffer structure (in bytes) when the buffer represents a structured buffer. Since we are not interested in using structured buffers at the moment, we can simply set this to zero as well.

Now that the vertex buffer is well defined, it can be given data to be initialized with using the D3D11_SUBRESOURCE_DATA structure:
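A sketch of that initialization, assuming a vertex array triangleVertices and a buffer description bd (the names are illustrative):

```cpp
// the system memory to copy into the buffer on creation
D3D11_SUBRESOURCE_DATA srd = { triangleVertices, 0, 0 };
dev->CreateBuffer(&bd, &srd, vertexBuffer.GetAddressOf());
```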

Shaders

To actually render vertices, or pixels to the screen, DirectX uses shaders. The process of rendering is controlled by the (graphics) rendering pipeline. It is a series of steps which take vertices as input and result in a fully rendered image. The pipeline must be programmed by the above mentioned shaders. A shader is actually a small program that controls one step of the pipeline. Here is an image of a shader 5.0 pipeline:

As we can see, there are several different types of shaders, and each one is run many times during rendering. The vertex shader, for example, is run once for each vertex rendered, while a pixel shader is a program that is run for each pixel drawn. We won't get into any details yet, but we obviously do have to write a simple vertex and pixel shader. This will take four relatively easy steps:

Create separate vertex and pixel shader files.

Load the two shaders from a .cso file into the program.

Encapsulate both shaders into shader objects.

Set both shaders to be the active shaders.

Creating shaders

Shaders are written in High Level Shader Language, or HLSL, for short. Visual Studio is capable of creating HLSL programs. Fortunately for us, the pixel and vertex shaders that Visual Studio creates automatically (right-click -> new file -> HLSL -> vertex/pixel shader) are good enough for the purpose of this tutorial; only very small changes are needed. Later tutorials will cover more details about HLSL.
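A minimal vertex shader matching the description below might look like this:

```hlsl
float4 main(float3 pos : POSITION) : SV_POSITION
{
    // return the input position in homogeneous coordinates
    return float4(pos, 1.0f);
}
```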

As input, the vertex shader takes the position of a vertex, defined by three floats, just as defined by our VERTEX structure. For now, the shader does nothing but return the input position in homogeneous coordinates (defined by the SV_POSITION semantic). There will be more theoretical tutorials later on, including one tutorial about projective geometry and homogeneous coordinates, but for now we will focus on getting that triangle on the screen.

Thus, without further ado, here is the pixel shader:

float4 main() : SV_TARGET
{
    return float4(1.0f, 1.0f, 0.0f, 1.0f);
}

The SV_TARGET semantic indicates that the return value of the pixel shader should match the render target format. For now, we simply set the colour of each pixel to yellow.

Loading shader files

When compiling the project, the HLSL files are automatically compiled as well and transformed into Compiled Shader Objects (CSO), which the running program must load in order to use them.

That is all about shaders that we need to know for now, but we will speak a lot more about shaders in upcoming tutorials.

Input Layouts

To tell the GPU how to handle the vertices that we defined in the VERTEX structure, Direct3D uses a technique called input layouts. An input layout is an interface to a COM object (what a surprise!) that holds a definition of how to feed vertex data that is laid out in memory into the input-assembler stage of the graphics rendering pipeline (yes, we will talk about all of that, in detail, in a later tutorial).

Anyway, an input layout is defined by a D3D11_INPUT_ELEMENT_DESC. Basically, by filling out an input layout description, the GPU is being taught how to read a custom vertex structure. It is possible to select what information gets stored with each vertex in order to improve the rendering speed by helping the GPU to organize the data appropriately and efficiently.

By now we are experts at filling out those descriptions, so here is the definition:
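A sketch of the definition for our position-only vertex structure:

```cpp
D3D11_INPUT_ELEMENT_DESC ied[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
```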

LPCSTR SemanticName

As seen above, a semantic is a string which tells the GPU what a certain value is used for. There are numerous semantics possible, but for now we only need to know that POSITION represents a three-dimensional position using float values, which is what we want. Semantics are used to map elements in the vertex structure to elements in the vertex shader input signature, which we will discuss soon.

UINT SemanticIndex

The semantic index for the element. A semantic index modifies a semantic with an integer index number. A semantic index is only needed when there are multiple elements with the same semantic. A semantic name without an index defaults to index 0, for example, POSITION without an index is equivalent to POSITION0. We won't use this, as we only have one input element, but we must keep this in mind for later.

DXGI_FORMAT Format

We have come across this one before: a DXGI_FORMAT specifies the data format of the vertices. We will use DXGI_FORMAT_R32G32B32_FLOAT, that is, 32 bits for each x, y and z value.

UINT InputSlot

An integer value that identifies the input slot through which this vertex element will be fed to the input-assembler. Direct3D supports sixteen input slots (0-15) through which we can feed vertex data to the GPU. For now, as we only have one input element, we will also only use one input slot.

UINT AlignedByteOffset

The optional offset (in bytes) between each element in the structure. We can use D3D11_APPEND_ALIGNED_ELEMENT for convenience to define the current element directly after the previous one, including any packing if necessary. We will come back to this, later, when we add colour to our vertices.

The first parameter holds the address of the input element description array and the second parameter is the number of elements in the specified input element description array.

The third and fourth parameters are the data and length of the vertex shader files. When CreateInputLayout is called, Direct3D will check the input layout against the vertex shader to make sure they match up correctly.

The final parameter holds the address of the returned input layout interface.

Just like with shader objects and render targets, once created, the input layout object must be activated using the ID3D11DeviceContext::IASetInputLayout method, which only takes one parameter, the input layout to set.

Drawing Primitives

Ok, that was a lot of work and a lot of information to process, but we have created the triangle, loaded the triangle into GPU memory, written, compiled, read and activated the shaders and created the input layout. Now it is time to actually draw the triangle!

We are three steps away from drawing the triangle:

Set which vertex buffer we intend to use.

Set the primitive type.

Draw the triangle!

Setting the Vertex Buffer

Setting the vertex buffer is done by calling the ID3D11DeviceContext::IASetVertexBuffers method, which binds an array of vertex buffers to the input-assembler stage. This will tell the GPU which vertices to read from when rendering. It has a couple of easy parameters, so let us look at the prototype:

UINT StartSlot

The first parameter defines the first input slot for binding. This is advanced and not needed at the moment, thus we set it to 0 for now.

UINT NumBuffers

The second parameter is the number of vertex buffers in the array. We only have one buffer, so we set this to 1.

ID3D11Buffer *const *ppVertexBuffer

The third parameter is a pointer to a constant array of vertex buffers. The vertex buffers must have been created with the D3D11_BIND_VERTEX_BUFFER flag. As we only have one vertex buffer, we can simply use vertexbuffer.GetAddressOf().

const UINT *pStrides

The fourth parameter is a constant array of stride values; one stride value for each buffer in the vertex-buffer array. Each stride is the size (in bytes) of the elements that are to be used from that vertex buffer. To fill this parameter we create a UINT, fill it with the size of our vertex structure and put the address of that variable into this parameter.

const UINT *pOffsets

The fifth parameter is a constant array of offset values; one offset value for each buffer in the vertex-buffer array. Each offset is the number of bytes between the first element of a vertex buffer and the first element that will be used. This will usually be 0.
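Putting the parameters together, the call might be sketched as follows (devCon and vertexBuffer are illustrative names):

```cpp
UINT stride = sizeof(VERTEX);   // bytes per vertex
UINT offset = 0;                // start with the first vertex in the buffer
devCon->IASetVertexBuffers(0, 1, vertexBuffer.GetAddressOf(), &stride, &offset);
```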

Direct3D has no idea about the mathematical conventions we use, thus we have to tell Direct3D what exactly we mean when we define vectors in $\mathbb{R}^3$. The pipeline must know how to interpret the vertex data that is bound to the input-assembler stage; the primitive topology determines how the vertex data is rendered on screen. We will cover this later in a more theoretical tutorial. For now, to set the desired topology, we use the ID3D11DeviceContext::IASetPrimitiveTopology method, which takes one parameter, namely the desired topology, in this case D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST, which means that every three vertices construct a separate triangle.

Draw

Now that Direct3D knows what kind of primitives to render, and what vertex buffer to read from, the contents of that buffer can finally be rendered to the screen by the ID3D11DeviceContext::Draw method. The Draw function has two parameters, the first parameter is the number of vertices that should be drawn and the second parameter defines the first vertex in the vertex buffer to be drawn.
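The last two steps might thus be sketched as:

```cpp
// every three vertices define a separate triangle
devCon->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
// draw three vertices, starting at the first vertex in the buffer
devCon->Draw(3, 0);
```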

Putting It All Together

Note that the above function to initialize the rendering pipeline must be called each time the game must resize its graphics.

There was also a slight oversight when we created the present function a few tutorials ago. Since we are using the flip mode, the render targets are reset after each call to present, thus we have to rebind them each time as well:

That's it! I admit it was a bit more difficult than just printing "Hello World", but it was still rather straightforward. I don't know about you, but I definitely need a beer now. Cheers! In the next tutorial, we will learn how to add colour to the scene.


Words can be like X-rays if you use them properly -- they’ll go through anything. You read and you’re pierced.

--- Aldous Huxley

Most games need some way to render high-quality text to the screen. DirectWrite provides just that and when used in combination with Direct2D, DirectWrite is hardware accelerated, thus fast and robust.

To use DirectWrite in our project, we first initialize Direct2D, tell Direct3D and Direct2D to play together nicely and we then set up DirectWrite to output text to our back buffer. Obviously, having access to Direct2D will be useful later as well, as we will need high-performance 2D and text rendering for menus, the user-interface and Heads-up Displays.

Setting up Direct2D and DirectWrite to work together with Direct3D basically takes seven little steps and most of these tasks are similar to the initialization of Direct3D:

Create the Direct2D and the DirectWrite factories.

Create the Direct2D device and its context.

Set up Direct2D to render to the same buffer as Direct3D.

Resize the Direct2D render targets when the game window is resized.

Set up brushes and text formats.

Set up text layouts.

Print!

Creating the Factories

The first thing to do is to create factories for Direct2D and DirectWrite.

The first parameter specifies whether the factory object will be shared or isolated. We will use DWRITE_FACTORY_TYPE_SHARED to indicate that we intend to use the DirectWrite factory as a shared factory, which allows the reuse of cached font data and generally leads to better performance.

REFIID iid

Directly from the MSDN: A GUID value that identifies the DirectWrite factory interface, such as __uuidof(IDWriteFactory).

IUnknown **factory

After the function returns, this parameter contains the address of a pointer to the newly created factory.

And here is the actual (isolated) C++-code to create the DirectWrite factory:
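A sketch of the call (writeFactory being an illustrative ComPtr):

```cpp
Microsoft::WRL::ComPtr<IDWriteFactory> writeFactory;
DWriteCreateFactory(DWRITE_FACTORY_TYPE_SHARED, __uuidof(IDWriteFactory),
                    reinterpret_cast<IUnknown**>(writeFactory.GetAddressOf()));
```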

The first parameter is the threading model of the factory and the resources it creates. We will set this to D2D1_FACTORY_TYPE_MULTI_THREADED, enabling safe access to the Direct2D factory from multiple threads, which will be very useful later on.

REFIID riid

The second parameter is a reference to the IID of ID2D1Factory that is obtained by using __uuidof(ID2D1Factory).

The third parameter specifies the level of detail provided to the debugging layer. In release mode, we set this to D2D1_DEBUG_LEVEL_NONE, telling Direct2D to not produce any debugging output. In debug mode however, it is useful to set this to D2D1_DEBUG_LEVEL_INFORMATION, telling Direct2D to send error messages, warnings, and additional diagnostic information.

void **ppIFactory

Once the function returns, the fourth parameter contains the address of a pointer to the newly created factory.

The device and its context

After having created the factory, we can use it to create a Direct2D device and then use the device to create a Direct2D device context. To create these Direct2D objects, we must first obtain the DXGI device associated with the Direct3D device of the application. The following image shows how Direct2D and Direct3D work together.

The first parameter of the D2D1CreateDevice function is a pointer to the DXGI device the desired Direct2D should be associated with. The function also has an optional second parameter, which we will not set, telling DirectX that we wish for the Direct2D device to inherit its threading mode from the DXGI device. After the function returns, the actual Direct2D device will be stored in the third parameter.

The device context is then created by the ID2D1Device::CreateDeviceContext method, whose first parameter, a D2D1_DEVICE_CONTEXT_OPTIONS flag, tells Direct2D to distribute all of its rendering work across multiple threads. Once the function returns, the actual device context is stored in the second parameter.

Selecting a render target

Now that the Direct2D device and its context are created, it is time to tell Direct3D how to behave with Direct2D around and to allow the latter to render into the same back buffer surfaces.

As always, to create a rendering surface, or a bitmap, for Direct2D, a structure description must be filled out, this time a D2D1_BITMAP_PROPERTIES1 structure:

The first parameter defines the bitmap's pixel format and alpha mode. We set the pixel format to DXGI_FORMAT_B8G8R8A8_UNORM, as that is the format we used for our Direct3D back buffer and we set the alpha mode to D2D1_ALPHA_MODE_IGNORE. We will talk more about the alpha mode in later tutorials.

FLOAT dpiX and FLOAT dpiY

These two members specify the horizontal and vertical dots per inch (DPI) of the bitmap.

D2D1_BITMAP_OPTIONS bitmapOptions

These options specify how a bitmap can be used. We will use D2D1_BITMAP_OPTIONS_TARGET, which specifies that the bitmap can be used as a device context target, and D2D1_BITMAP_OPTIONS_CANNOT_DRAW, which specifies that the bitmap cannot be used as an input.

IDXGISurface *surface

The first parameter is the DXGI surface from which the bitmap can be created; it must have been created from the same Direct3D device that the Direct2D device context is associated with. As seen in the last tutorial, the GetBuffer method can be used to retrieve the back buffer.

It takes two parameters. The first parameter specifies the colour to create in red, green, blue, and alpha format. The second parameter contains the address of a pointer to the newly created brush. This parameter is passed uninitialized.

This function is rather easy to use, it takes a string of the text to render, the desired text format together with the dimension of the desired output buffer, and produces an object that represents the fully analyzed and formatted text.

At the time I wrote this, Visual Studio 15.3 wasn't out yet, thus there was no support for C++17 and if constexpr. I also finally suppressed the warning from the logger; it was really starting to annoy me:

I don't know about you, but I am starting to feel quite excited - none of this COM or DirectX stuff is alien or too difficult anymore. In the next tutorial we will finally draw triangles on the screen!


There are a dozen views about everything until you know the answer. Then there's never more than one.

---C.S. Lewis, That Hideous Strength

Now that the swap chain is set up, the GPU must be told where exactly it should draw. More precisely, a viewport, for clipping, for example, and a depth/stencil buffer, must be defined.

This tutorial will focus on how to initialize the viewport and the depth/stencil buffer, but any details about how to use them will be postponed to later tutorials. As an example though, the stencil buffer could be used to add shadows to 3D objects or to create motion blur effects.

For now, we simply want to render to the backbuffer.

The Render Target View

When rendering in Direct3D, DirectX must know where exactly to render to. The render target view is a COM object that maintains a location in video memory to render into. In most cases this will be the back buffer. Here is how the render target view can be created:

The first parameter specifies the index of which backBuffer to use. As we created the swap chain with the swap effect DXGI_SWAP_EFFECT_FLIP_DISCARD, GetBuffer can only access the zero-th buffer for read and write access - thus, for our situation, we have to set this to zero.

The second parameter is the interface type of the back buffer which will be a 2D-texture in most cases.

The first parameter specifies the resource the render target is created for.

The second parameter is a pointer to a D3D11_RENDER_TARGET_VIEW_DESC, which, among other things, describes the data type, or format, of the elements in the specified resource (first parameter). We declared our back buffer to have a typed format, thus we can set this parameter to NULL, which tells DirectX to create a view to the first mipmap level of the specified resource. Do not worry, mipmaps will be covered in a later tutorial, for now we can safely set this to NULL.

The third parameter returns a pointer to the created render target view.

The Depth/Stencil Buffer

The depth/stencil buffer basically is a 2D-texture that stores the depth information of the pixels to render. To create this texture, once again a structure description, namely a D3D11_TEXTURE2D_DESC, must be filled out:

UINT ArraySize

The number of textures in the texture array. We only need one texture, thus we set this member to one.

DXGI_FORMAT Format

For our depth/stencil buffer we will set this to DXGI_FORMAT_D24_UNORM_S8_UINT. Again, the details will be covered in a later tutorial. For now it is enough to know that a structure format which uses 24-bits for the depth buffer and 8-bits for the stencil buffer is requested.

We have seen this structure before, it is used to specify how multi-sampling or anti-aliasing is done. It should be clear that those settings for the depth/stencil buffer must match the settings for the render target.

D3D11_USAGE Usage

This member identifies how the texture is to be read from and written to. For our depth/stencil buffer we use D3D11_USAGE_DEFAULT, which tells DirectX that the GPU, and only the GPU, will be reading from and writing to the resource.

UINT BindFlags

Those flags are various options for how to bind to the different pipeline stages; the pipeline will be covered in later tutorials. For our depth/stencil buffer we have to use the D3D11_BIND_DEPTH_STENCIL flag.

The second parameter is a pointer to the initial data that the resource should be filled with. Since we are using the texture as a depth/stencil-buffer, it need not be filled with any initial data, and thus this parameter can safely be set to NULL.

The first parameter is a pointer to the depth/stencil-buffer resource we just created.

The second parameter is a pointer to a D3D11_DEPTH_STENCIL_VIEW_DESC structure. Setting this parameter to NULL creates a view that accesses mipmap level 0 of the entire resource using the format the resource was created with. This is what we wanted.

The last parameter returns the address of a pointer to the created depth/stencil-view.

The first parameter of the function is the number of render targets that we want to set. For now, we only want to set one render target, thus we set this value to one. Rendering simultaneously to several different render targets is a rather advanced technique that we won't cover until way later.

The second parameter is a pointer to the first element in a list of render target view pointers. Again, we only have one render target view, thus we can simply input the address of our render target view interface here.

The third parameter is a pointer to the depth/stencil-view.

Setting the Viewport

To tell DirectX what area of the backbuffer to render into, yet another structure must be filled out, the D3D11_VIEWPORT description:

The first four floats define the viewport rectangle (relative to the client window rectangle). For now we will just set this to the entire client area, since we want to be able to draw on the entire window.

The MinDepth and MaxDepth members specify the minimal and maximal depth buffer values, which will be set to zero and one for now.
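Filling out the description might look like this (the client window size names are illustrative):

```cpp
D3D11_VIEWPORT vp;
vp.TopLeftX = 0.0f;
vp.TopLeftY = 0.0f;
vp.Width  = static_cast<float>(clientWidth);
vp.Height = static_cast<float>(clientHeight);
vp.MinDepth = 0.0f;   // minimal depth buffer value
vp.MaxDepth = 1.0f;   // maximal depth buffer value
```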

The first parameter is the number of viewports to use and the second parameter is a pointer to an array of viewports. For example we could use multiple viewports to implement a split-screen view like in the good old Nintendo times. In a later tutorial, we will also see how to use multiple viewports to create a user-interface. But for now, we will just set the viewport to the only one we have created so far:

devCon->RSSetViewports(1, &vp);

Clearing the Back and Depth Buffers

What is left to do now is to clear the back and depth/stencil buffers after each frame so that new scenes can be rendered without any leftover artifacts.

To clear the back buffer, it is enough to simply fill the entire back buffer with a single colour using the ID3D11DeviceContext::ClearRenderTargetView method, which sets each pixel in a render target view to a specified colour:
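A sketch of clearing both buffers at the start of each frame (the colour and interface names are illustrative):

```cpp
// fill the entire render target with a single colour
const float background[] = { 0.0f, 0.0f, 0.0f, 1.0f };
devCon->ClearRenderTargetView(renderTargetView.Get(), background);
// reset the depth buffer to the maximal depth and the stencil buffer to zero
devCon->ClearDepthStencilView(depthStencilView.Get(),
                              D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL, 1.0f, 0);
```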

The GPU contains in its memory a pointer to a buffer of pixels encoding the image currently being displayed on the screen. When asked to render a scene, the GPU updates this buffer and sends the new information to the monitor to display. The monitor then redraws the screen from top to bottom, replacing the old image with the new one.

However, there is a slight problem with this in that the monitor does not refresh as fast as needed for real-time rendering. To understand this, we must understand how computer monitors work. As an example, we will have a brief look at Liquid Crystal Displays, or LCDs.

Liquid Crystal Displays

Liquid crystals are organic molecules that flow like a liquid while retaining their lattice structures like solid crystals. They were first discovered in 1880 by the Austrian botanist and chemist Friedrich Reinitzer and first applied to small displays in the 1960s. Using electric fields, the alignment of the molecules can be controlled, and thus the optical properties can be changed.

An LCD consists of liquid crystal sealed in between two parallel glass plates with electrodes attached to them. In the widely used active matrix displays, there is a switch at each pixel position on one of the electrodes; by turning those switches on and off, it is possible to create an arbitrary voltage pattern across the entire screen, allowing arbitrary bit patterns. Those switches are often called thin film transistors, and the monitors using this technology are widely known as TFT displays.

A light source, originally cold cathodes, but in modern screens usually light-emitting diodes (LEDs save power and allow for even thinner displays), illuminates the screen from the back, and the alignment of the crystals determines the visual output at the front.

Obviously this setup only provides monochromatic images; the idea behind coloured images is the same, but the technological details are a lot more complicated.

Video Random Access Memory and Color Palettes

The image on the display is taken from a pixel buffer, a special memory inside the GPU, called video random access memory, or VRAM for short. For example, in a $1920 \times 1080$ resolution, the VRAM would contain an array of $1920 \cdot 1080$ values, one value for each pixel to represent. Those values might simply be a $3$-byte RGB value, representing the red, green and blue component of each pixel, thus requiring $6.2208$ MB of VRAM. In the olden days, this was a lot of space, thus a little trick was used: the desired colour of each pixel was stored in an 8-bit number, used as an index to retrieve the actual colour in a hardware table, the color palette. This reduced the VRAM requirement by $\frac{2}{3}$, but only allowed $256$ colours to be active on the screen at once. Famous Blizzard games, such as Diablo II and Starcraft used this technique successfully, but nowadays VRAM isn't as limited anymore.

The above mentioned example used the $24$-bit, or true colour, mode, which uses exactly one byte per colour channel, giving access to $256^3 = 16777216$ different colours. Unfortunately, this makes for ugly addressing, and many video cards actually do not support $24$-bit colours; they use $32$-bit colours instead.

In $32$-bit color mode, the data for each pixel is stored in a format using eight bits for alpha, or the transparency information, and eight bits for each colour. So basically, this format is a variant of the true colour format in which the additional eight bits are allocated to hold transparency or other information. In the previous tutorial, Direct3D was created to support the BGRA-colour mode (blue, green, red, alpha).

Memory Bandwidth and Accelerated Graphics Ports

Another problem was the limited bandwidth from the CPU to the VRAM: copying roughly $6$MB per frame at $60$ frames per second requires a total data rate of $360$MB per second. This is way above what any older computer could handle. Thankfully, with the advent of the Pentium II and the Accelerated Graphics Port, or AGP, this problem was solved as well. Already the very first such port could transfer data at a rate of $252$MB per second, and nowadays the numbers are even more impressive, with AGP3 allowing transfers at $2133$MB per second.

Okay, so, we have enough VRAM and the bandwidth between the CPU and the VRAM is very fast; what then is the problem?

Refresh Rates

Most displays are refreshed between 60 and 100 times per second. In the International System of Units (SI), the refresh rate, or the frequency of those refreshes, is measured in hertz, or Hz, named after the German physicist Heinrich Hertz.

To refresh the screen, in oldschool CRT monitors, an electron gun, firing streams of electrons, is moved horizontally across the screen. The gun starts drawing at the top-left corner of the screen and shifts to the right horizontally to draw the first so-called scanline. It then repositions itself at the left edge of the next scanline to start drawing again. This process is repeated until all scanlines have been drawn.

Once the drawing is complete, the electron gun is positioned at the bottom-right edge of the screen. The time it takes for the electron gun to move back to its original position at the top-left corner of the screen is called the vertical blank interval, or VBLANK, for short.

While LCDs work somewhat differently, the basic idea of the VBLANK is still very helpful.

Now suppose that the electron gun is halfway done with its job of redrawing the screen when our application requests a new frame to be drawn immediately: the new image would only be drawn in the bottom half of the screen, while the top half would still show the old image. This effect of the screen showing parts of two different frames at the same time is called screen tearing.

One solution might be to only update the game data during the VBLANK, but obviously modern games take longer to compute updates to the game world than it takes for an electron gun to race diagonally across the screen once. A possible solution is the so-called double buffering technique, which DirectX implements using a swap chain.

The Swap Chain

To avoid screen tearing, most computer animation is achieved by drawing each frame of animation in an offscreen buffer area, called the backbuffer, and then quickly copying the offscreen image to the visible surface. This is called blitting. As long as the copying is done quickly enough, no screen tearing is visible. This process of drawing an image in the backbuffer and then copying it to the actual display surface is the above mentioned technique of double buffering.

However, blitting could still cause screen tearing, because the image transfer could theoretically still take longer than the VBLANK. To help with that problem, DirectX, or more precisely, DXGI, also implements a feature called swapping, or page flipping, that does just what the name says, it swaps the backbuffer and the display surface: DirectX uses a pointer for each buffer and simply switches their values.

Obviously, to fully prevent screen tearing, the swap must happen during the VBLANK; and the faster the buffer flipping, the more time we have to update our game world.

In this tutorial we will implement two back buffers (as VRAM isn't as restricted nowadays):

This setup is the so called swap chain, as it is a chain of buffers, swapping positions each time a new frame is rendered.

Creating the Swap Chain

In DirectX, the swap chain is represented by a new COM interface, the IDXGISwapChain.

To actually create the swap chain, three steps must be completed:

The swap chain must be customized by filling out a swap chain description structure.

A pointer to a DXGI Factory must be obtained; an object true to its name, as it is capable of creating other DXGI objects.

With the description and the factory at hand, the factory can then create the swap chain itself.

UINT Width and UINT Height

Those unsigned integer values indicate the width and height of the swap-chain buffers in pixels. If set to zero, the swap chain automatically sizes itself to the current resolution of the active window.

DXGI_RATIONAL RefreshRate

This rational number describes the refresh rate in hertz. For example, if we want to run at a constant $60$ fps, we set this to $\frac{60}{1}$. There is a problem with this, however, if the refresh rate does not fit the actual display mode: DirectX then first has to create a new buffer with the correct resolution, or a close match, which wastes resources and time. Setting this value to $\frac{0}{1}$ tells DirectX not to check whether this refresh rate fits the refresh rate of the current display.

DXGI_FORMAT Format

This member describes the display format to use. See the above explanation of the color palette for more details. There are plenty of options to select from. In this tutorial we will use DXGI_FORMAT_B8G8R8A8_UNORM, which means that we reserve 8 bits for blue, 8 bits for green, 8 bits for red, and 8 bits for transparency, in that order, and each colour will be stored in an unsigned normalized integer, which is optimized for GPU reading.

DXGI_MODE_SCANLINE_ORDER ScanlineOrdering

This member describes the scanline drawing mode, i.e. the method the raster uses to create images on a surface. We will use DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED, which leaves the choice of the scanline order to the driver.

DXGI_MODE_SCALING Scaling

Those flags indicate how images are to be stretched to fit the backbuffer resolution. We will use DXGI_MODE_SCALING_UNSPECIFIED, which means that our rendered images will just appear in the top-left corner of the window. Also, since later tutorials will cover going into fullscreen mode, and we want to make sure that we do not initiate a mode change when transitioning to full screen, we are advised to use DXGI_MODE_SCALING_UNSPECIFIED anyway.

DXGI_SAMPLE_DESC SampleDesc

Count is the number of multisamples per pixel and Quality is the image quality level, the exact specifications depend on the GPU. We will talk more about this in a later tutorial, for now, we will disable multisampling by using the default sampler mode, with no anti-aliasing, with a count of $1$ and a quality level of $0$.

DXGI_USAGE BufferUsage

This member describes the surface usage and CPU access options for the back buffer. The back buffer can be used for shader input or render-target output. We will obviously use DXGI_USAGE_RENDER_TARGET_OUTPUT for now, which tells DirectX to use the back buffer as an output render target.

UINT BufferCount

This member sets the number of buffers in the swap chain, including the front buffer. For now we will create a front buffer and two back buffers, thus we set this to three.

HWND OutputWindow

An HWND handle to the output window; we will set this to the handle of the main window.

BOOL Windowed

This member is a boolean value that specifies whether the output is in windowed mode. Microsoft recommends that swap chains be created as windowed swap chain and switched to fullscreen afterwards, if so desired. For now, we will stay in windowed mode; fullscreen applications will be considered in a later tutorial.

DXGI_SWAP_EFFECT SwapEffect

Those flags describe options for handling the contents of the presentation buffer after presenting a surface, i.e. it tells DXGI what to do with the buffers once they have been shown and are no longer of use. We will use DXGI_SWAP_EFFECT_FLIP_DISCARD, to specify the flip presentation model and to specify that DXGI discard the contents of the back buffer after it has been presented. Please note that in windowed mode, DXGI will still blit instead of flip.

UINT Flags

These flags describe further options for the behaviour of the swap chain. For now we will use DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH, which allows an application to switch between display modes. We will talk more about that in a later tutorial.

Wow, that was a lot of information to cover, but thankfully we do not need to be concerned about all the details just yet. Here is the actual code to set up the swap chain description:
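As the original listing is not reproduced here, the following sketch shows how the description might be filled out with the values discussed above; the variable names desc and window are placeholders.

```cpp
DXGI_SWAP_CHAIN_DESC desc = {};
desc.BufferDesc.Width = 0;                           // take the size of the window
desc.BufferDesc.Height = 0;
desc.BufferDesc.RefreshRate.Numerator = 0;           // do not check the refresh rate
desc.BufferDesc.RefreshRate.Denominator = 1;
desc.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM; // blue, green, red, alpha; 8 bits each
desc.BufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
desc.BufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;
desc.SampleDesc.Count = 1;                           // no multisampling
desc.SampleDesc.Quality = 0;
desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;  // use as an output render target
desc.BufferCount = 3;                                // the front buffer and two back buffers
desc.OutputWindow = window;                          // the handle of the main window
desc.Windowed = true;
desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;     // flip and discard after presenting
desc.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH; // allow display mode switches
```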

Since in the last tutorial we created the Direct3D device without a swap chain, we need to retrieve the factory that was used to create the device in order to actually create a swap chain now. From the Direct3D device an IDXGIDevice can be requested using the As function. To finally retrieve the factory that created the Direct3D device, we use GetAdapter followed by a call to GetParent.

Swapping!

With the swap chain created, it can be used to draw, or present, the actual game scene to the screen. This is done using the Present function:

HRESULT Present(
UINT SyncInterval,
UINT Flags
);

The SyncInterval integer specifies how to synchronize presentation of a frame with the VBLANK. In flip mode, the possible values are $n=0$, to tell DirectX to cancel the remaining time on the previously presented frame and discard this frame if a newer frame is queued, or $n=1$ to $n=4$, to tell DirectX to synchronize the presentation after the $n$-th vertical blank. We will use $0$ for now, accepting that some screen tearing might occur.

The Flags specify various options to present the scene; we will use DXGI_PRESENT_DO_NOT_WAIT, which tells DXGI not to sleep or wait for v-sync. Please note that Present returns DXGI_ERROR_WAS_STILL_DRAWING if the calling thread is blocked.

Resizing

There is one more thing to worry about. When the size of the window changes, the back buffers must be resized as well. This is done using the ResizeBuffers function:
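For reference, here is its declaration; the comments summarize how each parameter relates to the swap chain description.

```cpp
HRESULT ResizeBuffers(
        UINT        BufferCount,    // the number of buffers; 0 keeps the current number
        UINT        Width,          // the new width in pixels; 0 uses the client area of the window
        UINT        Height,         // the new height in pixels; 0 uses the client area of the window
        DXGI_FORMAT NewFormat,      // the new format; DXGI_FORMAT_UNKNOWN keeps the current one
        UINT        SwapChainFlags  // the same flags as in the swap chain description
);
```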

The Symmetric Group

A permutation of a set $M$ ($M$ for Menge, the German word for set) is a bijective function $\sigma: M \to M$.

As an example, consider the set $M := \{ \Delta, \circ, \star \}$; now one such permutation $\sigma$ might be defined to do the following: \[ \Delta \mapsto \circ \, , \, \circ \mapsto \Delta \, , \, \star \mapsto \star . \] The star is fixed, but the triangle and the circle are permuted.

For a more concrete example, note that the shape or the nature of the objects in the set does not matter; all that must be known is the number of objects in $M$, called the cardinality $| M |$ of $M$. Let $M := \{ 1, 2, 3 \}$ be a set with $3$ elements, then the above permutation can be described by
\[ 1 \mapsto 2 \, , \, 2 \mapsto 1 \, , \, 3 \mapsto 3. \] The third element is left fixed, while the first two are permuted.

Generally, as such permutations are bijective, they are invertible, with their inverse being a, possibly different, permutation of $M$ as well. The composition of two permutations is again a, possibly different, permutation of $M$. There also clearly exists a permutation that does nothing, the identity, namely $m \mapsto m$, for each element $m \in M$. Therefore, the set of all the permutations of a set $M$ is a group, the so-called symmetry group $\operatorname{Sym}M := \{ \sigma : M \to M \, \mid \, \sigma \text{ bijective } \}$ of $M$.

For a finite set $M := \{1,2, \dots n\}$, the symmetric group $S_n$ of $M$, or the symmetric group of degree $n$, is defined as the permutation group of $M$: $S_n := \operatorname{Sym}\{1, 2, \dots, n\}$. Note that the order of $S_n$ is $n!$.

One can think of those permutations as the symmetries of an object. Imagine a square rotated by $\frac{\pi}{2}$, the actual vertices have moved, but the square still looks the same. Can you figure out all of the eight symmetries of the square?

If you ever wanted to see the mathematical beauty of symmetry, think about visiting the Qalat Al-Hamra, or the Alhambra, in Granada, Spain.

Cayley's Theorem

Arthur Cayley was a British mathematician, and probably the first man to clearly define the concept of a group. One of his famous theorems states that every group is isomorphic to a group of permutations.

Two objects are said to be isomorphic if there exists an isomorphism, which is a bijective homomorphism (from the ancient Greek words ὁμός (homós), which means equal or similar, and μορφή (morphé), which means form), between the two objects. Homomorphisms are structure-preserving maps that identify objects which, although they may look different, are essentially the same.

To see this, consider the map $\sigma: G \to \operatorname{Sym}G$ which sends $g \in G$ to left multiplication by $g$, that is, $\sigma_g(x) := g \cdot x$, for all $x \in G$. $G$ being a group ensures that $\sigma$ is an injective homomorphism, which means that $G$ is indeed isomorphic to a subgroup of $\operatorname{Sym}G$. Subgroups of $\operatorname{Sym}G$ are often called permutation groups and are denoted by $\operatorname{Per}G$.

Group Operations

With Cayley's theorem, it is now clear how to define a group operation on a set: An operation of a group $(G, \cdot)$ on a set $M$ is a homomorphism $\sigma$ of $G$ into the symmetry group of $M$ defined as above. An operation thus provides a mapping $G \times M \to M$, $(g,m) \mapsto \sigma_g(m) = g.m = g \cdot m$. We also say that $G$ acts on $M$.

Conversely, a map $G \times M \to M \, , \, (g,m) \mapsto g.m$ satisfying the two properties $e_G.m = m$ and $(g \cdot h).m = g.(h.m)$, for all $g, h \in G$ and $m \in M$, defines a permutation of $M$ for each $g \in G$. We thus obtain a homomorphism from $G$ into $\operatorname{Sym}M$. So an operation of $G$ on $M$ could also be defined as a mapping $G \times M \to M$ with these two properties.

The Orbit

Let $m \in M$ be arbitrarily chosen but fixed. The subset of $M$ consisting of the images of $m$ under operation by elements of $G$ is denoted by $G.m := \{ g.m \, \mid \, g \in G \}$ and called the orbit of $m$ under $G$.

Clearly, all of the orbits define a partition of $M = \bigcup_{m \in M}G.m$. If an element $m \in M$ belongs to the orbit of both $m_1 \in M$ and $m_2 \in M$, then there exist $g_1, g_2 \in G$ such that $m = g_1.m_1 = g_2.m_2$. Since $G$ is a group, this implies that $m_1 = g_1^{-1} \cdot g_2.m_2$ and $m_2 = g_2^{-1} \cdot g_1.m_1$, thus $m_1$ belongs to the orbit of $m_2$ and vice-versa. In conclusion: $G.m_1 = G.m_2$; thus, by choosing representatives $v$ for each orbit, we can write \[M = \bigcup_{v \in V} G.v,\] where $V$ denotes the set of the chosen representatives.

A group operation is called transitive, or $G$ is said to act transitively on $M$, if there is only one orbit: $M = G.m$, for any $m \in M$; that is, for all $m_1, m_2 \in M$ there exists $g \in G$ such that $m_2 = g.m_1$.

The Stabilizer

Let $m \in M$ be arbitrarily chosen but fixed. The set of elements $g \in G$ which leave $m$ fixed, that is $g.m = m$, is a subgroup of $G$, the so-called isotropy group of $m$ in $G$, or the stabilizer of $m$ under the action of $G$, denoted by $G_m := \{ g \in G \, \mid \, g.m = m \}$. An element $m \in M$ is called a fixed point of $G$, if $G_m = G$, in other words, if it is fixed by each operation.

Obviously the kernel $\operatorname{ker}\sigma = \{g \in G \, \mid \, \forall m \in M \, : \,g.m = m \}$ of a group operation is the intersection of all the isotropy groups: $\operatorname{ker}\sigma = \bigcap_{m \in M}G_m$. This is the set of all the elements of $G$ which leave all the elements of $M$ fixed.

A group operation is said to be faithful, if its kernel is trivial, i.e. if $\operatorname{ker}\sigma = \{ e_G \}$.

The Orbit-Stabilizer Theorem

The size of the orbit of an element $m \in M$ is the index, the relative size, of the stabilizer of $m$ in $G$:
\[ | G.m | = [ G : G_m ]. \]
For a finite set $G$, this corresponds to:
\[ | G.m | = \frac{| G |}{|G_m|}.\]
Note that in particular, the size of any orbit divides the order, or the cardinality, of the group.

For a proof, we must define a bijective map $\varphi$ between $G.m$ and $G / G_m := \{ g.G_m \, \mid \, g \in G \}$, the set of left cosets of $G_m$ in $G$. We can do so as follows: \[\varphi: G.m \to G/G_m \, , \, n = g.m \mapsto g.G_m. \]

The first thing to check is whether $\varphi$ is well-defined, as for a given $n$, as above, the choice of $g$ in $n = g.m$ is not unique, as there is no bijection between $G$ and $G.m$ in general. Suppose that $n = g_1.m = g_2.m$, then clearly \[g_2^{-1}g_1.m = m \Leftrightarrow g_1.G_m = g_2.G_m,\] thus the image of $n$ under $\varphi$ does not depend on the choice of $g$, since the images are still in the same coset.

By the definition of $\varphi$ it is clear that $\varphi$ is surjective. To see that it is also injective, assume that $\varphi(m_1) = \varphi(m_2)$, for $m_1 = g_1.m \in G.m$ and $m_2 = g_2.m \in G.m$. Then, as $g_1.G_m = g_2.G_m$, i.e. $g_2^{-1} \cdot g_1 \in G_m$, it follows that $g_2^{-1} \cdot g_1.m = m$, and in conclusion $g_1.m = g_2.m$, as desired.

The Orbit-Counting Theorem

In the case of a finite group operating on a finite set, a theorem, probably due to Ferdinand Georg Frobenius, allows us to count the number of orbits $n_o$ by averaging the number of fixed points over the group: \[ n_o = \frac{1}{|G|} \sum_{g \in G} | M^g |, \] where $M^g := \{ m \in M \, \mid \, g.m = m \}$ denotes the set of elements of $M$ left fixed by $g$.

References

Literature

Art

How do you say "We come in peace" when the very words are an act of war?

--- Peter Watts, Blindsight

In the last, rather theoretical, tutorial we learned about the Component Object Model architecture and that DirectX is a collection of such COM objects. In this tutorial we will jump right into the action and initialize Direct3D! Are you ready for

The pointer to the desired COM interface is acquired by a call to the CreateObject function. Each COM object type has its own way of being created, and we will learn about a few of those as we move forward with the tutorials.

The Device and its Context

At the very core of Direct3D are two COM objects: the device and the device context.

The device object is a virtual representation of the video adapter and it can be used to access the memory of the GPU and to create other Direct3D related COM objects.

The device context is a structure that defines a set of graphic objects and their associated attributes, as well as the graphic modes that affect output. The graphic objects include a pen for line drawing, a brush for painting and filling, a bitmap for copying or scrolling parts of the screen, a palette for defining the set of available colours, a region for clipping and other operations, and a path for painting and drawing operations. Thus, the device context can be seen as the control panel for the GPU. Through it, the transformation of a three-dimensional model to a final two-dimensional image, and the process of rendering that image to the screen, can be controlled.

The interfaces for these objects are called ID3D11Device and ID3D11DeviceContext. To create and initialize them, the D3D11CreateDevice function must be called:

IDXGIAdapter *pAdapter

This is a pointer to an interface that describes the GPU that Direct3D should use. For now, we shall simply let Direct3D take care of the details, as in most cases, there is only one GPU anyway. To do that, we input a nullptr here.

D3D_DRIVER_TYPE DriverType

The DriverType represents the driver type to create. There are six possible values for this parameter, but we are only going to be concerned with one of them: D3D_DRIVER_TYPE_HARDWARE which tells Direct3D to use the hardware accelerated graphics chip to process graphics.

HMODULE Software

A handle to a DLL that implements a software rasterizer. If the DriverType is D3D_DRIVER_TYPE_SOFTWARE, Software must not be NULL. However, as we want to directly work with the hardware, we will just use NULL here.

UINT Flags

This parameter defines the runtime layers to enable. We will use the D3D11_CREATE_DEVICE_BGRA_SUPPORT flag to enable interoperability between Direct2D and Direct3D. While debugging, it might be a good idea to also use the D3D11_CREATE_DEVICE_DEBUG flag, which creates a device that supports the debug layer.
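Putting the parameters together, the call might look like the following sketch; the variable names dev and devCon are placeholders, and Microsoft::WRL::ComPtr is used so that the COM reference counting is handled automatically.

```cpp
Microsoft::WRL::ComPtr<ID3D11Device> dev;           // the Direct3D device
Microsoft::WRL::ComPtr<ID3D11DeviceContext> devCon; // its immediate context

HRESULT hr = D3D11CreateDevice(
        nullptr,                          // let Direct3D choose the GPU
        D3D_DRIVER_TYPE_HARDWARE,         // use hardware accelerated graphics
        nullptr,                          // no software rasterizer
        D3D11_CREATE_DEVICE_BGRA_SUPPORT, // enable Direct2D interoperability
        nullptr, 0,                       // use the default feature levels
        D3D11_SDK_VERSION,                // always set to D3D11_SDK_VERSION
        &dev,                             // returns the device
        nullptr,                          // the chosen feature level is not needed
        &devCon);                         // returns the device context
```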

Well, this wasn't so difficult, was it? We have successfully added a Direct3D device and its context to our application. In the next few tutorials we will learn how to use these two to actually render objects to the screen; we will start with the swap chain, in the next tutorial.

Art

Over the centuries, mankind has tried many ways of combating the forces of evil... prayer, fasting, good works and so on. Up until Doom, no one seemed to have thought about the double-barrel shotgun. Eat leaden death, demon ...

--- Terry Pratchett

An overview of DirectX

Just assume that you want to write a piece of software that needs direct access to the GPU and CPU, has networking capabilities, plays audio and uses the mouse, keyboard and a joystick for user input. That sounds like fun, but can you imagine writing drivers to control every single GPU out there and change your program depending on the capacities of the GPU? Can you imagine polling all the different available keyboards or joysticks, knowing that each company has different standards? That doesn't sound like so much fun anymore, does it?

Well, thankfully, Microsoft and all the hardware manufacturers, made a huge effort and created a very high-performance standard, DirectX, that allows programmers to access all the above mentioned features with relative ease. Granted, when using DirectX, the programmers lose a tiny bit of control, but hell, it is definitely worth it. When I first started to program as a kid, things were a lot more difficult, DirectX and similar APIs are definitely a boon.

But what exactly is DirectX? As was already mentioned, DirectX allows relatively direct control over a computer's hardware, with some software layers between the programmer and the actual hardware. Basically, Microsoft invented a set of conventions that all hardware vendors must use when implementing their various drivers to talk to the hardware. The technology Microsoft and the hardware manufacturers use to achieve this is called the Component Object Model, or COM, for short.

For programmers, that means that as long as the hardware manufacturers and vendors stick to those conventions, any DirectX based programs will (theoretically) work on all available hardware. To get acquainted with DirectX, we will take a look at its architecture.

DirectX Architecture

As an example, we will look at Direct3D. Here is an image, taken from the MSDN, explaining the architecture of Direct3D:

Direct3D applications can exist alongside GDI applications, and both have access to the computer's graphics hardware through the device driver for the graphics card. Unlike GDI, Direct3D can take advantage of hardware features by creating a Hardware Abstraction Layer, or HAL, device.

In general, the HAL is part of the operating system, granting the rest of the system access to abstract hardware devices devoid of the above mentioned idiosyncrasies with which real hardware is often endowed. In the context of Direct3D, the HAL device provides access to the pipeline functions of the GPU (covered in a later tutorial), based upon the feature set supported by the graphics card. If a feature is not provided by the hardware, the program code will be emulated by software in the Hardware Emulation Layer, or HEL.

Basically, the HAL talks directly to the hardware (which means that the HAL very often is the actual device driver from the manufacturer) and thus is quite fast. If, however, a feature is not supported by the hardware, the application thankfully doesn't halt, but the feature is emulated in the HEL, using a software algorithm. Obviously, this will be slower, but that is still better than entire programs breaking apart.

DirectX Components

Let us take a look at the different components of DirectX:

Graphics

The primary goal of DXGI is to manage low-level tasks that can be independent of the DirectX graphics runtime. DXGI provides a common framework for future graphics components; the first component that was to take advantage of DXGI is Microsoft Direct3D 10.

DXGI provides objects to handle tasks such as enumerating graphics adapters and monitors, enumerating display modes, choosing buffer formats, sharing resources between processes (such as between applications and the Desktop Window Manager), and presenting rendered frames to a window or monitor for display.

The Direct2D component is, unsurprisingly, used to draw 2D graphics. Direct2D allows interoperability with GDI, GDI+, and Direct3D and permits rendering to and from a Direct3D surface, as well as to and from a GDI/GDI+ device context (HDC) with full serialization of surfaces and device contexts, which enables it to work with other native Windows technologies such as DirectWrite.

DirectWrite is a text layout and glyph rendering API. When running on top of Direct2D, DirectWrite is hardware-accelerated. We will use Direct2D and DirectWrite in a later tutorial to print text (such as FPS information) to our game window.

XAudio2 is based on the Xbox-360-Sound-API (shudders) and has deprecated DirectSound. A few of the highlights of XAudio2 are Digital Signal Processing-Effects (turning animal screams into scary monster sounds - think of pixel shaders for audio), Submixing (combining several sounds into a single audio stream) and Surround Sound.

XACT3

FMOD, developed by Firelight Technologies, is not a component of DirectX, but it is mentioned here, since in these tutorials, we will be using FMOD to add music and sounds to our games (just like Blizzard does, for example).

Input

This component handles all the different input devices, such as the mouse, keyboard and joysticks. DirectInput also allows us to create Force-Feedback-Effects. DirectInput does not send event messages and runs directly on the hardware. It is now deprecated, however, in favour of XInput for Xbox 360 (shudders, again). In these tutorials we will still use DirectInput, it might be deprecated, but it is functionally complete.

XInput is an API for "next generation" controllers and was introduced in 2005 alongside the launch of the Xbox 360. Microsoft describes it as being easier to program for and requiring less setup than DirectInput. XInput is compatible with DirectX version 9 and later.

Networking

DirectPlay is the networking component of DirectX. It allows us to make connections through the Internet or via local LAN, for example. It basically sends and receives all the packets for us, so that we do not really have to worry about sockets and other stuff. DirectPlay also supports sessions and lobbies. DirectPlay unfortunately produced a lot of overhead, thus most games have their own network-code. DirectPlay is deprecated in favour of Games for Windows - Live technology. (shudders, yet again)

Advanced Components

Utilities

DirectX Diagnostics gathers information about the system and the DirectX components installed on it, as well as providing a number of tests to ensure that components are working properly.

DirectX Setup

This allows us to test, during installation, whether the required version of DirectX is available on the system; if not, the installation of the required DirectX version can be requested.

This concludes the overview of the DirectX components. Reading all of this you might be scared that once you mastered one component, it might be deprecated, but don't worry. Even though graphics, and game technology in general, is moving very fast, DirectX is always backwards compatible, i.e. if we write DirectX 11 code, we can be sure that it will still work in DirectX99, when the cyborgs rise up. How does DirectX manage that rather incredible feat? Well, DirectX is based, as we briefly mentioned in the introduction of this tutorial, on COM technology ...

The Sith Dreadmasters (or COM)

As computer programs are getting larger and larger, abstraction and hierarchy are getting more and more important to avoid chaos and disaster. The Component Object Model, COM, for short, is a software architecture that allows applications to be built from binary software components - think of Lego blocks, for example: simply put the blocks together and the result always works (although the result might not be what we had expected).

Lego blocks can be stuck together to create more advanced shapes. The different Lego blocks actually do not care about their brethren's particularities, they are all compatible with each other.

Obviously, to implement such a technology, one needs a very generic interface that can take the form of any possible type of function that one can imagine. That is what COM does. COM is the underlying architecture that forms the foundation for higher-level services such as DirectX.

COM defines several fundamental concepts: there is a binary standard for function calling between COM components, and there is a way for components to dynamically discover the interfaces implemented by other components. What does that mean for programmers? New features can be added to COM objects without breaking any software that uses that COM object, different COM objects can be added together and COM objects can be changed without recompiling the program (which means that programs can be updated to use newer COM versions without having to recompile everything). Amazing, just like Lego (and it somehow reminds me of vector spaces ...).

This is great, right, but what exactly is a COM object? A COM object basically is a C++ class that implements a certain number of interfaces. An interface is, loosely speaking, a set of functions, and is used to communicate with the actual COM object.

There can be multiple COM objects, each with multiple interfaces; the important thing is that all those interfaces must be derived from the IUnknown interface:

Note that all the methods are pure virtual functions, thus all derived interfaces must at least implement each of the above methods. (As a reminder: __stdcall means that the parameters are pushed onto the stack from right to left and that the callee cleans the stack.)

This function is the key to the COM world, as it is used to request a pointer to the desired interface. To request a specific interface, the interface ID, a $128$-bit globally unique identifier, must be known. QueryInterface calls the AddRef function on the returned pointer.

As COM is language-independent, it can't, for example, use malloc or new to create new objects, but somehow it has to keep track of how many objects there are at any given moment in time. Thus, COM objects use reference counting to keep track of their lifetime. When the reference count reaches zero, the object destroys itself internally. As you can now guess, the AddRef function increments the counter by one, and Release decrements the same counter by one.

For more information on how to create your own COM objects, check Chapter 5 of LaMothe's book or Microsoft's COM page.

(Direct)X-Com

The good news is that DirectX tries to hide most of the tedious COM business behind little wrapper functions. Thank you, Microsoft. We have already seen that DirectX is made up of many different COM objects, so now all that is left to learn is how to actually get access to those objects.

Now assume that we have an interface pointer to a COM object. What actually happens in the background is that we have a Virtual Function Table pointer. A virtual function table is a mechanism used in a programming language to support dynamic dispatching, that is, whenever a class defines a virtual function, the compiler adds a hidden member variable to the class which points to an array of pointers to (virtual) functions. These pointers are used at runtime to invoke the appropriate function implementations, because at compile time it may not yet be known if the base function or a derived one, implemented by a class that inherits from the base class, is to be called.

So, to recapitulate, COM is a way of writing component software that allows the creation of reusable modules that are dynamically linked together at runtime. Each of these objects is a collection of interfaces which, in turn, are collections of functions that are referenced through a virtual function table pointer. Thus, all we have to do to work with DirectX COM objects is to "create" them and to retrieve their interface pointers. We will learn how to do just that in the next tutorial.

The service locator class, defined in serviceLocator.h, provides services to the entire application, without coupling anything together. So far, in these tutorials, the service locator provides a file logging service. Later, it will surely see more use, for example, when an audio service is implemented.

Here is an example of how to use the service locator to retrieve the file logger and to print information to the log file:

util::ServiceLocator::getFileLogger()->print<util::SeverityType::info>("This is an information message.");

The Core Classes

Those are the core components of our application. They are invaluable to any Windows program.

The Timer Class

Keeping track of time is highly important for any game, especially if a mathematical simulation of a physical world is intended. The timer class does just that, by encapsulating a high-performance counter made available by Windows. It is automatically updated and used in the main game loop, thus we do not really have to worry about timing when proceeding with the next tutorials.

class Timer
{
private:
    // times measured in counts
    long long int startTime;        // time at the start of the application
    long long int totalIdleTime;    // total time the game was idle
    long long int pausedTime;       // time at the moment the game was paused last
    long long int currentTime;      // stores the current time; i.e. time at the current frame
    long long int previousTime;     // stores the time at the last inquiry before current; i.e. time at the previous frame

    // times measured in seconds
    double secondsPerCount;         // reciprocal of the frequency, computed once at the initialization of the class
    double deltaTime;               // time between two frames, updated during the game loop

    // state of the timer
    bool isStopped;                 // true iff the timer is stopped

public:
    // constructor and destructor
    Timer();
    ~Timer();

    // getters: return time measured in seconds
    double getTotalTime() const;    // returns the total time the game has been running (minus paused time)
    double getDeltaTime() const;    // returns the time between two frames

    // methods
    util::Expected<void> start();   // starts the timer, called each time the game is unpaused
    util::Expected<void> reset();   // sets the counter to zero, called once before message loop
    util::Expected<void> tick();    // called every frame, lets the time tick
    util::Expected<void> stop();    // called when the game is paused
};

The following times are measured in counts, and are all updated automatically:

long long int startTime;

This is the time at the start of the application, or more precisely, the time at the moment the reset (see below) function was called last. The DirectXApp class calls the reset function at the start of the game loop (see below), thus the start time will automatically be set to be the start of the game loop.

long long int totalIdleTime

This variable keeps track of the total time the game was idle, for example when the game is paused, or the game window is minimized.

long long int pausedTime

This variable holds the last time the game was paused.

long long int currentTime

This variable is used to store the time at the current frame.

long long int previousTime

This variable stores the time at the last inquiry before the current frame, that is, the time at the previous frame.

The following times are measured in seconds:

double secondsPerCount

This double holds the amount of seconds per count, computed by the reciprocal of the frequency. It is computed at the initialization of the Timer class.

double deltaTime

This double member holds the time elapsed between the previous and the current frame. It is automatically updated and used during the game loop. It is essential for correct behaviour of the physics engine.

bool isStopped

This boolean member is true if and only if the timer is paused. It is used to keep track of the total time the game was idle.

Timer()

The constructor of the Timer class queries for the frequency of the high-performance counter, i.e. the amount of counts per second and then computes the seconds per count as the reciprocal of the frequency. If initialization fails, it throws a std::runtime_error exception.

~Timer()

As always, the destructor does what destructors do.

double getTotalTime() const

This public constant function simply returns the total running time, in seconds, of the application, that is, the total time minus the total idle time.

double getDeltaTime() const

This public constant function returns the time, in seconds, elapsed between two frames. It is used in the game loop and is absolutely essential for a robust physics engine.

util::Expected<void> reset()

This function sets the startTime of the Timer to the moment it was called. It is used just before the game loop to set the start time to be the start of the game.

util::Expected<void> start()

This function starts the timer. It is automatically invoked each time the game becomes active again.

util::Expected<void> tick()

This function lets the timer tick by updating the currentTime and previousTime member variables. It is automatically called at each frame during the game loop.

util::Expected<void> stop()

This function stops the timer; it is automatically invoked each time the game is paused.

The Window Class

The Window class handles all Windows related stuff, such as creating the actual application window and handling all event messages that the operating system might send to our application. With the Window class in place, we no longer really have to care about Windows at all; we can simply program our game and forget about the operating system.

Most members of the Window class are private, but the DirectXApp, as a friend, still has direct access to all of them.

HWND mainWindow

This is the handle to the main window of our application.

DirectXApp* dxApp

This is a pointer to the main class of the application (see below).

int clientWidth and int clientHeight

Those two members store the client dimension of the window.

bool isMinimized

This boolean member is true if and only if the window is minimized.

bool isMaximized

This boolean member is true if and only if the window is maximized.

bool isResizing

This boolean member is true if and only if the window is being dragged around by the mouse.

util::Expected init()

This private member function defines and initializes the main window. It is automatically called when the DirectXApp class initializes.

void readDesiredResolution()

This private function simply reads the desired screen resolution from a Lua configuration file. This function is automatically called during window initialization.

Window(DirectXApp* dxApp)

The constructor of the Window class stores the pointer to the DirectXApp class in the appropriate member variable and then calls the initialization function. If an error occurs, a std::runtime_error is thrown. If the function is successful, the main window handle is available in the mainWindow variable. The creation of the window is started by the DirectXApp class.

~Window()

Well, the destructor does whatever destructors do, it destroys.

inline HWND getMainWindowHandle() const

This little public constant function simply returns the main window handle.

Last, but not least, behold the most important function of the Window class, the message procedure. The message procedure handles all the events that Windows throws at our application, for example when the window changes in size, or if the user tries to exit the application. To change the way events are handled, this is the place to go to.

The following events are handled by the Window class at the moment:

WM_ACTIVATE

WM_CLOSE

WM_DESTROY

WM_ENTER(EXIT)SIZEMOVE

WM_GETMINMAXINFO

WM_KEYDOWN

WM_MENUCHAR

WM_SIZE

The DirectXApp Class

This is the main component of all the core components. It keeps all the other systems together. Once again, initialization is mostly automatic. We will soon see how to use the DirectXApp class to create an application of our own.

The private members of the DirectXApp class can be seen as constant variables for the entire application. The only class able to access them is the befriended Window class. Not even the derived Game class, which we will talk about next, can access or change most of those private members.

The following member variables and functions are private, thus only accessible by the DirectXApp class and its friend class, the Window class.

std::wstring pathToMyDocuments

This wide string holds the path to the My Documents folder.

std::wstring pathToLogFiles

This wide string holds the path to the desired location to store the game log files. The default is: My Documents/bell0bytes/bell0tutorials/Logs/.

std::wstring pathToConfigurationFiles

This wide string holds the path to the desired location of the game configuration files. The default location is: My Documents/bell0bytes/bell0tutorial/Settings/.

bool validConfigurationFile

This boolean member is true if and only if a valid configuration file was present at the moment the application started. If there was no previous configuration file, the DirectXApp creates one with default settings.

bool activeFileLogger

This boolean member is true if and only if the file logger was created successfully. It is used to tell the cleanup functions whether the file logger is available to log errors or not.

bool hasStarted

This boolean member is true if and only if the DirectXApp class was initialized completely. It is used to delay taking certain actions after encountering Windows events while initializing.

Timer* timer

This is a pointer to a high-precision timer encapsulated in the above mentioned Timer class. The timer is automatically created and initialized at the initialization of the DirectXApp.

int fps

This integer holds the current frames per second; it is automatically updated during the game loop, when the frame statistics are computed.

double mspf

This double precision float holds the milliseconds it took to process the current frame. It is automatically updated during the game loop, when the frame statistics are computed.

const double dt

This constant double precision float is of utmost importance, as it defines the desired update frequency of the game world. For further details, re-read the tutorial about the game loop. By default, this member is set to $dt := \frac{25}{6} \approx 4.16$ milliseconds, which is equivalent to $240$ updates per second.

const double maxSkipFrames

This double precision float makes sure the game world is not updated too often per frame on slow computers. Re-read the tutorial about the game loop for further details. By default this is set to $10$, meaning that as long as the game runs with at least $24$fps, the game world is updated normally; otherwise we break out of the update loop after ten iterations to avoid stalling the entire system. Note that this variable must be set in accordance with the above dt variable.

void calculateFrameStatistics()

This private member function is called during the game loop to compute frame statistics; while doing so, it updates the fps and mspf member variables.

bool getPathToMyDocuments()

This private member function is automatically called during initialization to retrieve the path to the My Documents folder. The retrieved path is stored in the pathToMyDocuments member variable. It then creates and stores the two other path variables in the appropriate variables.

void createLoggingService()

This private member function is automatically called during initialization to start the file logging service.

bool checkConfigurationFile()

This private member function is automatically called at initialization to check for a valid configuration file. If no such file can be found, a configuration file with default settings is created.

The following member variables are protected, thus also available to derived classes.

const HINSTANCE appInstance

This is the handle to the actual instance of this application, handed to us by the WinMain function.

const Window* appWindow

This is a pointer to a constant instance of the Window class. The Window instance is automatically created at the initialization of the DirectXApp class.

bool isPaused

This boolean is true if and only if the game is paused.

DirectXApp(HINSTANCE hInstance)

The constructor of the DirectXApp class stores the instance handle of the application and initializes most member variables to their default settings. The actual game initialization must be started manually from a derived class (see below).

~DirectXApp()

The destructor destroys.

virtual util::Expected init()

This protected virtual member function initializes the DirectXApp, and as such, the entire game:

If initialization fails, it returns an Expected with a nasty runtime error inside.

virtual void shutdown(util::Expected* expected = NULL)

This protected virtual function cleans up all the allocated resources. If the application had to quit with an error, the error is printed to the log file, if possible.

virtual void onKeyDown(WPARAM wParam, LPARAM lParam) const

This protected constant virtual function acts whenever a WM_KEYDOWN message is received, that is, it is invoked by the window procedure function in the Window class. By default, it ends the application if the ESCAPE key is pressed.

virtual util::Expected run()

This is the heart of any game, the game loop. This function enters the game loop and iterates until the user desires to quit the game. All the game action happens here:

virtual void update(double dt)

This function updates the game world with the constant time step dt; re-read the tutorial about the game loop for further details.

virtual void onResize()

This function is invoked whenever the game window is resized. For now the function is empty, but eventually it will call on Direct3D to resize the graphics of the game.

virtual void render(double farseer)

This function peeks into the future to render a scene of the game world; re-read the tutorial about the game loop for more details. For now this function is still empty; eventually, though, it will use Direct3D to render the game world.

bool fileLoggerIsActive() const

This constant function returns true if and only if the file logging service is active.

Putting It All Together

To use the power of the DirectXApp class, we create a derived class, the DirectXGame class.

The DirectXGame Class

To specify and use the DirectXApp class, a derived class must be created, like this:

And that's it, nice, clean and easy. With all this Windows stuff abstracted and encapsulated in nice little classes, in the next batch of tutorials, we will safely explore adding DirectX components to our application. Stay tuned!

Windows allows the storage of more than just program code in an application. Using resources, Windows allows pieces of data to be combined with the program code, which can then be loaded at runtime by the program itself.

This tutorial will cover two examples of this: Custom icons and custom cursors.

Icon Resources

To work with resources, two files need to be created, an .rc file and a .h file. Thankfully the Visual Studio IDE does all of the dirty work, thus I won't go into any details; they can be looked up in LaMothe's book, if necessary.

To create or load an icon, right-click on "Resource files" in the "Solution Explorer" and then select "Add Resource". Everything else should then be self-explanatory.

I found a lovely image of a barking dog and added it to my project, by the name IDI_BARKING_DOG.

The resource.h looks like this:

// ICONS
#define IDI_BARKING_DOG 101

Now to load an icon into an application, the newly created resource.h file must be included in the project and then the following two lines in the window creation process must be changed:

HINSTANCE hInst

As always, this is a handle to an instance, in this case, the handle to the module of the executable file that contains the image to be loaded. We simply pass the handle to the instance of our application.

LPCTSTR lpszName

This long pointer to a constant string takes the location of the image to be loaded. The MAKEINTRESOURCE macro converts an integer value to a resource type compatible with the resource-management functions, and can be used in place of a string containing the name of the resource.

UINT uType

This unsigned int defines the type of the image to be loaded; it can be set to one of the following values: IMAGE_BITMAP, IMAGE_CURSOR or IMAGE_ICON.

int cxDesired and int cyDesired

Those parameters define the width and height, in pixels, of the resource to load. We simply set this to use the default size.

UINT fuLoad

This unsigned int defines further behaviour of the icon to be loaded. Check the MSDN for all possible flags, in this tutorial we used LR_DEFAULTCOLOR, which simply means that the icon is not monochromatic, and LR_SHARED, which shares the image handle if the image is loaded multiple times.

Cursor Resources

Loading a custom cursor is equally simple. To create or load a cursor, right-click on "Resource files" in the "Solution Explorer" and then select "Add Resource". Everything else should then be self-explanatory.

For example, I downloaded and loaded two StarCraft related cursor resources. The resource.h file now looks like this:

The mouse was just a tiny piece of a much larger project,
aimed at augmenting human intellect.

Dr. Douglas C. Engelbart

Eventually, DirectInput will be responsible for handling user input, yet it is still helpful to learn how to use the Win32 library to access the keyboard and the mouse for basic input functionality.

The Keyboard

A keyboard basically consists of a number of keys, a microcontroller and some support electronics. The first IBM PC came with a keyboard that had snap-action switches that gave tactile feedback and made clearly audible clicks when the keys were pressed far enough. Today most cheaper keyboards have keys that simply make mechanical contact when depressed, while more sophisticated keyboards have a magnet under each key that passes through a coil when struck, thus inducing an electromagnetic current that can be detected. This article is being written on a tactile mechanical keyboard with audible clicks.

Depressing a key generates an interrupt which invokes the keyboard interrupt handler of the operating system. This interrupt handler reads a hardware register inside the keyboard controller to get the number of the key that was just pressed. A second interrupt is generated once a key is released; this way, when the "Shift" key is depressed, followed by depressing and releasing the "X" key, the operating system notices that a capital "X" should be written.

Windows processes this incoming stream of information and sends keyboard event messages to the application window, which the message procedure of the window then has to handle. More precisely, depressing a key on the keyboard generates two sets of data: the scan code and the ASCII code.

The scan code treats the keyboard as a set of different switches: The scan code registers when a single key is depressed, but it holds no information on whether any combination of multiple other keys was depressed as well. Scan codes are handled by the WM_KEYDOWN message.

The ASCII code, on the other hand, gives detailed information on what key-combination was depressed. ASCII codes are handled by the WM_CHAR message.

Continuing the above example, scan codes see no difference between "X" and "Shift" + "X", whereas in ASCII code, depressing "X" results in a small "x" and depressing "Shift" + "X" results in a capital "X".

In the context of computer games, scan codes usually carry all the information needed. The WM_KEYDOWN message has two parameters, wParam, which contains the virtual-key code of the key that was depressed, and lParam, which stores additional information, such as the repeat count, i.e. number of times the keystroke was repeated as a result of the user holding down the key.

There is a similar message sent when a key is released, head over to the MSDN to look it up.

This is all nice and easy, but what if access to the keyboard outside of the main event loop is desired? The answer is the GetAsyncKeyState function. This function determines whether a key is up or down at the time the function is called, and whether the key was depressed after a previous call to GetAsyncKeyState.

GetAsyncKeyState could be used to poll for the status of single keys (whether that is a smart thing to do or not, is another question). Its prototype is as follows:

SHORT WINAPI GetAsyncKeyState(_In_ int vKey);

All that has to be done is to feed the function with a virtual-key code and if the high bit of the return value is $1$, then the key is depressed, otherwise it is not. This way, information about the status of any key on the keyboard can be requested at any time.

Using those functions, keyboard input can be queried for during any moment in the game, for example during the main game loop (again, whether this makes sense or not, is another discussion):

while(...)
{
	// peek for messages

	// let the timer tick

	if (!isPaused)
	{
		// compute fps

		// acquire input
		if (isKeyDown(VK_ESCAPE))
			break;

		// accumulate the elapsed time since the last frame

		// now update the game logic with fixed dt as often as possible

		// peek into the future and generate the output
	}
}

The Mouse

The Console-Hell

The mouse is the most common way of allowing users to point at the computer screen and was the answer to the cries of those computer users that were not computer specialists. Way back, computers of the ENIAC generation were used by the same people that built them, and for a long while, computers were still mostly operated by specialists. Their chosen input: the command line interface! Non-specialists, however, saw those interfaces as downright hostile.

The salvation for us mere mortals came with a patent applied for in 1967, and received in 1970, by Dr. Douglas Carl Engelbart, for a wooden shell with two metal wheels. The patent application described this device as an "X-Y position indicator for a display system". Engelbart himself later said that this device was nicknamed the "mouse" because "the tail came out the end". As is often the case, Mr. Engelbart never received any royalties for his invention. During an interview, he said "SRI patented the mouse, but they really had no idea of its value. Some years later it was learned that they had licensed it to Apple Computer for something like $40,000." He had, however, changed the life of countless people around the globe.

Nowadays a mouse is a small plastic "box" that eagerly awaits its owner's return on the table next to the keyboard. When it is moved around on the table, a little pointer on the screen moves with it, basically allowing the user to point at the screen. Mice also have a varying number of buttons; I would say naive users prefer one button, since that makes it harder to press the wrong button (Hello Apple)! Gamers usually prefer mice with multiple (surely more than two) buttons to better handle the exciting stress encountered during games.

Let there be LED!

Nowadays we know three different species of mice: the mechanical mouse, the optical mouse and the optomechanical mouse.

Mechanical mice had two perpendicular rubber wheels (now replaced by a ball) protruding through the bottom. On mouse movement parallel to its main axis, one of the wheels turned, on movement perpendicular to its main axis, the other wheel turned. By measuring changes in the resistance, it was possible to determine how much each wheel had rotated and to thus compute how far the mouse had moved in each direction.

Optomechanical mice have a rolling ball, rotating two shafts aligned perpendicularly to each other. The shafts are connected to encoders that have slits through which light can pass, and as the mouse moves, the shafts rotate and light pulses can strike the detectors; obviously then, the amount of pulses detected is proportional to the amount of motion.

The optical mouse has no ball (and as such it is definitely not a suitable mouse for Mr. Kahn). Instead, the first mice of this kind had a Light Emitting Diode, or LED, and a photodetector on the bottom. As the mouse moved across special lattice surfaces, the photodetector sensed how many lattice cells had been crossed by registering the changes in the amount of light being reflected back, which electronics inside the mouse then translated into actual data about the mouse movement.

Modern, now surface-independent, optical mice deploy an optoelectronic sensor which takes successive images of the surface on which the mouse is moved around. Integrating special-purpose image-processing chips into the mouse hardware enabled mice to detect motion on many different surfaces.

The first modern optical mice were introduced in 1999 by Microsoft, the famous IntelliMouse, based on technology developed by Hewlett-Packard. In 2004 Logitech then started another revolution with its MX 1000 laser mouse. Using a small infrared laser instead of an LED significantly increased the resolution of the image taken by the mouse, enabling superior surface tracking compared to LED-mice. By 2009 Logitech introduced laser mice with two lasers, the "Darkfield"-mice, capable of tracking movement even on reflecting surfaces, such as glass.

Mouse Messages

The most common way for a mouse device to send messages is to send a data stream of three bytes each time it is moved at least a minimum distance. The first byte contains information about the movement in the "x-axis", while the second byte tells about movement in the "y-axis". The third byte encodes the current state of the mouse buttons. Low-level software in the computer then collects this information and converts the relative movements sent by the mouse to an absolute position.

There are many Windows messages related to the mouse, for now only two are covered in this tutorial.

WM_XBUTTONY

To test whether a mouse button was pressed (or released), Windows offers messages of the form WM_$X$BUTTON$Y$, where $X \in \{\text{L},\text{M},\text{R}\}$ and $Y \in \{\text{DBLCLK}, \text{DOWN}, \text{UP}\}$. Those button messages also have the mouse position encoded in their lParam.

The window procedure would do something like the following to handle mouse messages:

The game loop is the heart of every game. It is well known that a healthy heart is the key to longevity. As briefly mentioned in the last tutorial, it is preferable to have the heart beat with fixed frequency, else numerical instability might occur while doing mathematical computations, endangering not only the heart, but the health of the entire body.

We will now discuss four ways to implement a game loop.

Unlimited FPS

The game loop implemented in the last tutorial seemed almost natural: updating the game objects based on the time $\Delta_t$ elapsed between two frames seems ideal. It is featured in many books and online tutorials, so what could possibly go wrong?

Meet the bad guy, my ancient nemesis: numerical analysis! And numerical analysis is so evil, it will turn this beautiful game loop into a non-deterministic hell!

Actually, this game loop has problems both on slow and on fast machines. Think back to the poor student running around campus, trying to avoid the professors.

As the game slows down, the game world will be updated in bigger steps, resulting in rougher animations and a lower reaction time, both for the player and the AI-controlled professors. The difference between $60$fps and $30$fps is $3$ pixels per frame, as the student moves $2$ pixels per frame at $60$fps and $5$ pixels per frame at $30$fps. Those $3$ pixels could make the difference between life and death: Close your eyes, imagine yourself being said student and notice the big tree in front of you, only $3$ pixels away, as suddenly, you realize, you are being played by someone using an old computer, only able to run the game at $30$fps, and he is just about to press the forward button on the keyboard... Ouch! The survival of the student depends on the heartbeat, or the framerate, of the game.

Okay, so this is bad on slow computers, but what about fast computers? Surely animations are smooth now and the reaction time is ideal. Well, the simple fact that the memory space of computers is limited leads to the explosion of rounding errors. I will use PARI/GP, a computer algebra system designed for fast computations in number theory (which I use, among other systems, for my mathematical research), to show what catastrophic damage rounding errors can cause, even when using floating point arithmetic with very high precision. The effects seen here will be even worse in C++.

Now assume that the poor student is walking around the campus with the average speed of an average human being: \[3 \text{ km/h } = \frac{3 \cdot 1000}{60 \cdot 60} = \frac{5}{6} \approx 0,83333333333333333333333333333333333333 \text{ m/s }. \]

At first, the student is in a part of the campus that the game can run in $60$fps. Watch what happens when the students walks forward in a straight line for ten seconds:

We broke one of the most important rules of numerical analysis: While doing so many additions, the roundoff errors from the non-exact representations of speed and spf exploded on us.

But still, let us continue with the game. After moving around, the student is now at a more quiet place of the campus that the game can handle with $120$fps. Once again, the student moves forward for ten seconds:

So the difference between $60$fps and $120$fps is \[0,0069444444444444444444444444444444450035 \text{ m / s, }\] that is about $0,025$ km/h. Thus a player playing the game on a faster PC ($120$fps) and a player playing on a normal PC ($60$fps) will experience differences in their gameplay, which is simply not acceptable.

Numerical analysis has truly turned this promising game loop into a nightmare: the game is no longer deterministic and just imagine what those roundoff errors could do to the physics engine; or simply ask the U.S. Military about the Patriot System.

Conclusion: The effects of numerical instability can be as subtle as the game feeling slightly different depending on the frame rate or as extreme as objects moving through walls and jumping to outer space. It is utterly unrealistic to expect mathematical stability with a fluid framerate. Thus, even though this game loop is featured in many prominent books about game programming and various famous online tutorials, there can be only one final verdict: Guilty. Game over!

Constant FPS

A first solution would be to simply take a nap if things are going too quickly. To run the game with constant $x$ frames per second, simply sleep for $\frac{1000}{x} - \Delta_t$ milliseconds.

Here is an example of a game loop with a constant $60$ frames per second, that is, the game has roughly $17$ms to do all of its work per frame.

What happens if it takes longer than $17$ms to update the game universe? Well then the game sleeps for a negative amount of time! Okay, we could check for that, but really, the game will run slower. If in some parts of the world the computations are more difficult, the game will slow down there considerably and the game will end up with a different heartbeat depending on where the player is. Obviously, when that happens, the graphic quality could be reduced, but who would want that? This is simply not acceptable.

If the hardware is powerful enough and if the algorithms are carefully designed, everything goes well: For example, if it takes $10$ms in total to get the input data, to update the game world and to render the scene, the game sleeps for about $17-10=7$ milliseconds and then continues with the next frame. Obviously this is wasting precious time, though: why run a game with a fixed framerate if it could run much faster? After all, the higher the framerate, the smoother the animations. This game loop deliberately keeps state-of-the-art gaming computers down; it is an enemy of science and progress! On a side note, though, on laptops or phones this could be useful: not constantly keeping the hardware on the edge saves some battery power.

Conclusion: This is a not-so-good idea for a game loop. It is easily implemented and writing a physics engine based on a fixed framerate is easy. On mobile devices it could save battery power. It is problematic on slow machines though, and it hinders scientific progress.

Mathematical Freedom

Thinking about the problems of both game loops above, it becomes clear that we want the best of both worlds. We want $\Delta_t$ to be constant, for numerical stability, as well as unlimited frames per second for rendering. Obviously those two goals are mutually exclusive - unless the design of the game loop is changed. The idea is this: updating the game world, computing the physical simulation, is done with a newly defined constant $\delta_t$, independently of the generation of the output:

At the beginning of each frame, the accumulated time $t_a$ is updated by the time elapsed since the last frame: $t_a = t_a + \Delta_t$. The game is then updated, with a constant $\delta_t$ (equivalent to $120$fps in this example), as often as the accumulated time $t_a$ permits. The key idea to grasp is as follows: generating the output produces time which the update of the game universe, the simulation of the physical reality, consumes in small, discrete, $\delta_t$-sized steps. Note that if $t_a$ is not an integer multiple of $\delta_t$, that is, if $\nexists \alpha \in \mathbb{Z} \, : \, t_a = \alpha \cdot \delta_t$, there might sometimes be some unused time left over in $t_a$ after the game universe is updated. This leftover time is not thrown away, but accumulated.

In the above example, $\delta_t = \frac{1000}{120} = \frac{25}{3}$mspf, and if the game runs with $60$fps, the game world is updated twice per frame. If however the game runs with $240$fps, only half of an update step is done - no, that is obviously nonsense - only one update step is done every two frames: each frame adds $\Delta_t = \frac{1000}{240} = \frac{25}{6} < \frac{25}{3} = \delta_t$ to the accumulator, and obviously $2 \cdot \Delta_t = \delta_t$.
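The heart of this game loop, the accumulator, can be isolated into a small helper function (the name and the idea of feeding it one frame's worth of time are illustrative) to make the arithmetic concrete:

```cpp
// add one frame's worth of time to the accumulator, then consume it
// in fixed delta-sized steps; returns the number of game updates performed
int consumeAccumulator(double& accumulatedMs, double frameMs, double deltaMs)
{
    accumulatedMs += frameMs;
    int nUpdates = 0;
    while (accumulatedMs >= deltaMs)
    {
        // one fixed-step update of the game universe would happen here
        accumulatedMs -= deltaMs;
        nUpdates++;
    }
    return nUpdates; // any leftover time stays in accumulatedMs
}
```

At $60$fps each frame adds exactly two $\delta_t$-sized steps of time, so the world is updated twice per frame; at $240$fps the first frame adds too little time for an update, and the second frame triggers exactly one.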

On slow computers, this might lead to very long update loops. For large $\Delta_t$, thus large $t_a$, and small $\delta_t$, the while loop will run for a long time. A safeguard would be another constant variable limiting how often the while loop is executed. Anyway, on slower machines the framerate will obviously drop, but the game itself will still be updated at the desired speed.

On faster computers, this game loop no longer suffers from the numerical instability of the Unlimited FPS game loop, but it still wastes CPU cycles, just as the Constant FPS game loop did.

Conclusion: While this game loop is a significant improvement over the two others, there is still the problem of having to manually define a value for $\delta_t$ that works on slow and fast machines. Furthermore, there is another big problem hiding in the shadows of an uncertain future!

Far Seers

Accumulating the left over time in the previous game loop is not only wasting precious CPU cycles, no, the fading away of time created a nightmarish ghost, hiding in the shadows of an uncertain future, ready to jump on an unprepared programmer at any moment; and when it does, the game world will be rendered at a point between two updates.

It seems obvious that in most cases $\Delta_t$ will not be an integral multiple of $\delta_t$; that means there will almost always be some time left over that cannot be simulated. Sometimes the game world will thus be rendered with some updates yet to be computed, which leads to stuttering.

For example, think about the student moving across the campus. He just found a t-shirt of his favourite hero, Flash, granting him the power of lightning-fast speed. What happens when the student zooms horizontally over the campus, from the left edge to the right edge, at the moment the game renders between two updates? The student should be seen moving towards the middle of the campus. Yet at the first update, the student is at the left edge; then the game world is rendered, with the student still seen at the left edge of the campus, leading to very rugged movements.

Thankfully, all the information needed to solve this problem is already available. After updating the game, the time left over is stored in $t_a$, which indicates how far ahead of the last update the next rendered frame is. This information is passed to the output-generating function (i.e. the renderer) in a normalized variable $t_r = \frac{t_a}{\delta_t} \in [0,1)$: $t_r = 0$ at the previous update and $t_r \approx 1$ arbitrarily close to the next one. Just think about linearly progressing on a line, or about barycentric coordinates, if you are so inclined: at $t_r=0$ you are at the left edge of the line, and as $t_r$ increases, you move forward; the closer you get to the right edge of the line, the closer $t_r$ approaches $1$. With this value, the renderer can remedy the situation by updating the position of the student with an educated guess (without needing to know the actual elapsed time). The rendering function needs to look into the future:

Assume that the renderer knows the position and the velocity of each object in the game. Further assume that the student, with his Flash t-shirt, is moving to the right by $800$ pixels per update, but was still standing at the left edge, $x=0$, at the last update, and that the time left over in the accumulator is $t_a = \frac{25}{6} < \delta_t$. Then $t_r = \frac{t_a}{\delta_t} = \frac{1}{2}$, which tells the rendering function to draw the student half an update step ahead of his position at the last, trailing, update, that is, at position $x=\frac{800}{2} = 400$.
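The educated guess is a simple linear extrapolation; in code (names illustrative):

```cpp
// where to draw an object: its position at the last update, plus the
// fraction tR (in [0,1)) of the movement expected during the next update
double predictRenderPosition(double lastX, double pixelsPerUpdate, double tR)
{
    return lastX + pixelsPerUpdate * tR;
}
```

With the numbers from the example, the student at $x=0$, moving $800$ pixels per update, is drawn at $x=400$ when $t_r = \frac{1}{2}$.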

Obviously, far seeing is always perilous: for example, when the next game update is done, it is discovered that the student actually hit a tree, standing firm at position $x=400$. The position of the student was rendered based on a guess, and the far seer guessed wrong. The mistake will quickly be discovered and remedied though: the game is updated $120$ times per second, thus when this happens, the student will only be seen hugging the tree from the inside for the tiniest fraction of a second, surely unnoticeable by human eyesight - and still definitely better than the stuttering occurring without peeking into the future at all.

The analysis of this new game loop in the case of a slow computer is the same as for the previous game loop.

On fast computers, things will run as fast as they possibly can, and with the rendering function peeking into the future, no more stuttering will occur.

Conclusion: While this game loop design might make coding the rendering function a bit more difficult, it is clearly the superior of the four designs shown in this tutorial, thus, from now on, this will be the game loop design used.

Putting It All Together

Here is the new game loop implemented in C++, with $dt:=4.16 \approx \frac{25}{6}$, equivalent to $240$fps and maxSkipFrames set to $10$, meaning that as long as the game runs with at least $24$fps, we update the game world normally, else we break out of the update loop after ten iterations to not stall the entire system.
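The core of the update phase can be sketched portably as follows; updateWithCap and feeding it one frame's worth of milliseconds are illustrative simplifications of the actual Timer-driven implementation:

```cpp
// one frame of the update phase: consume the accumulated time in dt-sized
// steps, but give up after maxSkipFrames updates so that a slow machine
// cannot stall the entire system; returns the number of updates performed
int updateWithCap(double& accumulatedMs, double frameMs, double dtMs, int maxSkipFrames)
{
    accumulatedMs += frameMs;
    int nLoops = 0;
    while (accumulatedMs >= dtMs && nLoops < maxSkipFrames)
    {
        // update the game universe by one fixed time step here
        accumulatedMs -= dtMs;
        nLoops++;
    }
    return nLoops;
}
```

A pathological $100$ms frame thus triggers at most ten updates before control returns to the renderer, while a normal frame consumes its time completely.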

I changed a few of the members of the DirectXApp class to be private, basically making them constants. Clearly no other parts of the game need to alter those. Most of the variables needed and used by the derived DirectXGame class are still protected:

Wow, we finally got to use some of our mathematical knowledge to explain game related stuff! Okay, to be honest, we talked about physics and the simulation of time and reality, but still, it was fun. There are still many open questions though, such as what happens when VSYNC fails, or how to implement a multithreaded game loop, but those questions are for a later time.

In the next tutorial we will learn how to acquire keyboard and mouse input from Windows.


"I wish it need not have happened in my time," said Frodo.
"So do I," said Gandalf, "and so do all who live to see such times. But that is not for them to decide. All we have to decide is what to do with the time that is given us."

--- J.R.R. Tolkien, The Fellowship of the Ring

A Theoretical Game Loop

As seen in the tutorial about real-world Windows applications, the game loop controls the overall flow of the entire game. In gaming terminology, each iteration of the game loop is called a frame. Many games aim at running at $60$ frames per second, or fps, which means that the game loop of such a game completes $60$ iterations per second.

Our traditional, traditional in the sense that it comes from a world without multithreading, game loop looked like this:

Thus there are three things happening in each frame. This tutorial will give a brief overview over all three of them, but the details will be covered in later tutorials.

Acquiring the Input

Traditional input may come from various sources, such as the mouse, keyboard or joystick. Furthermore, for online based games, another input source is any data received via the internet. Many modern mobile games also feature camera or GPS-information as input.

Updating the Game Universe

Updating the game world means appropriately updating every single object that is active in the current frame, based on the input acquired in the previous step. This might involve heavy mathematical computations to simulate a world with physical laws.

Generating the Output

After having updated the game world, the resulting output must be generated. The most computationally expensive output, in most cases, is the graphical representation of the game objects. Furthermore, audio, such as sound effects and music, but also dialogue, and occasionally, force feedback, or rumble effects, must be generated as well. For online based games, the data sent to the client, the player's computer, would constitute additional output.

Imagine that a student is trying to walk over the University campus without being seen by any of the professors, as he has not refreshed his knowledge on algebraic geometry lately and is terribly afraid of being drawn into a conversation with them. What would such a game loop look like?

While the game is running, there are only two classes of objects that must be updated in each frame: the player object is updated based on acquired keyboard data and the professors are updated based on the position of the player - just imagine evil mathematicians trying to start a conversation about mathematics with everyone on sight (yes, a highly unrealistic scenario).

After having acquired the keyboard input, the game updates the position of the player. Should the position of the player now coincide with the position of a professor, the student faints. Else, the AI controlled professors adjust their position based on the new position of the player, with the goal of intercepting him.

Once the player and professor objects are updated, the rest of the campus must be updated as well (time of the day, trees, lights, ...). Finally the resulting output is drawn to the scene and the game plays some appropriate audio.

This is rather straightforward, and while many mobile and indie games might still use this traditional game loop pattern, modern games have evolved out of sheer necessity. With the number of objects per scene ever increasing and the graphical detail becoming more and more photo-realistic, displaying $60$ frames per second is extremely difficult. Imagine that it takes $20$ milliseconds each to update a large game universe and to render the updated world; with this traditional game loop, using only a single thread, the resulting frames per second would be unacceptably low, namely $\frac{1000}{20+20} = 25$fps.

Modern computers have multicore CPUs, and in order to achieve maximal efficiency, a game would be wise to make use of all the cores of the computer it is running on. If updating the game world and rendering the scene could be completed in parallel, on different cores, the resulting frames per second would at least still be $\frac{1000}{20} = 50$fps. Obviously there are many, many things to consider when trying to implement a parallel game loop, for example, how can a scene be updated and drawn at the same time? It is a rather advanced topic and won't be featured until way later.

We will discuss an actual implementation of a single-threaded game loop in the next tutorial.

Time

In games, just as in real life, unfortunately, it is always important to keep track of time progression. For example, to correctly animate game characters or to simulate the rules of physics, the time that passes between frames must be known exactly. As games usually run at a high frame rate, it is important that time measurement is very accurate, for example, a game running at $60$fps has only $\frac{50}{3}$ms, or roughly about $17$ms, to complete each frame iteration.

One thing to consider is that the time passed in the real world might not coincide with the time passed in the game world, for example, games might be paused (and surely time does not flow in a paused game universe). In other cases, game designers might alter the flow of time deliberately, such as to implement slow-motion or, conversely, to reduce the length of a half-time in a football (real football, as in European football - not that it matters) game, as obviously the players, or most of the players, do not want to sit through a $90$ minute session for each game of the season.

The High-Precision Timer implemented in this tutorial will make keeping track of time seem like child's play.

Game Logic and Delta Time

While keeping track of the total elapsed time since the game was started, the time elapsed between two frames, the so-called delta time, or $\Delta_t$, is equally, if not even more, important.

Imagine that in the game from above, the student is trying to just run over the campus in a straight horizontal line:

// move the student two pixels to the right
student.position.x += 2;

The problem with this absolute movement is that it depends on the fps the game is running at. If the game runs at $30$fps, then the student moves $30 \cdot 2 = 60$ pixels per second. If the game runs at $60$fps, the student moves $60 \cdot 2 = 120$ pixels per second. Obviously such a dependence on the frames per second is not desirable. The simple solution is to update game objects based on $\Delta_t$, that is, movement should not be thought of as absolute, but as relative to the elapsed time. If the desired movement speed is $120$ pixels per second, it is preferable to update the position as $x = x + 120 \cdot \Delta_t$, like this:

// move the student to the right by 120 pixels per second
student.position.x += 120 * deltaTime;

This new code works as desired, independent of the frames per second. At $30$fps, or $\Delta_t = \frac{1}{30}$s, the student will move $120 \cdot \Delta_t = 4$ pixels per frame, for a total of $4 \cdot 30 = 120$ pixels per second. At $60$fps, or $\Delta_t = \frac{1}{60}$s, the player will only move $120 \cdot \Delta_t = 2$ pixels per frame, but in total, he will move $60 \cdot 2 = 120$ pixels per second once again. Thus movement will be smoother when the game runs with $60$ frames per second, but the overall speed per second will be identical.
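The arithmetic can be double-checked by simulating one second of movement at each framerate (a throwaway sketch, not part of the game code):

```cpp
// simulate one second of the student's movement at the given framerate;
// with relative movement, the distance covered must be 120 pixels
// regardless of the fps
double pixelsAfterOneSecond(int fps)
{
    double deltaTime = 1.0 / fps; // seconds elapsed per frame
    double x = 0.0;
    for (int frame = 0; frame < fps; frame++)
        x += 120.0 * deltaTime;   // the relative movement from the snippet above
    return x;
}
```

Both at $30$fps and at $60$fps the student ends up (up to floating-point rounding) $120$ pixels to the right.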

This example shows that it is a good idea to update most game objects based on the elapsed time between two frames, especially when movement is involved, that is, if a translation, or a rotation, is applied to a game object. When higher mathematics or physics are involved, a constant rate of update becomes even more important, as will be seen in the next tutorial.

The High-Precision Timer implemented later in this tutorial has the computation of $\Delta_t$ as a main feature.

Another question to ponder is whether to limit the frames per second of a game, or to simply let it run as fast as possible. Allowing the game to run with as many frames as possible usually causes all kinds of problems. Most of those problems stem from the instability of numerical analysis, but that is the topic of the next tutorial.

A High-Precision Timer

Thankfully Windows offers a high performance timer, or performance counter, out of the box, which seems to be way better than the C++11-timer available through std::chrono.

The QueryPerformanceFrequency function retrieves the frequency of the performance counter, that is, the counts per second of the performance timer. This value is fixed at system boot and is consistent across all processors, therefore the frequency need only be queried once upon application initialization. Here is an example taken from the MSDN again:

LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);

Obviously, to get the seconds per count, it is enough to compute the reciprocal of the number of counts per second. Thus, assuming $y$ to be the time value in counts and $z$ the frequency of the performance counter, then the time value in seconds $x$ can be computed by $x = y \cdot z^{-1}$.

This is very straightforward and this little bit of information is already enough to create a high-precision timer for a game. Note that both functions return $0$ if an error occurred.
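In code, with $y$ counts and frequency $z$, the conversion is a one-liner (a sketch; the actual computation happens inside the Timer class below):

```cpp
// convert a performance-counter reading (y counts) into seconds,
// given the counter frequency (z counts per second)
double countsToSeconds(long long counts, long long frequency)
{
    double secondsPerCount = 1.0 / (double)frequency; // z^{-1}
    return (double)counts * secondsPerCount;          // x = y * z^{-1}
}
```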

Time Between Two Frames

As discussed above, to animate game objects, it is very important to know the exact time that has elapsed between frames. Let $t_i$ and $t_{i+1}$ denote the count value at frame $i$ and $i+1$ respectively, then the time elapsed between those two frames can easily be computed by $\Delta t = t_{i+1} - t_i$.

Total Time

When trying to keep track of the total time $t_{total}$ that has elapsed since the start of the game $t_{start}$, it is important to stop the timer when the game becomes inactive. Stopping the timer is done by recording the count value $t_{paused}$ at the moment the game is paused. When the game becomes active again, the time the game was idle is computed as follows: If the game is resumed at count value $t_{resumed}$, then the game was idle for a count value of $$t_{idle} = t_{resumed} - t_{paused}.$$ Each time the game is resumed again, $t_{idle}$ is added to the total time $t_{totalIdle}$ the game has been idle so far since the start of the application.

Now to get the total running time $t_{total}$, in counts, at any moment in the game, there are two possible situations:

Inquiry when the game is paused: $$t_{total} = (t_{paused} - t_{start}) - t_{totalIdle},$$ where $t_{paused}$ holds the count value at the time the game was paused last.

Inquiry when the game is running: $$t_{total} = (t_{now} - t_{start}) -t_{totalIdle},$$ where $t_{now}$ is the count value at the moment of the inquiry.

Please note that all computations are done in $(\mathbb{Z},+)$, which is obviously associative, thus the parentheses above are only used to highlight the idea that, to get the total time the game was running, the total idle time must be subtracted from the total time the game has existed.

Here are two numerical examples.

Suppose that the game started at $t=0$, that the game was paused between $t=10$ and $t=20$ and again at $t=30$. Now if the total time is requested while the game is still paused, the above formula yields: $$t_{total} = (30-0) - 10 = 30-10 = 20,$$ which makes sense, since the game ran for $10$ counts twice, between $t=0$ and $t=10$ as well as $t=20$ and $t=30$.

Suppose that the game started at $t=5$, that again the game was paused between $t=10$ and $t=20$ and additionally between $t=30$ and $t=50$. Then $t_{totalIdle} = 30$, and if the total time is requested at $t=70$, the above formula yields: $$t_{total} = (70-5)-30 = 65-30 = 35,$$ which again makes sense as the game ran for $5$ counts between $t=5$ and $t=10$, for $10$ counts between $t=20$ and $t=30$, and for $20$ counts between $t=50$ and $t=70$, which leads to a total running time of $5+10+20=35$ counts.
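The two formulas, and both numerical examples, can be verified with a pair of tiny helper functions (all times in counts; names illustrative):

```cpp
// total running time, in counts, when the inquiry happens while the game is paused:
// (t_paused - t_start) - t_totalIdle
long long totalTimePaused(long long pausedAt, long long startedAt, long long totalIdle)
{
    return (pausedAt - startedAt) - totalIdle;
}

// total running time, in counts, when the inquiry happens while the game is running:
// (t_now - t_start) - t_totalIdle
long long totalTimeRunning(long long now, long long startedAt, long long totalIdle)
{
    return (now - startedAt) - totalIdle;
}
```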

Timer Class

Now with the theory out of the way, it is time to implement the actual timer class.

class Timer
{
private:
    // times measured in counts
    long long int startTime;        // time at the start of the application
    long long int totalIdleTime;    // total time the game was idle
    long long int pausedTime;       // time at the moment the game was paused last
    long long int currentTime;      // stores the current time, i.e. the time at the current frame
    long long int previousTime;     // stores the time at the last inquiry before the current one, i.e. the time at the previous frame

    // times measured in seconds
    double secondsPerCount;         // reciprocal of the frequency, computed once at the initialization of the class
    double deltaTime;               // time between two frames, updated during the game loop

    // state of the timer
    bool isStopped;                 // true iff the timer is stopped

public:
    // constructor
    Timer();

    // getters: return time measured in seconds
    double getTotalTime() const;    // returns the total time the game has been running (minus paused time)
    double getDeltaTime() const;    // returns the time between two frames

    // methods
    util::Expected<void> start();   // starts the timer, called each time the game is unpaused
    util::Expected<void> reset();   // sets the counter to zero, called once before the message loop
    util::Expected<void> tick();    // called every frame, lets the time tick
    util::Expected<void> stop();    // called when the game is paused
};

The times, measured in counts, match those described above, with slightly different names.

Constructor

Timer::Timer() : startTime(0), totalIdleTime(0), pausedTime(0), currentTime(0), previousTime(0), secondsPerCount(0.0), deltaTime(0.0), isStopped(false)
{
    // get the frequency of the PerformanceCounter
    long long int frequency = 0;
    if (QueryPerformanceFrequency((LARGE_INTEGER*)&frequency))
    {
        // compute secondsPerCount as the reciprocal of the frequency
        secondsPerCount = 1.0 / (double)frequency;
    }
    else
        // the hardware does not support a high-precision timer -> throw an error
        throw std::runtime_error("The hardware does not support a high-precision timer!");
}

The constructor initializes most member variables to $0$, queries for the performance frequency and then computes the amount of seconds per count, by the formula from above, and stores that value in the appropriate secondsPerCount member variable. A return value of $0$ from QueryPerformanceFrequency is a very strong indicator that the hardware does not support a high-precision timer. It would be possible to then use the C++11-chrono class, but for now the constructor simply throws an exception.

Start, Reset and Stop

Those three functions very much behave like an ordinary stopwatch.

Start

util::Expected<void> Timer::start()
{
    // this function starts the timer (if it is not already running)
    if (isStopped)
    {
        long long int now = 0;
        if (QueryPerformanceCounter((LARGE_INTEGER*)&now))
        {
            // add the duration of the pause to the total idle time
            totalIdleTime += (now - pausedTime);

            // set the previous time to the current time
            previousTime = now;

            // reset pausedTime to 0 and isStopped to false
            pausedTime = 0;
            isStopped = false;

            // return success
            return { };
        }
        else
            // unable to query the performance counter, return an error
            return std::runtime_error("Unable to query the performance counter!");
    }

    // the timer was already running, return success
    return { };
}

The Timer::start function starts the timer (crazy stuff, I know). If the timer was already running, then nothing happens. Else - if the game was paused - the function queries for the current time $t_{now}$ and, as discussed above, the duration of the pause, $t_{now} - t_{paused}$, is added to the total time $t_{totalIdle}$ the game was idle so far.

Since at the next frame the current frame will be the previous frame, the previous time is set to "now". Then the last time the game was paused is set to $0$ and the isStopped flag is reset to false.

If there was an error with the performance timer, the function returns a runtime_error.

The Timer::stop function stops the timer (yeah, even more crazy stuff!). If the timer is already stopped, then nothing happens. Else - if the timer is running - the function queries for the current time, stores the returned value as the time the game was paused and sets the isStopped flag to true.

If there was an error with the performance timer, the function returns a runtime_error.

The Timer::reset function resets the timer (okay, no more stupid jokes, I promise). It queries for the current time and sets both the starting time of the application as well as the time of the previous frame to the returned value from the query. Then it sets the time the game was last paused to $0$ and resets the isStopped flag to false. The Timer::reset function must be invoked once at the start of the game loop.

On encountering an error, the function returns a runtime_error.

Time between frames

To keep track of the elapsed time $\Delta_t$ between two frames, \[\Delta_t = t_{currentTime} - t_{previousTime}\] is constantly updated during the game loop using the Timer::tick function:

util::Expected<void> Timer::tick()
{
    // this function lets the timer tick, i.e. it computes the time elapsed between two frames
    if (isStopped)
    {
        // if the game is stopped, the elapsed time is obviously 0
        deltaTime = 0.0;

        // return success
        return { };
    }
    else
    {
        // get the current time
        if (QueryPerformanceCounter((LARGE_INTEGER*)&currentTime))
        {
            // compute the time elapsed since the previous frame
            deltaTime = (currentTime - previousTime) * secondsPerCount;

            // set previousTime to currentTime, as during the next tick, this frame will be the previous frame
            previousTime = currentTime;

            // deltaTime can be negative if the processor goes idle, for example
            if (deltaTime < 0.0)
                deltaTime = 0.0;

            // return success
            return { };
        }
        else
            // unable to query the performance counter, return an error
            return std::runtime_error("Unable to query the performance counter!");
    }
}

Note that $\Delta_t$ might be negative if the processor goes into power save mode, if the running process is moved to another processor, or simply if an overflow occurs. If such is the case, $\Delta_t$ is set to $0$.

To retrieve $\Delta_t$, use the Timer::getDeltaTime function:

double Timer::getDeltaTime() const
{
    // this function returns the time elapsed between two frames; delta time is updated during the game loop
    return deltaTime;
}

Total Time

The Timer::getTotalTime function returns the total time elapsed since the start of the application, minus the total idle time, as discussed above.

To use the new timer in a game, the DirectXApp class is updated with members to hold a pointer to an instance of the timer class, the frames shown per second and the milliseconds elapsed per frame, as well as methods to calculate those frame statistics and to update the game objects based on the elapsed time.

Obviously, for now, the DirectXApp::update function does absolutely nothing, but it will definitely be useful later on.

To compute the frames per second, the DirectXApp::calculateFrameStats function uses a static variable to count the frames it has seen since the start of the game, incrementing it each time it is called during the game loop. A second static variable keeps track of the elapsed time since the statistics were last updated. The lifetime of such static variables begins the first time they are encountered and only ends when the program terminates. Live long and prosper.

Now then, once per second, the fps variable can simply be set to the number of frames counted in this way, and the milliseconds it took to render a frame, on average, is $1000$ divided by the frames per second.
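Here is a standalone sketch of that idea; in the actual DirectXApp method the total running time comes from the Timer, whereas here it is injected as a parameter for illustration:

```cpp
// called once per frame with the total running time in seconds; once a full
// second has passed, fps and mspf are recomputed and the function returns
// true (the static variables survive between calls)
bool calculateFrameStats(double totalTimeSeconds, int& fps, double& mspf)
{
    static int nFrames = 0;        // frames seen since the last statistics update
    static double elapsedAt = 0.0; // total time at the last statistics update

    nFrames++;
    if (totalTimeSeconds - elapsedAt >= 1.0)
    {
        fps = nFrames;             // frames counted during the last second
        mspf = 1000.0 / fps;       // average milliseconds per frame
        nFrames = 0;
        elapsedAt += 1.0;
        return true;
    }
    return false;
}
```

Simulating sixty frames over one second yields $60$fps and roughly $16.7$mspf, as expected.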

Finally, the only thing left to do is to update the message procedure function in the windows class to actually start and stop the timer whenever the game is paused or unpaused, for now, that means, whenever the game window becomes inactive.

Exercises

Exercise 1

Which events must be updated in the message procedure of the windows class to appropriately start and stop the timer?

Exercise 2

What happens if in the DirectXApp::init() function the window is created before the timer?

Exercise 3

Change the source code in such a way that the current fps is always shown as the caption of the window. (Hint: Use the SetWindowText function.)

Exercise 4

In the next tutorial we will discuss timing in game loops. Can you figure out what is wrong with the game loop from this tutorial?

In the next tutorial we will discuss different designs for game loops, in particular we will show why the game loop used in this tutorial, which also features in many prominent books and other online tutorials, leads to very unstable and non-deterministic behaviour!


She knows herself to be at the mercy of events, and she knows by now that events have no mercy.

--- Margaret Atwood, The Blind Assassin

Handling Windows events

As discussed in previous tutorials, Windows is an event-based operating system and even though a DirectX application usually does not need Windows messages to get its job done, it still needs to handle some events:

This is the most important event to handle. This message is sent when a menu is active and the user presses a key that does not correspond to any mnemonic or accelerator key.

Since the beep when a non-mnemonic key is pressed is terribly annoying, and since games usually do not work with menus, we beg Windows to please, please not beep at us, by deceiving the system into thinking that the "close the menu" button was pressed. This will be extremely useful when switching to a full-screen application, as hearing that annoying beep each time I pressed "alt+enter" surely had a detrimental effect on my physical well-being! Leave now and never come back!

WA_ACTIVE: the window has been activated by some means other than the mouse (i.e. keyboard).

WA_INACTIVE: the window is being deactivated.

The fMinimized variable indicates whether the window was minimized and hwndPrevious is the handle to the window being activated or deactivated.

This message is useful, since it allows us to keep track of the state of the game window. When the window becomes inactive, the game should be paused; after all, the hero should not be chomped on by some monsters after the player minimized the window to check his favourite math blog! When, and if, the window is activated again, the game should resume. To assure this functionality, a boolean has been added to the DirectXApp class, called isPaused, which is true if and only if the application is paused.

Obviously, when the window becomes deactivated, the isPaused is set to true, and when the window becomes active (again), this variable is set to false. In the main game loop isPaused is used to concede resources to the CPU while the game is inactive, i.e. the game simply does nothing while paused:

This is another very important message for games to handle, as it gets invoked each time the user resizes the game window. Each time that happens, the game universe must be scaled to fit the new dimensions of the client area. For now, it is enough to register the size changes. Later, when DirectX is up and running as well, Direct3D will be used to do the actual resizing.

For this purpose there is a new function in the DirectXApp class, the onResize function, which gets called each time a resize occurred; it does nothing yet, though, besides printing a message to the log file.

To keep track of its state, the Window class uses three new boolean members, namely isMinimized, isMaximized and isResizing. They are true if and only if the window is minimized, maximized or being resized, respectively.

Note that while the window is being dragged around (else if (isResizing)) the game is not being resized. It would actually be completely useless to do so, as while the window is being dragged, Windows continuously sends WM_SIZE messages: it would be extremely slow and trivially pointless to respond to all of them (and to constantly change the game graphics), thus it seems much wiser to simply wait until the dragging is done and to only then do all the resizing that must be done.

To notice whether the window is being resized, the following messages must be handled:

If the user drags the edges of the window (to make it smaller or larger) a WM_ENTERSIZEMOVE message is sent once the dragging starts and a WM_EXITSIZEMOVE once the dragging is done (and the window is resized).

In this case, it is advised to respond exactly as above. Please note that while the window is being dragged around, the application is paused.

case WM_ENTERSIZEMOVE:
    // the game window is being dragged around -> pause the game
    isResizing = true;
    dxApp->isPaused = true;
    return 0;

case WM_EXITSIZEMOVE:
    // the game window is no longer being dragged around -> clear the resizing flag, resize the graphics and unpause the game
    isResizing = false;
    dxApp->onResize();
    dxApp->isPaused = false;
    return 0;

This message, WM_GETMINMAXINFO, is sent when the size or position of the window is about to change. The lParam parameter of this message holds a pointer to a MINMAXINFO structure. The minimum tracking width (the x member) and the minimum tracking height (the y member) of the window are stored in its ptMinTrackSize member. By setting both values to 200, it is ensured that the window cannot be resized to be smaller than 200 x 200.

Exercises

Exercise 1

Explain line number 2 in the above log file.

Exercise 2

Change the program code to open more than one window. More precisely, create and open one window in each of the following colours: white, black, gray, light gray and dark gray. Then figure out a way to close one window at a time and to only exit the program once every single window has been closed.

And then Bill Gates said

there be Windows everywhere!

By now I am really excited; we have covered a lot of ground: we learned about error handling, multitasking and thread-safe logging. We learned quite a bit about the architecture of Windows, and we can define, register and create our own windows, as well as write our own event handlers. Furthermore, we had first contact with the powerful scripting language Lua. All in all, I think we have done an excellent job so far.

In the next tutorial we will learn how to add a high precision timer to our application, which will prove invaluable for using actual physics in a game. We will also come back to an earlier discussion about how to implement a robust game loop.


We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard.

--- John F. Kennedy

To be able to read in configuration files and to run scripts later on, we will learn how to use Lua. This tutorial is just an introduction, but when writing code to drive gameplay, it is becoming more and more common to use a scripting language.

Lua is Portuguese for moon. Lua is also a powerful, yet lightweight, embeddable scripting language, created in 1993 by a team of computer scientists at the Pontifical Catholic University of Rio de Janeiro and it so happens to be the most popular general-purpose scripting language used in games today.

It is dynamically typed, runs by interpreting bytecode with a register-based virtual machine and has automatic memory management with incremental garbage collection, making it ideal for configuration and scripting.

A famous game using Lua is World of Warcraft by Blizzard. What makes Lua so interesting for games is that it is among the fastest interpreted scripting languages. Furthermore, adding Lua to an application does not bloat it; the reference C implementation is only about $150$ KB in memory.

Okay, so Lua is a fast scripting language engine with a small footprint that can be embedded easily into our game. So let us do just that.

Installation

To start, just download the x64 binaries from the Lua website and unpack the contents into a meaningful directory, for example P:\Lua\x64.

From now on we will configure our game as a $64$-bit application. Open the Properties window of the project and go to the VC++ Directories tab. Make sure to edit the properties for the x64 platform and for all configurations. The Include Directories, Reference Directories and Library Directories must be edited. First, add P:\Lua\x64\include to the Include Directories, then, add P:\Lua\x64 to the two other directories.

Once done, make sure to always compile the project for the $x64$ platform.

Including the Lua library to our project is now as easy as this:

// Lua
#include <lua.hpp>
#pragma comment(lib, "liblua53.a")

To avoid DLL hell, simply copy lua53.dll into the working directory of your project. To check if Lua is working, try this minimal example:

Installation

In later tutorials, we will learn how to fully harness the power of Lua and Sol, but for now, we will be content with simply reading the game settings, in this case the desired screen resolution, from a configuration file on the disk.

Reading a Configuration File

As a first demonstration of the power of the moon and the sun, we will create a window with the desired resolution read from the configuration file bell0prefs.lua in the MyDocuments/bell0bytes/bell0tutorials/Settings folder.

Cleaning Up

To hide as much of the dirty work from the actual game as possible, the creation of the logger service was moved to the DirectXApp class.
In addition, since configuration files now have to be read from the disk, the DirectXApp stores the path to the My Documents folder, as well as the paths to the Log and Settings folders of this tutorial. And obviously the logger constructor no longer duplicates that code.

I won't bother you with the details of all the changes, but please do not be alarmed when noticing that the source code for this tutorial has changed quite a bit from the last tutorial. All of the initialization is now handled by the DirectXApp class:

The checkConfigurationFile function checks for a valid configuration file: if the file is not found or is empty, it is created with default settings (resolution: 800 x 600). If an error is encountered, the validConfigurationFile flag is set to false, telling the other game components (so far there is only the Window class) to start with default settings without even trying to read from the configuration file.

Utility

One thing that is a bit of a hassle is that Sol, or Lua, expects a std::string as input for the name of a file to open, thus a StringConverter class is needed:

For details on how these functions work, please look here. We will see how to make use of them in just a moment.

The Power of the Sun

Now with all of that out of the way, it is finally time to create a window with a given size instead of CW_USEDEFAULT. For that purpose the Window class now has two member variables to store the desired screen resolution and a new private function to load those variables from a file:

It was promised that Lua and Sol are lightweight and easy to use, and behold, those promises were true.

sol::state lua

The most important Sol class is probably the sol::state_view. This structure takes a pointer to an already existing lua_State and enables simple access to the Lua interface.

sol::state derives from sol::state_view, inheriting all of the functionality, but it has the additional purpose of creating a fresh pointer to a lua_State and managing its lifetime by itself. Thank you, Sun - you truly are the enabler of life on Earth!

lua.script_file(const std::string &)

This function simply opens the file specified by the std::string. This is where the new StringConverter class comes into play; we can now easily convert between std::string and std::wstring. (Since Windows 10 uses Unicode natively, it is wise to keep using wide strings in our project.)

The lua.script_file function throws errors, thus in the actual code it is in a try-block. In this case, if the opening of the configuration file fails, the application starts with its default settings.

Now the true power lies in how easy it truly is to read in the variables.

Here is the file Lua is supposed to read from:

config =
{
    resolution = { width = 800, height = 600 }
}

The sol::state behaves exactly like a table. The syntax to get nested variables from the configuration file is the same as for accessing multidimensional arrays:

The get_or function either returns the value read from the configuration file (i.e., the value stored under "config", then "resolution", then "width", which is 800), or the specified value if something went wrong.

Actually, the table is the only complex data type supported by Lua, but it can be used in a variety of ways: as an array, as a multidimensional array (as seen above), or even as a dictionary:
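For instance, the same table syntax covers all three uses. This is a purely illustrative Lua fragment; none of these tables are part of the actual configuration file:

```lua
-- a table used as an array
primes = { 2, 3, 5, 7, 11 }

-- a table used as a multidimensional array (as in the configuration file)
config = { resolution = { width = 800, height = 600 } }

-- a table used as a dictionary
colours = { background = "black", star = "white" }
```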

Now this is all pretty easy, yet powerful - thus quite exciting! But before we can create a window with the desired resolution, there is one last thing to consider:

Window Size versus Client Size

When working with windows, it is important to know the exact dimensions of the area that can be tampered with. Clearly, if, for example, a request is made for an 800 x 600 window to be created, the actual working area won't be 800 x 600. The window size is 800 x 600, yes, but the client size, the client area being the portion of the window without its borders, will be smaller. Luckily, the window size need not be computed by hand; the AdjustWindowRect function does this for us:

The first parameter is a long pointer to a rectangle structure that contains the coordinates of the top-left and bottom-right corners of the desired client area. When the function returns, the structure contains the coordinates of the top-left and bottom-right corners of the window to accommodate the desired client area.

The second parameter defines the style of the window whose required size is to be calculated.

The third parameter indicates whether the window has a menu or not.

The last parameter defines the extended window style of the window whose required size is to be calculated.

The function returns a BOOL indicating success or failure.

Here is an example to compute the window size of an overlapped window, with no menu, and with the desired client area read from the Lua configuration file:

The desired width and height are stored in the clientWidth and clientHeight member variables, and the AdjustWindowRect function then calculates the required window size. This information, the width being rect.right - rect.left and the height being rect.bottom - rect.top, is then passed to the CreateWindowEx function to create a window with the desired client size.

Putting It All Together

As already mentioned, all of the initialization is now done by the DirectXApp class, thus the new WinMain is only concerned with actually doing game-related stuff:

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
    // create and initialize the game
    DirectXGame game(hInstance);
    util::Expected<void> gameInitialization = game.init();

    // if the initialization was successful, run the game, else try to clean up and exit the application
    if (gameInitialization.wasSuccessful())
    {
        // initialization was successful -> run the game
        util::Expected<int> returnValue = game.run();

        // clean up after the game has ended
        game.shutdown(&(util::Expected<void>)returnValue);

        // gracefully return
        return returnValue.get();
    }
    else
    {
        // a critical error occurred during initialization; try to clean up and print information about the error
        game.shutdown(&gameInitialization);

        // humbly return with an error
        return -1;
    }
}

Please have a look at the source code to notice the change from the last tutorial.

Exercise

Play around with the configuration file to create windows of different sizes. Can you feel the power of being able to change the configuration of your application without having to recompile everything?!