The Swap Chain

The GPU contains in its memory a pointer to a buffer of pixels encoding the image currently being displayed on the screen. When asked to render a scene, the GPU updates this buffer and sends the new information to the monitor to display. The monitor then redraws the screen from top to bottom, replacing the old image with the new one.

However, there is a slight problem: the monitor does not refresh as fast as real-time rendering needs it to. To see why, we must understand how computer monitors work. As an example, we will have a brief look at liquid crystal displays, or LCDs.

Liquid Crystal Displays

Liquid crystals are organic molecules that flow like a liquid while retaining their lattice structures like solid crystals. They were first discovered in 1888 by the Austrian botanist and chemist Friedrich Reinitzer and first applied to small displays in the 1960s. Using electric fields, the alignment of the molecules can be controlled, and thus their optical properties can be changed.

An LCD consists of liquid crystal sealed in between two parallel glass plates with electrodes attached to them. In the widely used active matrix displays, there is a switch at each pixel position on one of the electrodes; by turning those switches on and off, it is possible to create an arbitrary voltage pattern across the entire screen, allowing arbitrary bit patterns. Those switches are often called thin film transistors, and the monitors using this technology are widely known as TFT displays.

A light source, originally cold cathodes but in modern screens usually light-emitting diodes (LEDs save power and allow for even thinner displays), illuminates the screen from the back, and the alignment of the crystals determines the visual output at the front.

Obviously, this setup only provides monochromatic images. The idea behind coloured images is the same, but the technological details are a lot more complicated.


Video Random Access Memory and Color Palettes

The image on the display is taken from a pixel buffer stored in a special memory inside the GPU called video random access memory, or VRAM for short. For example, at a resolution of $1920 \times 1080$, the VRAM would contain an array of $1920 \cdot 1080$ values, one for each pixel. Those values might simply be $3$-byte RGB values, representing the red, green and blue components of each pixel, thus requiring $6220800$ bytes, or about $6.2$MB, of VRAM. In the olden days, this was a lot of space, so a little trick was used: the desired colour of each pixel was stored as an 8-bit number, used as an index to retrieve the actual colour from a hardware table, the color palette. This reduced the VRAM requirement by $\frac{2}{3}$, but only allowed $256$ colours to be active on the screen at once. Famous Blizzard games, such as Diablo II and StarCraft, used this technique successfully, but nowadays VRAM isn't as limited anymore.
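The arithmetic and the palette trick described above can be sketched in a few lines of portable C++. This is only an illustration: the names `Rgb`, `frameBytes` and `resolvePalette` are made up for this sketch, and real palette lookups happened in hardware rather than in application code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One RGB colour with one byte per channel (24-bit true colour).
struct Rgb {
    std::uint8_t r, g, b;
};

// Bytes needed for one frame at the given resolution and pixel size.
constexpr std::size_t frameBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Resolve an indexed image: every pixel is an 8-bit index into a
// 256-entry palette that holds the actual colours.
std::vector<Rgb> resolvePalette(const std::vector<std::uint8_t>& indices,
                                const std::array<Rgb, 256>& palette) {
    std::vector<Rgb> colours;
    colours.reserve(indices.size());
    for (std::uint8_t index : indices) {
        colours.push_back(palette[index]);
    }
    return colours;
}
```

With these helpers, `frameBytes(1920, 1080, 3)` yields the 6220800 bytes of the true colour frame, while `frameBytes(1920, 1080, 1)` yields 2073600 bytes for the indexed frame, one third of the original size.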

The above-mentioned example used the $24$-bit, or true colour, mode, which uses exactly one byte per colour channel, giving access to $256^3 = 16777216$ different colours. Unfortunately, three-byte pixels make for ugly addressing, and many video cards do not actually support $24$-bit colours; they use $32$-bit colours instead.

In $32$-bit colour mode, each pixel is stored using eight bits for alpha, the transparency information, and eight bits for each colour channel. This format is thus a variant of the true colour format in which the additional eight bits hold transparency or other information. In the previous tutorial, the Direct3D device was created to support the BGRA colour mode (blue, green, red, alpha).
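To make the $32$-bit layout more tangible, here is a small sketch that packs four channels into one pixel. The byte order is an assumption of this sketch: blue sits in the lowest byte, followed by green, red and alpha, so that the bytes appear in memory as B, G, R, A on a little-endian CPU. The helper names `packBgra` and `channel` are hypothetical.

```cpp
#include <cstdint>

// Pack four 8-bit channels into one 32-bit pixel.
// Assumed layout: blue in the lowest byte, then green, red, alpha.
constexpr std::uint32_t packBgra(std::uint8_t b, std::uint8_t g,
                                 std::uint8_t r, std::uint8_t a) {
    return static_cast<std::uint32_t>(b)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(r) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Extract a single channel again; shift is 0 (blue), 8 (green),
// 16 (red) or 24 (alpha).
constexpr std::uint8_t channel(std::uint32_t pixel, unsigned shift) {
    return static_cast<std::uint8_t>((pixel >> shift) & 0xFFu);
}
```

Because every pixel now occupies exactly four bytes, the address of a pixel is a simple multiple of four, which is the addressing advantage over the $24$-bit mode.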

Memory Bandwidth and Accelerated Graphics Ports

Another problem was the limited bandwidth from the CPU to the VRAM. For example, copying $6$MB at least $60$ times per second requires a total data rate of $360$MB per second, way above what older computers could handle. Thankfully, with the advent of the Pentium II and the Accelerated Graphics Port, or AGP, this problem was solved as well. Already the very first such port could transfer data at a rate of $266$MB per second, and later revisions were even more impressive, with AGP 3.0 allowing transfers at $2133$MB per second.
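The bandwidth estimate is simple multiplication, but writing it down makes it easy to try other frame sizes and frame rates. The helper names below are hypothetical, and all figures are in whole megabytes as in the text.

```cpp
#include <cstdint>

// Data rate in MB per second needed to copy one frame of frameMB
// megabytes to VRAM framesPerSecond times each second.
constexpr std::uint64_t requiredRateMB(std::uint64_t frameMB,
                                       std::uint64_t framesPerSecond) {
    return frameMB * framesPerSecond;
}

// True if a bus with the given transfer rate can sustain that load.
constexpr bool busIsFastEnough(std::uint64_t busRateMB, std::uint64_t frameMB,
                               std::uint64_t framesPerSecond) {
    return busRateMB >= requiredRateMB(frameMB, framesPerSecond);
}
```

For the example above, `requiredRateMB(6, 60)` gives the $360$MB per second mentioned in the text, which a bus transferring $2133$MB per second handles comfortably.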

Okay, so we have enough VRAM, and the bandwidth between the CPU and the VRAM is very high. What, then, is the problem?

Refresh Rates

Most displays are refreshed between 60 and 100 times per second. The refresh rate, that is, the frequency of those refreshes, is measured in hertz (Hz), the SI unit of frequency, named after the German physicist Heinrich Hertz.

To refresh the screen in old-school CRT monitors, an electron gun, firing streams of electrons, is moved horizontally across the screen. The gun starts drawing at the top-left corner of the screen and shifts to the right horizontally to draw the first so-called scanline. It then repositions itself at the left edge of the next scanline to start drawing again. This process is repeated until all scanlines have been drawn.

Once the drawing is complete, the electron gun is positioned at the bottom-right corner of the screen. The time it takes for the electron gun to move back to its original position at the top-left corner of the screen is called the vertical blank interval, or VBLANK for short.

While LCDs work somewhat differently, the basic idea of the VBLANK is still very helpful.

Now suppose that the electron gun is halfway done with its job of redrawing the screen when our application requests that new output be drawn immediately. The new image would only be drawn in the bottom half of the screen, while the top half would still show the old image. This effect of the screen showing parts of two different frames at the same time is called screen tearing.
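Screen tearing can be reproduced with a toy model in which a frame is just one number per scanline. The sketch below is purely illustrative: `Frame` and `scanOutWithUpdate` are hypothetical names, and both frames are assumed to have the same number of scanlines.

```cpp
#include <cstddef>
#include <vector>

// A frame is modelled as one value per scanline (toy model).
using Frame = std::vector<int>;

// Scan out all scanlines of `source` to the "screen", top to bottom.
// Once `updateAtRow` lines have been drawn, the source is overwritten
// with `newFrame`, as if the application rendered mid-refresh.
// Assumes newFrame has as many scanlines as source.
Frame scanOutWithUpdate(Frame source, const Frame& newFrame,
                        std::size_t updateAtRow) {
    Frame screen(source.size());
    for (std::size_t row = 0; row < source.size(); ++row) {
        if (row == updateAtRow) {
            source = newFrame;  // the buffer changes while being displayed
        }
        screen[row] = source[row];
    }
    return screen;
}
```

If an old frame of four scanlines holding the value 1 is replaced by a new frame holding the value 2 after two scanlines, the screen ends up with the old image in its top half and the new image in its bottom half, which is exactly the torn picture described above.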

One solution might be to only update the game data during the VBLANK, but obviously modern games take longer to compute updates to the game world than it takes for an electron gun to race diagonally across the screen once. A possible solution is the so-called double buffering technique, which DirectX implements using a swap chain.

The Swap Chain

To avoid screen tearing, most computer animation is achieved by drawing each frame of animation in an offscreen buffer area, called the backbuffer, and then quickly copying the offscreen image to the visible surface. This is called blitting. As long as the copying is done quickly enough, no screen tearing is visible. This process of drawing an image in the backbuffer and then copying it to the actual display surface is the above-mentioned technique of double buffering.

However, blitting could still cause screen tearing, because the image transfer could theoretically still take longer than the VBLANK. To help with that problem, DirectX, or more precisely DXGI, also implements a feature called swapping, or page flipping, that does just what the name says: it swaps the backbuffer and the display surface. DirectX uses a pointer for each buffer and simply switches their values.
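The difference between blitting and flipping is that a flip copies no pixels at all. The following toy model illustrates the idea with two buffers and two pointers; it is not how DXGI is implemented internally, and the type `DoubleBuffer` is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Two pixel buffers plus one pointer for each role (toy model).
struct DoubleBuffer {
    std::vector<int> first;
    std::vector<int> second;
    std::vector<int>* front = &first;   // currently displayed
    std::vector<int>* back = &second;   // currently drawn into

    explicit DoubleBuffer(std::size_t pixels)
        : first(pixels, 0), second(pixels, 0) {}

    // The pointers refer to our own members, so copying is disabled.
    DoubleBuffer(const DoubleBuffer&) = delete;
    DoubleBuffer& operator=(const DoubleBuffer&) = delete;

    // Page flip: no pixel is copied, only the two pointers change roles.
    void flip() { std::swap(front, back); }
};
```

Drawing always goes through `back`; after `flip()`, whatever was drawn is reachable through `front`, and the former front buffer is free to receive the next frame. The cost of a flip is independent of the resolution, which is what makes it attractive compared to a blit.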

Obviously, to fully prevent screen tearing, the swap must happen during the VBLANK; and the faster the buffer flipping, the more time we have to update our game world.

In this tutorial we will use two back buffers, as VRAM isn't as restricted nowadays.

This setup is the so-called swap chain, as it is a chain of buffers that swap positions each time a new frame is rendered.
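With more than two buffers, the swap becomes a rotation through the chain. As a rough mental model only (DXGI manages the real buffers itself, and `BufferChain` is a hypothetical name), the position of the front buffer can be tracked with a single index:

```cpp
#include <cstddef>

// Toy model of a chain of N buffers: the buffer at index `front_` is on
// screen, the remaining buffers wait in line as back buffers.
template <std::size_t N>
class BufferChain {
public:
    // Index of the buffer that is currently displayed.
    std::size_t frontIndex() const { return front_; }

    // Index of the back buffer that becomes visible on the next present.
    std::size_t nextBackIndex() const { return (front_ + 1) % N; }

    // Advance the chain by one position: the next back buffer becomes
    // the front buffer, the old front buffer moves to the end of the line.
    void present() { front_ = (front_ + 1) % N; }

private:
    std::size_t front_ = 0;
};
```

For one front buffer and two back buffers, `BufferChain<3>` returns to its starting position after three calls to `present()`.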

Creating the Swap Chain

In DirectX, the swap chain is represented by a new COM interface, IDXGISwapChain.

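To actually create the swap chain, three steps must be completed:

The swap chain must be customized by filling out a swap chain description structure, DXGI_SWAP_CHAIN_DESC.

A pointer to a DXGI factory must be obtained; an object true to its name, as it is capable of creating other DXGI objects.

The factory must be asked to create the swap chain from that description.

The members of the swap chain description are explained below.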

UINT Width and UINT Height

Those unsigned integer values indicate the width and height of the swap-chain buffers in pixels. If set to zero, the swap chain automatically sizes itself to the current resolution of the active window.

DXGI_RATIONAL RefreshRate

This rational number describes the refresh rate in hertz. For example, if we want to run at a constant 60 fps, we set this to $\frac{60}{1}$. There is a problem with this, however, if the refresh rate does not match the actual display mode: DirectX then first has to create a new buffer with the correct resolution, or a close match, which wastes resources and time. Setting this value to $\frac{0}{1}$ tells DirectX not to check whether the refresh rate matches that of the current display.

DXGI_FORMAT Format

This member describes the display format to use. See the above explanation of colour modes for more details. There are plenty of options to select from. In this tutorial we will use DXGI_FORMAT_B8G8R8A8_UNORM, which means that we reserve 8 bits for blue, 8 bits for green, 8 bits for red, and 8 bits for transparency, in that order, and that each channel is stored as an unsigned normalized integer, that is, an integer from $0$ to $255$ that the GPU interprets as a value between $0$ and $1$.

DXGI_MODE_SCANLINE_ORDER ScanlineOrdering

This member describes the scanline drawing mode, i.e. the method the rasterizer uses to create images on a surface. We will use DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED, which just means that the scanline order is left unspecified.

DXGI_MODE_SCALING Scaling

Those flags indicate how images are to be stretched to fit the backbuffer resolution. We will use DXGI_MODE_SCALING_UNSPECIFIED, which means that our rendered images will just appear in the top-left corner of the window. Also, since we will cover fullscreen mode in later tutorials and want to make sure that we do not initiate a mode change when transitioning to full screen, we are advised to use DXGI_MODE_SCALING_UNSPECIFIED anyway.

DXGI_SAMPLE_DESC SampleDesc

Count is the number of multisamples per pixel and Quality is the image quality level; the exact specifications depend on the GPU. We will talk more about this in a later tutorial. For now, we will disable multisampling by using the default sampler mode, with no anti-aliasing: a count of $1$ and a quality level of $0$.

DXGI_USAGE BufferUsage

This member describes the surface usage and CPU access options for the back buffer. The back buffer can be used for shader input or render-target output. We will obviously use DXGI_USAGE_RENDER_TARGET_OUTPUT for now, which tells DirectX to use the back buffer as an output render target.

UINT BufferCount

This member sets the number of buffers in the swap chain, including the front buffer. For now we will create a front buffer and two back buffers, thus we set this to three.

HWND OutputWindow

A handle to the output window; we will set this to the handle of the main window.

BOOL Windowed

This member is a boolean value that specifies whether the output is in windowed mode. Microsoft recommends that swap chains be created as windowed swap chains and switched to fullscreen afterwards, if so desired. For now, we will stay in windowed mode; fullscreen applications will be considered in a later tutorial.

DXGI_SWAP_EFFECT SwapEffect

Those flags describe options for handling the contents of the presentation buffer after presenting a surface, i.e. they tell DXGI what to do with the buffers once they have been shown and are no longer of use. We will use DXGI_SWAP_EFFECT_FLIP_DISCARD to specify the flip presentation model and to specify that DXGI discards the contents of the back buffer after it has been presented. Please note that in windowed mode, DXGI will still blit instead of flip.

UINT Flags

These flags describe further options for the behaviour of the swap chain. For now we will use DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH, which allows an application to switch between display modes. We will talk more about that in a later tutorial.

Wow, that was a lot of information to cover, but thankfully we do not need to be concerned about all the details just yet.

Since in the last tutorial we created the Direct3D device without a swap chain, we need to retrieve the factory that was used to create the device in order to actually create a swap chain now. From the Direct3D device an IDXGIDevice can be requested using the As function. To finally retrieve the factory that created the Direct3D device, we use GetAdapter followed by a call to GetParent.
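The walk from the device back to its factory, followed by the actual creation of the swap chain, might look like the sketch below. It assumes a Direct3D 11 device held in a `ComPtr` from the Windows Runtime C++ Template Library; the function name `createSwapChain` and its parameters are hypothetical, and error handling is reduced to passing the failing HRESULT back to the caller.

```cpp
#include <windows.h>
#include <d3d11.h>
#include <dxgi.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Retrieve the factory that created `device` and let it create the swap
// chain. `device` and `description` are assumed to come from the earlier
// setup steps.
HRESULT createSwapChain(const ComPtr<ID3D11Device>& device,
                        DXGI_SWAP_CHAIN_DESC description,
                        ComPtr<IDXGISwapChain>& swapChain)
{
    // the Direct3D device can be queried for its DXGI device interface
    ComPtr<IDXGIDevice> dxgiDevice;
    HRESULT hr = device.As(&dxgiDevice);
    if (FAILED(hr))
        return hr;

    // the adapter (video card) the device was created on
    ComPtr<IDXGIAdapter> dxgiAdapter;
    hr = dxgiDevice->GetAdapter(dxgiAdapter.GetAddressOf());
    if (FAILED(hr))
        return hr;

    // the parent of the adapter is the factory that enumerated it
    ComPtr<IDXGIFactory> dxgiFactory;
    hr = dxgiAdapter->GetParent(IID_PPV_ARGS(&dxgiFactory));
    if (FAILED(hr))
        return hr;

    // finally, the factory creates the swap chain for our device
    return dxgiFactory->CreateSwapChain(device.Get(), &description,
                                        swapChain.GetAddressOf());
}
```

Using the factory that created the device, rather than creating a fresh one, keeps the device and the swap chain on the same DXGI factory.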

Swapping!

With the swap chain created, it can be used to draw, or present, the actual game scene to the screen. This is done using the Present function:

HRESULT Present(
UINT SyncInterval,
UINT Flags
);

The SyncInterval integer specifies how to synchronize presentation of a frame with the VBLANK. In flip mode, the possible values are $n=0$, to tell DirectX to cancel the remaining time on the previously presented frame and discard this frame if a newer frame is queued, or $n=1$ to $n=4$, to tell DirectX to synchronize the presentation after the $n$-th vertical blank. We will use $0$ for now, accepting that some screen tearing might occur.

The Flags parameter specifies various presentation options. We will use DXGI_PRESENT_DO_NOT_WAIT, which tells DXGI not to sleep or wait for v-sync. Please note that Present then returns DXGI_ERROR_WAS_STILL_DRAWING if the calling thread would otherwise be blocked.
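One practical consequence of the SyncInterval parameter is a cap on the frame rate: if every presentation waits for the $n$-th vertical blank, at most one frame can be shown per $n$ refreshes. The hypothetical helper below spells this out; it only makes sense for sync intervals from $1$ to $4$, since an interval of $0$ does not wait for the vertical blank at all.

```cpp
// Upper bound on the number of frames shown per second when each
// presentation is synchronized to the n-th vertical blank.
// Only meaningful for syncInterval in 1..4; an interval of 0 does not
// wait for the vertical blank, so no such bound applies.
constexpr double maxFramesPerSecond(double refreshRateHz,
                                    unsigned syncInterval) {
    return refreshRateHz / static_cast<double>(syncInterval);
}
```

On a 60 Hz display, a sync interval of $1$ therefore allows at most 60 frames per second, and a sync interval of $2$ at most 30.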

Resizing

There is one more thing to worry about. When the size of the window changes, the back buffers must be resized as well. This is done using the ResizeBuffers function.
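A minimal call to ResizeBuffers might look as follows. The wrapper name `resizeSwapChain` is hypothetical, and the sketch assumes that the swap chain was created with DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH and that every view onto the old back buffers, for example a render target view, has already been released.

```cpp
#include <windows.h>
#include <dxgi.h>

// Resize the buffers of an existing swap chain after the window changed.
// All references to the old back buffers must have been released before
// this function is called.
HRESULT resizeSwapChain(IDXGISwapChain* swapChain)
{
    return swapChain->ResizeBuffers(
        0,                                        // 0: keep the current number of buffers
        0, 0,                                     // 0: use the client area of the output window
        DXGI_FORMAT_UNKNOWN,                      // keep the current buffer format
        DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH);  // same flags as at creation
}
```

Passing zero for the width and height lets DXGI pick up the new size of the window by itself, so the application does not have to track it separately.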