Raytracing and acceleration structures

This is going to be a series of posts about raytracing, acceleration structures and, eventually, photon mapping on the GPU - a project I’ve been working on for a while in my spare time. I’ve finally decided to put the current state of this project out into the wild, most of which is an implementation based on “Real Time KD-Tree Construction on Graphics Hardware” by Zhou et al.

The following is mostly an introductory article for all those around me who are wondering what I’m so excited about, for more details on the current state of the project and the github link check out this post. Further down is a nice video, though.

and transforming it into a nice looking picture (created with Blender):

The data is a list of triangles in 3D space that form a 3D model of stuff, a dragon in this case. The triangles are encoded by their vertex positions and edge information. Combined with additional information like positions, brightness and colors of light sources and materials that describe the interaction of light with the model raytracing is capable of producing photo realistic images like the one above.

What we perceive with our eyes in the real world are a bunch (actually a gazillion) of photons falling onto our retina. These photons bounced around the world getting reflected or scattered on surfaces and “changing” their wavelength and polarization. Raytracing simulates and approximates light falling into a virtual camera (our eye) after having interacted with objects, but the process of gathering the light information is reversed. Of all those photons flying around, only a very small amount of them in fact makes it through our small pupils. So instead of simulating all possible paths of light being emitted from light sources, rays are shot out of the camera into the world and each ray is tested for an intersection with an object that might or might not be illuminated.

At each intersection, the distance and direction to all possible light sources is computed and, given the properties of the material, its translucency, reflectance, etc., the brightness and color of light along the ray back to the camera is determined. Shading can be directly obtained from the angle between the intersection - light direction and the surface normal. With this idea, a whole lot of effects can be simulated by sending out secondary rays from the light source or intersection points: mirror reflections, refraction, shadows, projections and much more. To arrive at the final image, a plane is placed in front of the camera and intersections of the light rays with this plane give pixel positions and colors, composing the projection of the 3D scene on a 2D plane.

Ok, so we know what to do: take 3D triangle data, send out some rays, check for each ray if it intersects with any triangle, compute the light being bounced back, and we are done. If we want reflections and refractions, shoot some more rays from the intersection points, which can be implemented recursively. This is in fact so simple, there exists an implementation that fits on a business card (scroll down to “Minimal ray tracer”).

Not so fast, space partitioning is faster

There is one drawback however: the naive approach requires plenty of computation time, as every ray has to be tested for intersection with every single triangle in the scene. In the vast majority of cases the tested triangle is nowhere even near the ray. Even worse, testing for triangle intersection requires expensive dot and cross products, checking if the found triangle is occluded, etc., just to find 99.9% of the time: nope, no intersection, next triangle, please. Certainly, we can do better than that, right?

That’s where acceleration data structures come into play.

The basic idea is simple: subdivide the volume containing a soup of triangles to be rendered into smaller boxes and sort triangles into them. Testing for ray-triangle intersection then is a two-step process: traverse the boxes along the ray and if the current box is not empty, check for intersection with the triangles contained in the current box only. If no triangle was hit, we’ll move on to the next box. This technique is especially effective if the full bounding volume can be partitioned such that empty and non-empty space is well-separated and each box contains only a few triangles. In the former case, no ray-triangle intersection testing has to be performed at all once a hit with an empty box was determined. How’s that for a speed-up?

There are of course many ways to organize the space partitioning and different data structures have been developed for this task (Octrees, R-Trees, etc.). In the following we will focus on the so-called k-d tree using axis aligned bounding boxes, a binary tree representation of the space subdivision. Each node is associated with a splitting plane defined by a normal vector along one of the main spatial axes. The left and right child nodes represent the bounding volumes to the left and right of this plane, respectively. Constructing the tree top-down, each level of the tree subdivides the initial volume into smaller chunks and keeps track of triangles contained in them.

A basic k-d tree implementation is quite straight-forward and simple, just have a look at the minimal a dozen of lines or so Python required for the construction of a kd-tree storing point clouds, only a little bit more effort is required for triangle soups.

K-d tree construction and traversal on the GPU

Using a k-d tree speeds up raytracing by orders of magnitude on a single CPU system, beautiful pictures can be rendered in minutes. But what if we’d like even more speed? Maybe so much speed that we could do raytracing in real time?

Before even thinking of the raytracing component, one has to think about the acceleration structure, the k-d tree. If the scene we want to render is static and all triangles composing the scene are fixed in space, the tree has to be constructed only once and one doesn’t have to worry about performance too much. On the other hand, if any part of the scene changes or moves, the tree has to be reconstructed. Since this could happen every frame, the tree building better be quick.

Parallelization to the rescue! Not double or quad core CPUs, massive paralellization using the GPU.

It turns out, however, things become not that as simple as in the dozen lines of code above anymore, if we strive for efficiency. We have to worry about using the best parallelized algorithms and respect the restrictions of the hardware. The tree construction has to be split into fragments, that can be treated independently without exchange of information between processes.

Thankfully, as with most awesome things, this work has already been done. “Real time KD-Tree Construction on Graphics Hardware” describes an efficient algorithm for k-d tree construction. Following the algorithm outlined in the paper and filling in the missing pieces, I’ve jumped into the implementation.

So far, I have one half of the algorithm working, creating a tree containing a given number of triangles in its leaves after cutting off empty space and performing median splitting.

Implementing this little beast turns out to be quite more involved than it looks like at first, more details about the approach in another post. Until then, above is the Stanford Dragon (200k triangles) rendered with the resulting node bounding boxes of the tree in Blender, terminating the construction at 2048, 1024, 512, 256, 128 and 64 triangles per node, respectively. It’s very helpful to have a visual representation for debugging purposes (and there are some inconsistencies in the empty space cutting apparent).