FuzzyPhoton

How to write a Raytracer

To write a good raytracer, you need one main thing -- a lot of patience. In fact, this goes for all large computer programs. All rendering engines need a lot of work on substructure and basic functions. This means, you'll be spending a lot of time initially just getting your vector, matrix and maths libraries shipshape. At least this is what I experienced. So let's explore how you might go about getting your raytracer up and running. I'll be following a rough chronological sequence, according to the bottom-up paradigm (which means, start from the bottom, i.e. the basic routines, and work up to the more complex and broad ones. This is opposed to the top-down paradigm). I'll also assume that you're pretty serious about your project -- it is after all possible to patch together a basic sphere tracer in half-an-hour in Visual Basic, but that doesn't have very much long-term appeal. Further, the language for all examples will be C++. I would personally recommend an object-oriented approach for designing your raytracer, although you'll probably feel the need to put in a few procedural sections here and there. Try to minimize the latter, though it is rarely completely avoidable.

At the very fundamental level, you need a lot of vector, matrix and maths functions. A basic knowledge of vector/matrix algebra and the theory of equations is essential here. If you don't fully grasp the idea of, say, a cross-product or an inverse matrix, you'd better go look it up. This is knowledge that will help you throughout your study of computer graphics.

Let's look at the libraries one by one:

The Maths Library

First and foremost, your maths library should contain some routine for handling floating point division overflows. It's crazy how many times code seems to crash for no reason at all, and then you discover that you've got a "Division by Zero" exception. A neat function which solves this problem is:

This function uses the operation a + b to test for overflow. If, say, a floating point number has 10 decimal places and the numerator is more than 10 orders of magnitude higher than the denominator, then their sum will be equal to the numerator. By suitably multiplying the numerator or denominator by some power of 10, you can vary the magnitude difference condition.

Now you need some equation solving code. The highest degree of equations you need to solve depends upon what primitive shapes you're using. If you're only using planes (i.e. polygons), you need to solve linear equations. This can be implemented as part of the plane intersection code and need not be a separate function. Spheres will require solutions of quadratic equations, torii will require solutions of quartic equations and general polynomials will require solutions of, well, general polynomial equations of any degree. Some code for solving linear, quadratic, cubic and quartic equations is given below (Update Oct '16: the code below has had some recent small bugfixes and may not be fully battletested. Please report any further bugs you encounter to me. Also consider other resources such as http://math.ivanovo.ac.ru/dalgebra/Khashin/poly/index.html.):

Of course, you can probably think of many ways to optimise these functions or make them more robust, e.g. by writing a modified cubic solver that directly returns an appropriate root to use in calc_root_of_quartic.

For polynomials of degree 5 and higher, algebraic methods fail, and we have to use numerical methods. A common method is the Newton-Raphson method, but this depends on a judicious choice of the "seed" point. A more failproof method is the bisection method, which takes an interval which must contain the root (say 0 to 10000), decides whether the root lies in the upper or lower half of this interval, and continues the same process with the half containing the root. We thus slowly close in on the root as our intervals get smaller and smaller, and about 20 iterations or so are enough to give us an interval so small that we can take its midpoint as the root. To decide which half contains the root (and how many roots there are) we can use the Sturm sequence, which should be described in a text on equation theory (or look up the POV-Ray source code for a good practical implementation).

In addition to these, you may need to write a number of small functions to suit your needs. For instance, functions that calculate the maximum/minimum of two or three numbers, the binomial coefficients or list matrix index transformations are often very useful.

The Vector Library

A vector can usually be represented as a structure of 3 floating point numbers -- x, y and z. It can be used to store positions or directions. You'll need functions to calculate the sum, difference, dot (scalar) product and cross (vector) product of two vectors, the product of a vector and a scalar, and the negative, magnitude (modulus) and unit vector of a vector. A mag2 function is frequently added to calculate the square of the magnitude of a vector. In C++, it is a good idea to use operator overloading -- you'll be able to write shorter and cleaner code.

In the rest of this article we'll assume +, - and * have been overloaded for addition and subtraction of vectors and multiplication of a vector by a scalar respectively. This applies to colour vectors (i.e. RGB triples) as well.

The Matrix Library

This is written on the same lines as the vector library. 3D graphics systems generally use 4x4 matrices for storing transformations. So a 2D array like mat[4][4] is recommended. You'll need functions to calculate the sum, difference and product of two matrices, the product of a matrix and a scalar, and the determinant and inverse of a matrix. A function to generate the 4x4 identity matrix is also useful. Further, you'll need some code to multiply matrices and vectors. Note that the products are different depending upon whether you treat the vector as a position or a direction. The code for the two cases is given below:

You're advised to keep the matrix and its precalculated inverse in a transformation class. You'll have to handle 3 types of transformations -- positions, ray directions and normals. The transformation code will look like:

To perform the inverse transformations, just swap the trans_m's and inv_m's in the above code, i.e. replace each matrix with its inverse.

Now you'll need to write some functions to generate the standard transformation matrices, for translation, rotation and scaling (and possibly shearing). The forms of these matrices can be looked up in the Reference, or elsewhere on the Net or in any computational geometry or computer graphics textbook, so we needn't elaborate on them here.

The Colour Library

This is easy to write after you've written the vector library. Just replace all the x, y, z's with r, g, b's (red, green, blue components), delete the meaningless functions like dot and cross products and you've got a set of routines for manipulating colours.

Other Objects

You will also need structures to model rays (two vectors, a point and a direction), intersections (position, and a pointer to the intersected primitive) etc.

Shapes, or primitives, are the basic modelling entities of any renderer. Almost all raytracers handle spheres. Many handle polygons (usually triangles), cylinders, cones, torii etc. And some handle more complicated shapes like meshes, spline surfaces and density maps.

The Golden Rule here is: GENERALIZE. Don't have different interfaces for different shapes. Try to see which functions are common to all shapes. For instance, all shapes will have transform, intersect and get_appearance (returns appearance properties) functions, and almost all will have a normal function. In an object-oriented language, use inheritance. Design a base class to abstract the interface and derive all your primitives from this base class, utilising polymorphism. This approach will maximise the extensibility and elegance of your application. To see what can be done to mimic OOP in a non-OOP language, check out the POV-Ray source code (written in C).

I cannot advise you too much about what data and methods your shapes should have, but try to strike a balance between space and efficiency considerations. Don't keep the shape data so minimal that you have to recalculate a lot of unchanging values at runtime, but equally don't use so much redundant data that you clog up memory unnecessarily. Don't pay attention to people who tell you that computers are so fast and have so much memory nowadays that you don't have to worry about tight code. Good programming always aims at making the fullest use of resources. After all, if you start out writing code for a 486 and then somebody gives you a Core i7, you don't relax coding standards and end up with the same performance on both -- you'd want your program to run 500 times as fast on the i7, that's what it's for. In any case, if you've ever had to wait 16 hours for a render to complete, you'll want to write the most efficient code possible :-)

To give you an idea of what a shape class might look like, here's the sphere class:

Again, you'll find yourself automatically optimising and extending this code as you integrate it into your program (which I do NOT advise you to do mechanically. Try understanding the purpose of various parts of the algorithm and write your own code).

Till now we've discussed how you should go about writing the geometry-oriented parts of your program. But no picture is going to look lifelike without shades and colours and shadows. To put these in, you must write texturing and lighting functions. Something of the nature of a texture class is suggested. Such a class could be a base class for a wide range of textures, such as single colour, solid (3D) texture, plane (2D) textures etc. Textures could incorporate noise and turbulence functions, fractal patterns, user-specified bitmaps -- in fact, any pattern you can think of. You'd probably want to get everything up and running before you start off writing all your fancy pattern functions, so just write the basic texture class and define the interface (principally, a get_appearance function that returns appearance properties (colour, transparency etc.) at a point). You might need to add functions to your shapes that return texture space (U-V) parameters to help in mapping 2D images to surfaces.

Lighting is another essential facet of your program. Start off with point lights -- these are the most widely used, the easiest to handle, and any other light can be approximated as a set of point lights. A point light should have a position, a colour and an intensity. Area lights can be represented as an array of point lights. Complex beamed lights like cylindrical lights or spotlight have a source position, a direction and a fall-off value which determines how narrow the beam is. Lights should have a function intensity_at(vector & pt) which returns the intensity at the point pt.

A very effective trick for increasing surface realism is to use bump mapping. This technique perturbs the surface normal by a small amount according to a predefined procedure to create the illusion of bumps and ridges on the surface. You could create the illusion of rough wood or flowing water or cobbled roads this way. To integrate bump-mapping into your code, you could add a new normal function in the shape classes, called, say, warp_normal, to return a normal vector modified using a bump map. You can't modify normal directly because you need the true geometric normal to determine visibility.

Finally, you need a shader class. Such a class will again be a base class for deriving various shaders. Shaders are routines that combine appearance, geometry and lighting data to generate the perceived colour of a surface. The basic shader is the ambient-lambertian-specular (Phong) shader that uses an empirical model to generate perceived colours. Sample code is given below. The variable names should be self-explanatory (shadow contains the factors by which the colour components are attenuated. For instance, if a grey semi-transparent surface shadows the surface, shadow will contain the values (0.5, 0.5, 0.5)). mult_c returns the colour vector obtained by multiplying corresponding components of its two argument colours, i.e. mult_c(c1, c2) = (c1.r * c2.r, c1.g * c2.g, c1.b * c2.b)

Now we are going to put everything together and create a working raytracer. I can't present all the code here -- it'd be far too long. A good substitute is the generic Raytracing Algorithm, which gives an overview of everything you'll need to do get a rudimentary raytracer up and running.

The raytracing procedure is recursive, i.e. it calls itself. As soon as a ray hits a surface, reflected and transmitted rays are generated and these are traced through the scene. You must keep some sort of check on the level of recursion, else you'll end up with out-of-memory errors. I recommend the following three checks at each step

1. If the ray intersects no objects, break out.

2. If the level of recursion is greater than the maximum trace depth, break out.

3. If the ray intensity is smaller than some small predefined value, break out.

Note that for simplicity, the last check is not present in my version of the generic algorithm: work out a way of putting it in yourself :-)

The intersect function of the sample shape structure I have given above returns a list of distances to various intersection points on the object. To choose the closest point and see whether it is closer than the nearest intersection obtained previously, this is what you do (isec is an intersection object containing the distance to the nearest object obtained so far and a pointer to this object, and dis is a distance_list object containing the various intersection distances to the object being tested):

The use of the EPSILON value needs some explanation here. If this value was replaced with 0 (standard algorithm), then we may have the closest intersection as the start point of the ray, which also lies on a surface being tested. To avoid this situation, we only consider intersections at a small distance EPSILON from the start point. (Take some time to think this out if it seems confusing. Also see the Reference).

Some discussion of the shadow testing algorithm also seems necessary. The basic theory behind the creation of shadows (ignoring refraction effects) is the same as that of alpha-channeling -- opaque objects completely block light coming from behind them, transparent objects do not block light and translucent objects partially block light. So if the light passes through three white surfaces that are each 50% transparent, then the intensity of the emergent light is (50/100 * 50/100 * 50/100) * 100% = 12.5% of the original intensity. To test if a point P is in shadow with respect to a light source at position L, we consider the intersections of each object in the scene with a light ray from L to P (or P to L, it doesn't matter). If any of the intersections lie between P and L, we attenuate the intensity of the light by the transparency of the surface at that intersection point. The shadow attenuation factor (say between 0 and 1) can be calculated separately and passed to the shader. In the above example, it would be 0.125. Actually, the shadow value is a colour vector, containing separate attenuation factors for each of the colour components. So if there is a red transparent surface blocking the light, the shadow value would be RGB(1, 0, 0), i.e. the red component of the light is transmitted while the green and blue components are totally eliminated.

Not much can be said about input routines for raytracers. Many raytracers parse a "scene description file" to generate a model of the scene. Such a file is written in a scripting language which can access the basic modelling functions of the raytracer. Other raytracers have an interface which allows the user to visually design the scene: the scene is then stored in an internal encoded format. Some raytracers have no such script or visual interface -- they are packaged as libraries which can be accessed by executable programs. This means that you'll have to write programs of your own in some high-level language to create images. This is actually a fairly convenient and powerful approach -- the programmer saves himself/herself the bother of writing an elaborate interface and the user can utilize every feature of the library in minute detail as well as extend the library freely.

Raytracers usually output pictures in some lossless, high colour-depth format. Targa (TGA) and PNG are frequently used. On the Windows platform, the Bitmap (BMP) format is popular. Here is the structure of a 24-bit BMP file (the values to be assigned are given as comments):

First, a 14 byte "file header" block with the following structure (always assuming a short int is 2 bytes long and a long is 4 bytes long):

size is the size of the info header block in bytes. We assume our picture is stored as an object called pic, with the properties width and height in pixels. extra_bytes is a slightly esoteric value which is necessary because for some peculiar reason, the width of a BMP file (in bytes) must be an exact multiple of 4. extra_bytes is calculated as:

extra_bytes = (4 - (width * 3) % 4) % 4;

Thus each line (i.e. row) of our picture must be padded with extra_bytes bytes (which may contain rubbish) to bring the length of the row upto a multiple of 4.

Finally we have the actual picture stored as a sequence of RGB triplets, each component stored as a byte value between 0 and 255. Here again we encounter a peculiarity of the BMP format -- the picture is stored bottom-up, i.e. the first row stored is the lowermost one. For the pixel at (row, col), the corresponding RGB triplet starts from the nth byte of the file, where n is given by

Note that row can vary from 0 to pic.height - 1 and col can vary from 0 to pic.width - 1. The triplet is again stored in reverse order: the nth byte stores the blue value, the (n + 1)th byte stores the green value and the (n + 2)th byte stores the red value.

Be warned that many compilers store structures in blocks whose sizes are multiples of 2, 4, or 8 bytes, not 1. This is done for processing efficiency. Thus for the 8-byte case, the file header structure (14 bytes) would be stored in a 16 byte block. If you tried copying the file header structure into a file using the normal C file write routine, specifying the starting address and the number of bytes to copy (14), you'd probably end up copying two wrong bytes and your file would be unreadable. So either set the block size increment to 1 byte or copy the structure element by element.

Hopefully by now you have a well-written and fast raytracer up and running. I use the word "hopefully" with a large amount of optimism, because I know from experience that even when you think you've written perfect code, an unbelievable number of bugs crop up from nowhere. So you'll probably be spending a lot of time debugging now. Good luck.

There's nothing quite like the fun of seeing the first picture correctly rendered by your own program. In my case, I produced a highly uninteresting display of a shaded sphere in 16 shades of grey (which is all you got out of the Borland BGI drivers) and felt incredibly kicked. Remember, though, that the production of the first picture doesn't mean that your program is finished. You'll have to test it, optimize it, extend it, modify it. For instance, you should definitely consider putting in some space subdivision or bounding volume scheme. Often you'll find your own sense of aesthetics prompting you to restructure large chunks of code. In the process of writing your raytracer, you'll have picked up all sorts of maths and physics knowledge that you might use elsewhere. This is one of the best side-benefits. Also, remember that there's nothing better than breaking a few conventions here and there. Don't be afraid to experiment, but yes, always keep a backup copy ;-)