From Ken Turkowski, a sad piece of news: Pierre Bezier died on November 25,
1999.

Peter Shirley is writing a book on ray tracing. The title is "Realistic Ray
Tracing", ISBN 1-56881-110-1, from A.K.Peters, http://www.akpeters.com, and
looks to run 200 pages, $35. The book is not out yet, but appears to be near
completion. I look forward to it, as it's by a master of the subject who is
also an excellent teacher.

This issue of the RT News has two excellent longer pieces. Vlastimil Havran presents an
overview of all research (that he could get his hands on) done on using
octrees in ray tracing, then compares the more promising schemes against each
other. John Stone gives his insights and guidelines for writing a parallel
ray tracer on modern architectures.

I appreciate Vlastimil's work from the standpoint of attempting to bring more of
an engineering approach to the study of efficiency schemes. Instead of
presenting yet another variant scheme and testing it against some basic
algorithm, he compares and contrasts algorithms from the literature and truly
attempts to find what works and what does not. Evolving architectures can
invalidate such results over time, but this practical research is a valuable
starting place for anyone implementing octrees. I also admire his courage in
giving his honest opinions on how significant the various papers are,
something that is extremely useful but rarely done in the field of computer
graphics.

John Stone's article discusses what to watch out for when attempting to
parallelize code, and how to design from the ground up for an interactive ray
tracer, vs. batch mode code. He also points out how changing architectures
affect code in sometimes unexpected ways. Note that John has source code and
related papers at his site.

Per Christensen turned a brief email he sent me some months ago into a
full-fledged article on the use of importance in ray tracing (or lack thereof).
This technique can result in noticeable savings (especially when transparent
objects are involved) and his extension to using it per channel should make it
all the more effective in saving computation time.

Of course, if you're a researcher you will probably ignore this issue of the
RT News until January 12th, the SIGGRAPH 2000 deadline. It's also the
deadline for another critical, life-changing event: it's the last day you can
submit your Fantasy Graphics League team,
http://www.realtimerendering.com/fgl/. The winner gets a lifetime
subscription to the Ray Tracing News and many other valuable prizes, honors,
titles, distinctions, and related privileges thereto, as soon as we can
figure out what these might be.

The puzzle from last issue was: given an axis-aligned cube with corners at
(0,0,0) and (1,1,1), cut it with the three planes x=y, y=z, and x=z. How many
pieces is the cube cut into?

The answer: 6, not 8. What is interesting about this one is that there are
any number of ways of thinking about it. One is simply to note that the three
cutting planes all share the line running through (0,0,0)-(1,1,1), so only 6
pieces could be formed. Another solution is to look at the cube down this
axis, i.e. view it from say (3,3,3). The cube has a hexagonal silhouette, and
from this view the three cuts slice it like a pie. I like this visualization
as it shows that (literally) a different view of the problem makes the answer
obvious. A third way to present the solution is to think about the equations:
for x=z, the cube is divided into a part where x>z and one where x<z; the same
holds for the other two equations. 2*2*2 would give 8 pieces, but two of these
combinations cannot exist, x<y<z<x and x>y>z>x, which leaves 6 pieces.
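
A quick brute-force check of that last argument (my own addition, not part of
the original puzzle discussion): classify random points in the cube by the
signs of x-y, y-z, and x-z, and count how many distinct sign patterns occur.

    #include <cstdio>
    #include <cstdlib>
    #include <set>

    int main()
    {
        std::set<int> regions;
        for (int i = 0; i < 1000000; ++i) {
            double x = std::rand() / (RAND_MAX + 1.0);
            double y = std::rand() / (RAND_MAX + 1.0);
            double z = std::rand() / (RAND_MAX + 1.0);
            // The sign pattern of (x-y, y-z, x-z) identifies the region.
            int code = (x > y ? 1 : 0) | (y > z ? 2 : 0) | (x > z ? 4 : 0);
            regions.insert(code);
        }
        std::printf("distinct regions: %d\n", (int)regions.size());  // prints 6
        return 0;
    }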

The next puzzle is one from "The Mathemagician and Pied Puzzler" by Berlekamp
and Rodgers, http://www.akpeters.com/berlekamp2.html, an homage to Martin
Gardner. You have a cube and you select at random three (different) corners.
What is the chance that the triangle formed by these corners is acute (all
angles < 90 degrees)? Is a right triangle (has one angle == 90 degrees)?

I've written a fur/hair shader (unfortunately not RenderMan yet) based on
'Rendering Fur with Three Dimensional Textures' by James T. Kajiya and Timothy
L. Kay, California Institute of Technology, Computer Graphics, Volume 23,
Number 3, July 1989:

I recently completed my PhD at the MIT Graphics Group under the supervision
of Profs. Julie Dorsey and Seth Teller. The title of my thesis is "Radiance
Interpolants for Interactive Scene Editing and Ray Tracing". The abstract is
included below. The thesis is now available at:

Ray tracers are usually regarded as off-line rendering algorithms that are
too slow for interactive use. This thesis introduces techniques to accelerate
ray tracing and to support interactive editing of ray-traced scenes. These
techniques should be useful in many applications, such as architectural
walk-throughs, modeling, and games, and will enhance both interactive and
batch rendering.

This thesis introduces radiance interpolants: radiance samples that can be
used to rapidly approximate radiance with bounded approximation error.
Radiance interpolants capture object-space, ray-space, image-space and
temporal coherence in the radiance function. New algorithms are presented
that efficiently, accurately and conservatively bound approximation error.

The interpolant ray tracer is a novel renderer that uses radiance
interpolants to accelerate both primary operations of a ray tracer: shading
and visibility determination. Shading is accelerated by quadrilinearly
interpolating the radiance samples associated with a radiance interpolant.
Determination of the visible object at each pixel is accelerated by
reprojecting interpolants as the user's viewpoint changes. A fast scan-line
algorithm then achieves high performance without sacrificing image quality.
For a smoothly varying viewpoint, the combination of lazily sampled
interpolants and reprojection substantially accelerates the ray tracer.
Additionally, an efficient cache management algorithm keeps the memory
footprint of the system small with negligible overhead.

The interpolant ray tracer is the first accelerated ray tracer that
reconstructs radiance from sparse samples while bounding error
conservatively. The system controls error by adaptively sampling at
discontinuities and radiance non-linearities. Because the error introduced by
interpolation does not exceed a user-specified bound, the user can trade
performance for quality.

The interpolant ray tracer also supports interactive scene editing with
incremental rendering; it is the first incremental ray tracer to support both
object manipulation and changes to the viewpoint. A new hierarchical data
structure, called the ray segment tree, tracks the dependencies of radiance
interpolants on regions of world space. When the scene is edited, affected
interpolants are rapidly identified and updated by traversing these ray
segment trees.

Algorithm details are relatively scarce, but you might glean something from
their page http://www.art-render.com/technology/ar250.html. There was a paper
on this at the hardware conference that took place concurrently with SIGGRAPH
99; no, I do not have a copy.

________

Announcing Steve's Object Builder Ver 1.4

Steve's Object Builder is a free script-based tool that can be used to make
3D objects for ray tracing programs such as POV-Ray. The tool uses Perl
(Practical Extraction and Report Language) as its script language
interpreter. One nice thing about Perl is that it is easy to use and it is
freely available on many platforms such as UNIX, Windows, and Macintosh. You
do not have to be an expert Perl programmer to use this tool.

[A modeller in Perl? It's more an aid for doing procedural modeling, with a
number of constructs for you to control. The basic mode of operation is to
define a 2D contour and extrude it in some way. Since it's procedural, things
like L-systems can be done relatively easily. - EAH]

Panorama is a GNU project framework for 3D graphics production. The site has
not been updated in the past few months, but the code appears to have a
number of more advanced effects, including volumetric lighting and an
implementation of the cellular texture basis function:

We used the ray tracing functionality of BMRT (http://www.bmrt.org) on A
Bug's Life, for 15 shots totalling around 30-40 seconds. Of course, the way
the "ray server" works is that BMRT was only used for the refraction rays
(and what you see "through" them), the rest of the scene was rendered with
PRMan, as usual [http://www.bmrt.org/bmrtdoc/rayserver.html].

>Does anyone here know how BMRT's radiosity works? I hear it's the most
>accurate algorithm developed so far, apart from reverse raytracing and the
>like.

I've never heard that. BMRT just uses a fairly standard finite element,
progressive refinement radiosity, then substitutes the indirect component of
the radiosity pass for the ambient() illumination in the ray tracing pass.
It's really not all that sophisticated.

[Small renderings are free; an interesting concept, though it's not clear
from the web page why you'd want to do this. The normal advantage of this
sort of thing is that the server is much more powerful than your client
machine, but the server available is not discussed here. - EAH]

________

If you enjoy links to new ray traced images or want to announce your own
works, consider subscribing to the Raytraced Mailing List:

The SIGGRAPH bibliography searcher,
http://www.siggraph.org/publications/bibliography/, is a wonderful tool which
has been upgraded this year. It does not provide you with lists of papers on
particular subjects, though. Pointers to some human-generated lists on
particular topics can be found on the ACM TOG page
http://www.acm.org/pubs/tog/BibLook.html. I welcome additions to the focussed
bibliographies listed here (ACM TOG can also provide such bibliographies a
home, if you do not have web space).

This renderer is interesting in that it uses the trick (presented in a
technical sketch at SIGGRAPH 99 by Abe Megahed, who runs Hypercosm) of
reflecting rays at the vertices of polygons and sampling the reflection
direction. The reflection color is then added to the local shade for the
vertex and the resulting color blended as usual using Gouraud shading. It
gives a rough, soft, sometimes funky reflection at the cost of just a few
rays, allowing real-time reflection approximation. The same technique can be
used for rough shadows; the major criterion affecting quality is the
underlying mesh and its resolution on the screen.
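
A rough sketch of how the vertex reflection trick might look in code (my own
reading of the technique, with hypothetical traceRay and shadeLocal stand-ins;
this is not Megahed's code):

    #include <cmath>

    struct Vec3 {
        float x, y, z;
        Vec3 operator+(Vec3 b) const { return {x + b.x, y + b.y, z + b.z}; }
        Vec3 operator-(Vec3 b) const { return {x - b.x, y - b.y, z - b.z}; }
        Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
    };
    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3 normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

    struct Vertex { Vec3 position, normal, color; };

    // Dummy stand-ins so the sketch compiles; a real renderer supplies these.
    static Vec3 traceRay(Vec3 /*origin*/, Vec3 /*dir*/) { return {0.2f, 0.3f, 0.5f}; }
    static Vec3 shadeLocal(const Vertex &)              { return {0.4f, 0.4f, 0.4f}; }

    // One reflection ray per vertex; Gouraud interpolation of the vertex
    // colors then smears the approximate reflection across each triangle.
    void shadeVertexWithReflection(Vertex &v, Vec3 eyePos, float kr)
    {
        Vec3 in = normalize(v.position - eyePos);              // eye-to-vertex direction
        Vec3 r  = in - v.normal * (2.0f * dot(in, v.normal));  // mirror direction
        v.color = shadeLocal(v) + traceRay(v.position, r) * kr; // blend before Gouraud
    }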

________

Ken Joy's On-Line Computer Graphics Notes and Geometric Modeling Notes pages
have some valuable tutorials on a wide range of subjects. These are at:

[John has one of the physically fastest ray tracing systems on the planet,
see the end of the Introduction in RTNv11n1. -EAH]

Basic Parallelism Issues:

It is well known that ray tracing is a highly parallelizable process. A naive
ray tracer can be trivially parallelized by splitting up the image into
chunks which are divided up evenly among processors, with each processor
computing its own pixels entirely independently of the others. This simple
approach will go a long way toward making a naive ray tracer run faster. With a
more complex ray tracing system there are several trade-offs that have to be
made:

The granularity of the parallel decomposition (size of the work chunks)
has a direct effect on the amount of load imbalance among the CPUs. The
larger the chunks, the more imbalanced the CPUs will tend to be. The
smaller the chunks, the more balanced the CPUs' loads will be. A "pixel
cyclic" decomposition (each CPU traces every Nth pixel in the image)
yields the best load balance.

The granularity of the parallel decomposition also affects the spatial
coherence of consecutively traced rays on a CPU. Larger sized chunks of
work yield good ray coherence. The "pixel cyclic" decomposition yields
almost no ray coherence. Some classic ray tracing acceleration schemes
depend on ray coherence in order to function well (e.g., the shadow cache).
Ray coherence has an additional side effect of promoting accesses to
already in-cache data structures, yielding better performance than random
access patterns.

The way that computed pixels are stored in memory can have a direct
effect on communications performance when gathering pixels together into
a final image buffer. When a CPU's finished pixels are stored in a
contiguous array, they are very easily transferred to their destination
using a minimal amount of communications, and no requirement for "packing
and unpacking" that data at source or destination. When groups of
finished pixels are not in contiguous memory, they have to be sent using
multiple communications calls, or by packing/unpacking at source and
destination. When pixels are stored in a strided manner, packing and
unpacking operations are necessary at both source and destination.
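
As a tiny illustration of the first two trade-offs (a sketch of my own, not
John's code), here are a chunked and a pixel-cyclic static decomposition side
by side:

    // Stand-in for tracing and storing one pixel.
    static void renderPixel(int /*x*/, int /*y*/) {}

    // Contiguous chunks of scanlines: good ray coherence and contiguous
    // pixel storage, but load balance depends on how evenly the scene's
    // cost is spread across the image.
    void renderChunked(int cpu, int numCPUs, int width, int height)
    {
        int rows = (height + numCPUs - 1) / numCPUs;
        int y0 = cpu * rows;
        int y1 = (y0 + rows < height) ? y0 + rows : height;
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < width; ++x)
                renderPixel(x, y);
    }

    // Pixel-cyclic: each CPU traces every numCPUs-th pixel, giving nearly
    // perfect load balance but almost no coherence between consecutive rays,
    // and strided pixel storage.
    void renderPixelCyclic(int cpu, int numCPUs, int width, int height)
    {
        for (int p = cpu; p < width * height; p += numCPUs)
            renderPixel(p % width, p / width);
    }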

Since CPU speeds have been increasing substantially every year but memory
bandwidth and latency have not kept pace, floating point operation counts
are no longer the dominant issue in writing high performance rendering
software. This is especially true since great progress has been made in
development of ray tracing efficiency schemes. Modern CPUs are highly
pipelined, often having tens of instructions in-flight at any given time. As
a result, it is becoming more important to avoid causing CPU pipeline stalls
than to reduce overall operation count. Typical causes of such stalls are
memory load/store operations, and compare/branch operations.

Items 1, 2, and 3 from the above each have a relationship with the CPU issues
in the previous paragraph. Balancing these issues is a major part of writing
a fast ray tracer on parallel/multiprocessor computers. What combination of
1/2/3 is best depends on what goals a ray tracer is attempting to meet. They
are vastly different for a real-time oriented ray tracer than they are for a
normal ray tracer.

The Challenges of Multithreading:

Although the parallelism issues involved in ray tracing are well known, the
vast majority of the work done in parallelizing ray tracing has been done on
distributed memory parallel computers. Shared memory parallel computers offer
a number of advantages over distributed memory machines, but they also present
a number of challenges which need to be overcome in order to fully capitalize
on these advantages.

Some advantages of multithreading and shared memory hardware:

90% of the data structures in a typical ray tracer can be shared by all
threads/processors, yielding a tremendous savings in memory. Most notable
candidates for sharing are the scene database, texture maps, and
volumetric data. (Most modern distributed memory ray tracers do not
partition the scene database across nodes, due to the high communications
costs and complexity involved in passing rays or objects between nodes.
Most are implemented by keeping an entire copy of the scene on each node.)

CPUs can synchronize and communicate with each other at tremendous
speeds. Typically these speeds are order(s) of magnitude faster than the
message passing hardware in distributed memory machines. It is easier to
parallelize code to very fine granularity on most shared memory machines.
This allows a ray tracer to extract parallelism that is difficult or
impossible to squeeze out of a typical distributed memory machine.

Complications involved in multithreading and shared memory:

Most ray tracing algorithms were designed with sequential, non-reentrant
implementations in mind. Any algorithm can be adapted to yield correct
results in a multithreaded environment, but some perform very well, and
others perform very poorly. The main issue which determines performance
in a multithreaded environment is the way an algorithm uses memory,
especially memory which is shared by multiple threads.

The actual implementation issues vary considerably, including:

The use of unprotected static/global data. When multiple threads
access global/static data without the use of mutual exclusion
locks, the values of these data items become scrambled. Adding
mutex locks to protect such data items can cure the scrambling
problem, but may also result in poor performance if done naively.

The assumption that data structures will not be used in a reentrant
way. As with global data, shared data structures must also be
protected from unintentional concurrent use.

The use of non-reentrant, or otherwise unsafe library functions.
There are a surprising number of commonly used library functions
which are not thread-safe. Library functions can encapsulate problems
such as the two issues above, causing problems for code that uses
them. A simple example of this is the rand() implementation provided
by most C libraries. The rand() function contains internal static
data, and is not thread-safe. When used concurrently by different
threads, it yields unpredictable results ranging from incorrect
behavior to fatal crashes.

Straightforward modifications to existing data structures to allow
reentrancy (i.e. addition of mutex locks etc) typically yield performance
problems due to high lock contention etc. It often takes a slightly novel
approach to keep 90% of the benefits of the original sequential
non-reentrant code when going to a fully reentrant multithreaded
implementation.

The layout of data structures in memory may cause "hot spotting",
thrashing the machine's cache coherency hardware. Hot spotting
occurs when two unrelated data structures occupy nearby memory
addresses, close enough that they fall within the same cache line
(most SMPs), or memory page (some NUMA machines). When the "unrelated"
data structures are accessed concurrently by different CPUs, the cache
lines that contain the data structures thrash back and forth between
the CPUs as writes occur to the two data structures.
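
For the rand() example above, one common fix (a sketch of mine, using C++
threads and generators rather than anything from John's code) is to give each
worker thread its own private generator state, so the hot path needs no locks:

    #include <cstdio>
    #include <random>
    #include <thread>
    #include <vector>

    // Each worker owns its generator; no shared hidden state, no mutex.
    void worker(unsigned seed, int count, double *sumOut)
    {
        std::mt19937 rng(seed);                              // per-thread state
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        double sum = 0.0;
        for (int i = 0; i < count; ++i)
            sum += uni(rng);
        *sumOut = sum;
    }

    int main()
    {
        const int numThreads = 4, samples = 100000;
        std::vector<std::thread> threads;
        std::vector<double> sums(numThreads, 0.0);
        for (int t = 0; t < numThreads; ++t)
            threads.emplace_back(worker, 1234u + t, samples, &sums[t]);
        for (auto &th : threads) th.join();
        for (int t = 0; t < numThreads; ++t)
            std::printf("thread %d mean = %f\n", t, sums[t] / samples);
        return 0;
    }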

The Challenges of Real-Time Ray Tracing:

The design and implementation of a ray tracing system capable of attaining
real-time rendering rates (15 fps or more) poses several unique challenges to
the implementer. Some of these challenges actually simplify other design
decisions quite a bit.

At a rate of 15 frames per second, there is no time to dynamically load
balance CPUs to any great degree. By the time you figure out which
processors are doing too much work, and which ones are underutilized,
the frame is finished rendering. Since we're talking about real-time,
it's safe to assume that the next frame will be different (maybe quite
different) from the last one, so load balancing factors computed for a
previous frame probably aren't too helpful either. For real-time ray
tracing, we might as well throw out any ideas of load balancing, and
instead concentrate on reducing overhead in the code. The best
strategy for load balancing in this case is to come up with a good
static decomposition, balanced against ray coherency and other issues
mentioned previously.

In order to achieve high frame rates, the per-frame overhead in the ray
tracer itself must be minimized. Simple things like allocating image
buffers and allocating and clearing counter arrays for grid data structures
all add to the overhead incurred on every frame.

At 15 frames per second, there's no time for copying data around much.
The layout and format of the rendered pixels needs to be darn close to
its final form so that there's no need for various packing/unpacking of
strided pixel data from remote CPUs, and the number of messages sent
between CPUs should definitely be minimized.

Unlike a regular non-real-time ray tracer, overhead from simple processes
in the ray tracing algorithm can actually have a direct effect on the
maximum attainable rendering speed. Such processes include pixel value
range clamping and quantization, generation of primary rays according to
camera parameters, allocating or clearing arrays to store counters for
"mailboxes" used by grid acceleration schemes, etc.

Things that would never make the Top-10 list of time consuming pieces of
code in a regular ray tracer may actually show up as serious CPU-time
contenders in a real-time ray tracer when rendering scenes that are
simple enough to be rendered in real-time, or when using enough CPUs
to render more complex scenes at such a speed. A simple way to see this
effect is to render a "blank" screen. How fast can your favorite ray
tracer render an empty scene with no objects and no lights? It is
surprising how slow most ray tracers are before they are tuned for this.
Remember, we get less than 1/15th of a second to do everything.
Obviously, if a ray tracer can't effectively render a "blank" scene at
a very high rate, then there's no way it can render a real scene at
sufficient speed.

Once a ray tracer can render a "blank" scene at a very high rate, then
the next step is to start rendering progressively more complex scenes,
concentrating on finding unnecessary overhead when rendering very simple
scenes. There are often bits of overhead which can be eliminated by
pulling various initialization code out of the per-frame sections and
moving it into a separate, centralized initialization section.

[For more information, publications, and John's source for a portable, high
performance ray tracing system supporting MPI and threads, and with a front
end that reads NFF, MGF, and AC3D, see his web site at
http://www.ks.uiuc.edu/~johns/ - EAH]

[Vlastimil is currently at the Department of Computer Science and
Engineering, Faculty of Electrical Engineering, Czech Technical University,
Prague]

In the previous RTNews RTNv12n1 there was an informal discussion about
octrees, especially about ray traversal algorithms for octrees. Since I have
devoted some effort in the past to implementing and testing three octree ray
traversal algorithms, I have decided to try to review these and other octree
traversal algorithms for ray tracing in this article.

At least 20 papers have been written about octrees and related topics, and
the subject has also been heavily discussed in the RTNews. The last article
about octrees that I am aware of was published more than three years ago. I
have decided to cite these papers chronologically, in order to estimate the
contribution each paper made when it appeared.

We start with the axis-aligned bounding box of the whole scene and consider
it the octree node at depth zero. Subdivide it by three planes, one
perpendicular to each axis; eight smaller bounding boxes, called octants, are
created. Now, recursion takes place for all eight octants:

1) Find out how many objects intersect the corresponding bounding box of the
octant.

2) If the number of objects is greater than a given threshold and the depth
of the node in the octree is smaller than a maximum depth allowed, consider
this octant as the scene and start from the beginning. Otherwise, mark this
octant a leaf node, then assign it the objects intersecting with its bounding
box (usually simply a linked list or an array of pointers to objects).
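
A minimal sketch of this construction recursion (my paraphrase, not code from
any of the papers reviewed; a sphere and a conservative sphere/box overlap
test stand in for whatever primitives and intersection test a real renderer
would use):

    #include <cstddef>
    #include <vector>

    struct Vec3 { double x, y, z; };
    struct Box  { Vec3 min, max; };

    // A sphere stands in for the renderer's actual primitive type.
    struct Object { Vec3 center; double radius; };

    // Conservative sphere/box overlap test (squared-distance method).
    static bool boxOverlapsObject(const Box &b, const Object *o)
    {
        const double c[3]  = {o->center.x, o->center.y, o->center.z};
        const double lo[3] = {b.min.x, b.min.y, b.min.z};
        const double hi[3] = {b.max.x, b.max.y, b.max.z};
        double d2 = 0.0;
        for (int i = 0; i < 3; ++i) {
            if (c[i] < lo[i])      d2 += (lo[i] - c[i]) * (lo[i] - c[i]);
            else if (c[i] > hi[i]) d2 += (c[i] - hi[i]) * (c[i] - hi[i]);
        }
        return d2 <= o->radius * o->radius;
    }

    struct OctreeNode {
        Box bounds;
        std::vector<const Object *> objects;   // filled only for leaves
        OctreeNode *children[8] = {};          // all null for a leaf
    };

    OctreeNode *build(const Box &bounds, const std::vector<const Object *> &candidates,
                      int depth, int maxDepth, std::size_t maxObjects)
    {
        OctreeNode *node = new OctreeNode;
        node->bounds = bounds;

        // Step 1: find the objects intersecting this octant's bounding box.
        std::vector<const Object *> inside;
        for (const Object *o : candidates)
            if (boxOverlapsObject(bounds, o))
                inside.push_back(o);

        // Step 2: termination criteria - few objects, or maximum depth reached.
        if (inside.size() <= maxObjects || depth >= maxDepth) {
            node->objects = inside;            // leaf: keep the object list
            return node;
        }

        // Otherwise subdivide at the box center and recurse on the 8 octants.
        Vec3 c = {(bounds.min.x + bounds.max.x) * 0.5,
                  (bounds.min.y + bounds.max.y) * 0.5,
                  (bounds.min.z + bounds.max.z) * 0.5};
        for (int i = 0; i < 8; ++i) {
            Box child;
            child.min.x = (i & 1) ? c.x : bounds.min.x;
            child.max.x = (i & 1) ? bounds.max.x : c.x;
            child.min.y = (i & 2) ? c.y : bounds.min.y;
            child.max.y = (i & 2) ? bounds.max.y : c.y;
            child.min.z = (i & 4) ? c.z : bounds.min.z;
            child.max.z = (i & 4) ? bounds.max.z : c.z;
            node->children[i] = build(child, inside, depth + 1, maxDepth, maxObjects);
        }
        return node;
    }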

_____________

The issue of when to declare a node a leaf is usually called the termination
criteria; the number of objects and the depth of the node are the variables
usually discussed and used.

So the octree is simple, isn't it? The question, then, is what all the papers
were about. Let us look at the articles.

This is the first paper about octrees and tracing rays that is usually
credited in citations. The traversal algorithm is simple: compute a point Q
that is not inside the currently processed leaf-voxel, but in the neighbor
voxel, and perform point location for this point Q. The point location search
always starts from the root of the octree (for deeper hierarchies this may
waste time). For the memory representation Glassner used a hashing scheme over
the octree nodes' unique IDs. At that time memory saving was an important
issue, so storing empty nodes was avoided. This article is also considered
historically the first one that really brought significant speedup to the ray
tracing algorithm by a practical method, compared with an O(N) approach (in
truth, two octree and ray tracing articles were published in 1983 in Japan, by
Fujimoto et al. and Matsumoto et al., but they were written in Japanese and
thus were/are not publicly available). I implemented this traversal algorithm;
results are below.
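
Point location itself is straightforward; here is a sketch of mine that
descends from the root by comparing the query point against each node's
center (it assumes nodes store their bounds and eight child pointers, rather
than Glassner's hash table, and that a node has either all eight children or
none):

    struct Vec3 { double x, y, z; };
    struct OctNode {
        Vec3 lo, hi;           // the node's bounding box
        OctNode *child[8];     // assumed: all eight children exist, or none (leaf)
        bool isLeaf() const { return child[0] == nullptr; }
    };

    // Descend from the root, picking the child octant containing Q, until a
    // leaf is reached.  Q is assumed to lie inside the root's bounding box.
    OctNode *locate(OctNode *root, const Vec3 &q)
    {
        OctNode *node = root;
        while (!node->isLeaf()) {
            double cx = 0.5 * (node->lo.x + node->hi.x);
            double cy = 0.5 * (node->lo.y + node->hi.y);
            double cz = 0.5 * (node->lo.z + node->hi.z);
            int idx = (q.x > cx ? 1 : 0) | (q.y > cy ? 2 : 0) | (q.z > cz ? 4 : 0);
            node = node->child[idx];
        }
        return node;  // the leaf voxel containing Q
    }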

The paper is a summary of the ray tracing algorithm and it also includes a
part about spatial subdivision methods and ray traversal algorithms. It first
outlines (on page 68) the recursive traversal algorithm for the hierarchy,
which can be used both for BSP trees and octrees. No experimental results were
presented.

This paper is actually about BSP trees, but is included since they can
emulate octrees in a simple way by forcing the BSP construction to finish
only at the depth of a multiple of three. My opinion is that the title of the
paper is rather an exaggeration. I do not have a copy of this paper; if any
of the readers would send me a copy, I would appreciate it.

This paper is actually about uniform grids (called SEADS), but it discusses
the traversal algorithm for octrees, which gave rise to uniform grids. It
follows from the paper that uniform grids were invented because the 3DDA
traversal algorithm for octrees was not as efficient as the authors had
expected, particularly the vertical traversal algorithm for the octree. They
discuss the 3DDA octree traversal in detail and compare it with the uniform
grid 3DDA traversal. For the presented scene the total rendering time of ray
tracing with octrees was about 3 times longer than with uniform grids, when
using Glassner's octree traversal algorithm [1]. A single traversal step to
the next voxel is 13 times faster for the uniform grid 3DDA than for an
octree traversal step using Glassner's method. The total rendering time and
the time for one traversal step are sometimes confused by papers that follow.

This paper is about reducing empty voxel encoding. It also discusses the
intersection test of a triangle with a voxel. The authors claim that their
traversal algorithm is faster in execution than [1][2], but they do not
support this with any experimental results. In my personal view, the
contribution of this paper is unclear.

In this RTNews issue the author analyses the properties of the recursive
octree traversal algorithm, which actually was first published in [2] by
Jansen. He uses BSP trees to explain the algorithm and especially for the
analysis. Even though the traversal concept he reinvents is the recursive
algorithm, the analysis is bright and makes an interesting brain exercise;
you can try it. The originality and properties of this traversal method are
discussed by Erik Jansen in the subsequent RTNews3 and later in RTNv5n1, and
also by Kelvin Sung in his Gems III article on BSP tree traversal and in
RTNv5n2.

This paper introduced a special traversal algorithm based on neighbor tables.
Since the results of this algorithm are presented below for SPD scenes, we
briefly describe it:
1) Compute the intersection point P of a given ray with the axis-aligned
bounding box of the scene.
2) Move the point P inside the bounding box by epsilon along the ray path.
3) Find the voxel V of the octree containing P using point location.
4) Compute the exit point Q of the ray with this voxel.
5) Move Q by epsilon outside the voxel boundary along the ray path.
6) Ascend the octree hierarchy until Q lies in the node pointed at. This is
performed using several intricate tables for faces, edges, and vertices.
7) Descend down the octree until a node W of a size greater than or equal to
that of V is reached, using only table routing.
8) If the reached node W is not a leaf node of the octree, descend further
using point location.
9) Compute the intersection test of the ray with the objects in the voxel.
10) If an intersection exists lying between P and Q, the computation is over.
11) Otherwise, go to step 4).

At first view this complicated algorithm is interesting in the sense that it
descends using point location if needed and ascends until some upper node is
found. It does not precompute where this ascent will end in the hierarchy,
and in this way it differs from recursive ray traversal. It also intelligently
uses the information about where the ray exits the voxel to avoid point
location when ascending and descending back to the same depth. The algorithm
was also probably published in Samet's book, Applications of Spatial Data
Structures, but I was not successful in buying this book (it was sold out) or
in borrowing it from the library. If anyone can lend me this book for some
time, I would appreciate it.

In this paper the author uses a method similar to Fujimoto's, a virtual
uniform grid over the octree. Unlike Fujimoto's approach, the traversal is
based on the smallest voxel size and is actually a 3DDA uniform grid traversal
algorithm. In each virtual voxel it is checked whether this is a newly
processed voxel, using a smart hashing scheme that at first view was rather
hard for me to understand. A necessary condition of this approach is that the
subdivision planes for each interior node have to lie in the middle of the
current octant. He compares the algorithm with the octree of Glassner [1],
Arvo/Jansen [7],[2], and the uniform grid [4]. The results are not striking;
no significant difference in timings is reported (7-17%), and for the SPD
scene "mount" the proposed algorithm is even slower.

This paper claims in the introduction that it is a combination of the octree
and the uniform grid. I found after careful reading that it is actually about
BSP trees, but they are traversed as octrees. The authors do not cite any
paper about BSP trees (Kaplan, or MacDonald & Booth, or Subramanian & Fussell)
and do not compare their results experimentally with another method. The
interesting thing is that the octree traversal algorithm briefly outlined very
much resembles the one later discussed in detail and published by Gargantini
[14].

The authors, from the University of Sussex, discuss the octree in the context
of parallel execution on multiple processors. They deal mainly with dynamic
subdivision of the octree on the fly and show statistics of the number of
octree leaf nodes at different octree depths. They also provide their view of
a comparison among grids, octrees, and BSP trees, to some extent using common
sense, but without any experimental results. An interesting fact is the use
of the HERO traversal algorithm for octrees (Agate, Grimsdale and P.F.
Lister: {The HERO algorithm for ray-tracing octrees}, in Advances in Computer
Graphics Hardware IV, 1991), which does not require the octants to be cubes.
I do not have this paper, so I do not know what HERO was about. Surprisingly,
they conclude the paper with the statement that octrees are a more efficient
data structure than BSP trees and grids.

[Another related paper not covered here is the SMART algorithm; see the
article that follows. -EAH]

This paper claims in the abstract that it presents a hybrid combination of
the uniform grid and the octree, which is why I include it here. The authors
discuss a method in which a node is subdivided into more than 8 voxels, namely
4^3 or 8^3 (though these should not be called octants any more). They call
their method FINE-ARTS. For traversing between the voxels they use a 3DDA
algorithm at the corresponding depth of a node. But stop reading and think
now .........................................................................
This is actually the approach of a recursive grid with a limited resolution
setting, is it not? The authors compare the results with Fujimoto's [4] octree
and uniform grid. The authors do not cite the work of Jevans and Wyvill:
{Adaptive voxel subdivision for ray tracing}, GI'89, pp. 164-172, 1989. So
their contribution is rather questionable.

The author presents an approach based on octrees for rendering multiple
frames of a scene with moving CSG primitives. He uses global CSG trees and
local instances, and for the presented scenes he saves up to 75% of the
rendering time. The paper is about temporal coherence when the viewpoint is
fixed. The author follows the work of Glassner [6].

In this paper the authors propose a recursive traversal algorithm with
special identification of the octants that have to be traversed. The authors
claim an improvement in execution speed of 32% to 62% over the Samet traversal
code [8] when the number of voxels is huge. Since we implemented this
traversal algorithm, we outline it briefly. We assume the signed distances to
the entry point and exit point of the octant are known. First we compute the
signed distances to all three subdivision planes. We sort these three signed
distances in ascending order. Then we disregard the signed distances outside
the interval given by the entry point and the exit point. Next we identify
the octants by the positioning of the intersection points with the
subdivision planes with regard to the other planes' coordinates. The
traversal is based on the knowledge that within an octree node at most four
octants can be pierced. These are induced by the four intervals given by the
three intersection points inside the current octant plus the entry and exit
points (five points along the ray path give four intervals). The authors also
discuss the robustness of the traversal algorithm when the ray pierces the
neighborhood of the intersection of more than one splitting plane, at worst
the middle point of the octree node.
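
The interval idea is perhaps easier to see in code. A simplified sketch of
mine (the paper identifies the octants by comparing the plane intersection
points against the other planes' coordinates and treats degenerate cases more
carefully; here each interval's midpoint is simply classified):

    #include <algorithm>
    #include <vector>

    struct Ray { double ox, oy, oz, dx, dy, dz; };  // direction components assumed nonzero

    // For one octree node with center (cx,cy,cz), return the child octants a
    // ray pierces, in front-to-back order, given the ray's entry/exit
    // distances for the node.  At most four children are returned.
    std::vector<int> childrenPierced(const Ray &r, double cx, double cy, double cz,
                                     double tEntry, double tExit)
    {
        // Signed distances along the ray to the three mid-planes, ascending.
        double t[3] = {(cx - r.ox) / r.dx, (cy - r.oy) / r.dy, (cz - r.oz) / r.dz};
        std::sort(t, t + 3);

        // Plane crossings inside [tEntry, tExit] cut the ray into up to four
        // intervals; classify each interval's midpoint to find its octant.
        std::vector<int> order;
        double prev = tEntry;
        for (int i = 0; i <= 3; ++i) {
            double next = (i < 3) ? std::min(std::max(t[i], tEntry), tExit) : tExit;
            if (next > prev) {
                double tm = 0.5 * (prev + next);
                double px = r.ox + tm * r.dx;
                double py = r.oy + tm * r.dy;
                double pz = r.oz + tm * r.dz;
                order.push_back((px > cx ? 1 : 0) | (py > cy ? 2 : 0) | (pz > cz ? 4 : 0));
                prev = next;
            }
        }
        return order;
    }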

In this paper the authors give a comparative study of traversal techniques
for uniform grids and octrees. They implement and test eleven traversal
methods, making it the most extensive study of octree traversal algorithms
presented up to now. We only enumerate these traversal algorithms (called ray
generators in the paper):
a) Glassner [1]
b) Peng [5]
c) Samet [8]
d) the part of the Samet [8] algorithm devoted to the ascending/descending
phase
e) Sandor (J. Sandor: {Octree Data Structures and Perspective Imagery}, C&G,
Vol. 4, No. 4, pp. 393-405, 1985 - I do not have this paper)
f) Rothe - in a dissertation from the University of Karlsruhe, Germany, 1991,
written in German
g) B. Froehlich and A. Johannsen - a paper from 1988, also written in German
h) Sung [9]
i) Fujimoto [4]
j) an optimized traversal algorithm of Samet [8]
k) two of Endl's methods, another hybrid of Samet [8] and Fujimoto [4], very
similar to Fujimoto's octree traversal.

From the paper it follows that for the two test scenes algorithm j) is
probably the fastest. The paper concludes by stating that the selection of
the fastest acceleration method is scene dependent (uniform grid versus
octree).

This paper is about discrete ray tracing (a term originally coined by Yagel),
or as it is also known, volume ray tracing, where the objects are represented
not by shape, but by their presence in a huge voxel space (let us say at least
256x256x256). This requires a large amount of memory, so the authors attack
the problem from the opposite direction - grids->octrees. They use the octree
to reduce this space. They provide a fixed-point two-step traversal 3DDA
algorithm, but over the octree, which saves 50% of the rendering time in
comparison to Yagel's algorithm. The two-step algorithm treats a node as one
`octree' of resolution having more than 8 voxels, for example 32x32x32,
similarly to [12]. Such a node, if it is not empty, contains another such
`octree'. In my opinion, it is actually the concept of hierarchical grids,
but used for volume visualization.

This paper has exactly the same content as [16], but the text formatting is
different and sentences are changed. I really wonder that it was not
rejected, having previously been published by WSCG in February. The paper
also does not cite [16], and I consider this publishing policy of the authors
rather questionable.

[19] I. Gargantini and J.H.G. Redekop: {Evaluation Exact Intersection of an
Octree with Full Rays using only Radix-Sort and meet operations},
Compugraphics'95, pp. 278-284, 1995.

This paper deals with volume visualization as well. It uses information about
the viewpoint with regard to the bounding box of the whole scene, that is, 26
(3^3-1) regions. The visibility of octants is then predetermined by the
priority order of octants for a given region, if a ray hits the octree root
node. This is then used with DDA to accelerate ray shooting, but
unfortunately, no experimental results are reported. Similar approaches can
be found in many visibility papers.

This paper applies a cost function based on surface area heuristics to octree
construction, precisely according to the paper of MacDonald and Booth
({Heuristics for Ray Tracing Using Space Subdivision}, Visual Computer, pp.
153-165, Vol. 6, No. 6, 1990). The idea is not to split the octree node into
eight voxels of the same size, but to position the plane(s) at the minimum of
a cost function estimate. The cost function is computed independently along
all three axes and the planes' positions are also selected independently. The
paper does not introduce anything novel except the name Octree-R. It reports
results on the scenes gears, tetra, balls, and rings; the improvement is
4-47% in the number of intersection tests. The authors neglect the rendering
times in the comparison (where the improvement is 17-37%). They do not
mention which traversal algorithm they used, and they also do not compare the
results with the original paper of MacDonald and Booth. However, results for
this approach are given below.

This paper is not about octrees, but rather estimates the cost of a spatial
hierarchy for a given scene once the hierarchy is built. The issue is whether
it is possible to predict the performance of a spatial data structure before
ray tracing is started; this issue is important, especially for huge scenes.
The first half of the paper is about any hierarchy, but the octree is taken
as the example in the second half. It is difficult to restate the results
here in short; it is an interesting paper worth reading, and it also contains
experimental data.

_____________

I have not included papers which are specifically about BSP trees/k-d trees,
since these are something rather different. The BSP tree can emulate the
octree, but the octree cannot emulate the BSP tree. The cost of traversing
this simulated octree using a BSP traversal algorithm can be higher or lower,
but it very much depends on the implementation. I did not include on this
list the papers which I have not read (mostly several master's theses and
technical reports), since they have been unavailable. These probably do not
contain new material. I also omitted several papers which use octrees, but in
which the octree is not a topic discussed in the paper.

From all of the papers listed above we can observe these important issues:

These three issues are usually somewhat intertwined. A year ago I decided to
form my own opinion about octrees based on an implementation in the GOLEM
rendering system developed at our department (http://www.cgg.cvut.cz/GOLEM).
I therefore involved three fifth-year undergraduate students for one term.
After reading the octree papers I selected the following methods:

OCTREE84-C) Glassner's classic paper [1], but implemented not with a hashing
table but rather with nodes holding eight pointers to descendants.

OCTREE89-C) the paper of Samet[8], since the traversal code is intricate and
thus appealing.

OCTREE93-C) the paper of Gargantini[14], since it was the latest one about
the traversal algorithm for octrees, and generally usable for ray tracing.

OCTREE84-A) the approach of Octree-R[20] applied to [1].

The octrees are built by subdividing at the center of the box, except for
OCTREE84-A). My strong request to the students was that the generated octrees
had to be the same for all the source code, so that the octrees could be
fairly compared by ray tracing the SPD scenes. The termination criteria were
the maximum depth (also called level in some papers) of the node in the
octree and the number of primitives in the node. For the testing below I
decided to vary the maximum depth over 4, 5, 6, and 7. A node is not further
subdivided when the number of objects is smaller than two.

The students did their job, but later, when I studied their source code, I
found that the coding itself was rather inefficient. Therefore, I completely
rewrote their source code, which took more than one man-month of programming,
debugging, and testing. I improved the timing by about three times in the
best case, and thus I believe the source code is now efficient enough. I also
decided to include Octree-R for the Gargantini traversal OCTREE93-C); this is
marked as:

OCTREE93-A) the approach of Octree-R applied to [14].

Letter C means center subdivision, letter A means adaptive, that is, cost
estimate subdivision. Finding the position of the splitting planes with
minimum cost was performed exactly according to [20]: for the object and
spatial medians and for N splitting planes equidistantly placed between these
two medians, select the one with the minimum cost estimate (I used N=10). I
would like to note that such a selection scheme does not always find the
minimum cost estimate, since there can be a plane with a better cost estimate
(even a plane outside the median interval).
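
A sketch of that selection loop as I read it (my own code, reduced to a
single axis, with a bare-bones MacDonald-Booth style surface-area cost rather
than the full estimate used in [20]):

    #include <algorithm>
    #include <vector>

    struct Box { double lo[3], hi[3]; };

    static double surfaceArea(const Box &b)
    {
        double dx = b.hi[0] - b.lo[0], dy = b.hi[1] - b.lo[1], dz = b.hi[2] - b.lo[2];
        return 2.0 * (dx * dy + dy * dz + dz * dx);
    }

    // Cost of splitting `node` at position p on `axis`: area-weighted object
    // counts for the two halves (objects straddling p are counted in both).
    static double splitCost(const Box &node, const std::vector<Box> &objs, int axis, double p)
    {
        Box left = node, right = node;
        left.hi[axis] = p;
        right.lo[axis] = p;
        int nLeft = 0, nRight = 0;
        for (const Box &o : objs) {
            if (o.lo[axis] <= p) ++nLeft;
            if (o.hi[axis] >= p) ++nRight;
        }
        return surfaceArea(left) * nLeft + surfaceArea(right) * nRight;
    }

    // Candidates run from the object median to the spatial median, with
    // nCandidates-1 equidistant positions in between; keep the cheapest.
    double choosePlane(const Box &node, const std::vector<Box> &objs, int axis, int nCandidates)
    {
        double spatialMedian = 0.5 * (node.lo[axis] + node.hi[axis]);
        if (objs.empty() || nCandidates < 1)
            return spatialMedian;
        std::vector<double> centers;
        for (const Box &o : objs)
            centers.push_back(0.5 * (o.lo[axis] + o.hi[axis]));
        std::sort(centers.begin(), centers.end());
        double objectMedian = centers[centers.size() / 2];
        double best = spatialMedian, bestCost = splitCost(node, objs, axis, spatialMedian);
        for (int i = 0; i <= nCandidates; ++i) {
            double p = objectMedian + (spatialMedian - objectMedian) * i / double(nCandidates);
            double c = splitCost(node, objs, axis, p);
            if (c < bestCost) { bestCost = c; best = p; }
        }
        return best;
    }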

Here we present only the timings for ray tracing the SPD scenes, for maximum
depths allowed of 4, 5, 6, and 7. The additional parameters (number of nodes,
references to objects, etc.) can be found at the SPD site
(http://www.acm.org/pubs/tog/resources/SPD/overview.html). The parameters
reported are the same as for the previous comparison paper about hierarchical
grids, so the comparison between grids and octrees can be performed directly.
Rendering times are in seconds; testing was conducted on an Intel Pentium II,
350 MHz, Linux, egcs-1.1.2, optimization switch -O2.

For both center subdivision and cost estimate subdivision, there is a local
minimum over the maximum depth allowed; let us call it d_min. This minimum is
more apparent for center subdivision, where the times for depths d_min-1 and
d_min+1 are relatively higher than for adaptive subdivision. For some scenes
(tree) the local minimum was not reached within the tested maximum depths.
This property was reported by several papers, starting with Fujimoto [4]. The
optimum maximum depth allowed always depends on the number of objects and
their distribution in the scene.

The center subdivision always shows worse performance than cost estimate
subdivision for SPD scenes for the same maximum depth allowed. On the other
hand, the preprocessing time for cost estimate subdivision is usually
higher than for center subdivision.

We cannot predict which of the subdivision methods will create a smaller
number of leaves. Usually, for densely occupied scenes (such as "balls" and
"gears") the cost estimate subdivision creates a smaller number of leaves,
and surprisingly, still outperforms center subdivision. For sparsely
occupied scenes (such as "tree") cost estimate subdivision creates a
significantly higher number of nodes and outperforms center subdivision
considerably. My opinion is that cost estimate subdivision adapts better to
the objects' distribution in the scene.

The best timings do not differ too much between center subdivision and cost
estimate subdivision for some densely populated scenes. But for the same
depth, cost estimate subdivision is the indisputable winner. For sparsely
occupied scenes cost estimate subdivision performs significantly better.

The three different traversal algorithms show different numbers of traversal
steps per ray; OCTREE84-C/A always has the largest number of traversal steps
reported. It is interesting that the timings for different traversal
algorithms on the same scene and subdivision approach do not differ too much!
The reason is that the traversal step for OCTREE84-C/A is of low
computational cost, only three comparisons of coordinates per step.
Surprisingly, for center subdivision the quickest implementation is Samet's
OCTREE89-C. We suppose that the traversal tables are probably stored in the
cache, so a traversal using mostly these tables can also be very quick. This
does not correspond well with [14]; the OCTREE93 traversal cost per step is
rather high. Current hardware eliminates to some extent the performance
dissimilarities among different traversal approaches. I can only state that
I, as the programmer, did my best.

It is interesting to compare hierarchical grids with octrees. The performance
of hierarchical grids and octrees constructed with cost estimate subdivision
is very similar for SPD scenes! This also holds for the number of
intersection tests per ray and the number of traversal steps. The number of
cells (interior nodes and leaves) generated by hierarchical grids is
remarkably higher. This is caused by the construction of grids; when creating
a grid subdivision, the number of cells is required to be near the number of
objects in the grid bounding box. We should realize that the octree is only a
special case of hierarchical grids with a specialized traversal algorithm.
For the SPD scenes measured and the statistics gathered, we can conclude that
the hierarchies created by hierarchical grids and octrees are very similar
and exhibit similar performance. Hierarchical grids are not as sensitive to
the setting of their construction parameters as octrees are to the setting of
their termination criteria. This only supports the conclusion that these
termination criteria for octrees, in spite of their intuitive appeal and
common sense, are not the most suitable for octree construction. It would be
interesting to implement termination criteria using the cost estimate as
outlined by the article:

The article only discusses the possibility of using such termination
criteria; no exact termination criteria are given and no experimental results
were presented in the article, though the cost of the hierarchy is elaborated
well.

It will also be interesting to compare octrees and hierarchical/uniform grids
for huge scenes containing millions of polygons, since for such scenes
hierarchical grids can have real appeal. (I do not have such scenes
available; if you can provide any for this purpose, please contact me by
e-mail.)

What about the best efficiency scheme? That is a related, but entirely
different, problem. Here we would only like to point out an article where the
authors formally proved that the worst-case time complexity of ray shooting
is O(log N), where N is the number of objects. See L. Szirmay-Kalos and G.
Marton: {Analysis and construction of worst-case optimal ray shooting
algorithms}, C&G, Vol. 22, No. 2, pp. 167-174, 1998. Most of the methods
developed by computer graphics people do not take worst-case complexity into
account; average case complexity is rather the main issue, even if not
explicitly mentioned in the paper. It still holds that the methods aiming at
worst-case complexity that were developed by computational geometers exhibit
practically unacceptable space complexity and preprocessing time for
common-size scenes. At present, I can only agree with the opinion that no
acceleration technique has been shown to be dominant for the ray shooting
problem over scenes with different numbers of objects and different object
distributions.

I saw the "Octree Traversal and the Best Efficiency Scheme" thread in RTv12n2,
as spawned by the question:

"I'm trying to find out about fast methods for traversing octrees. It
strikes me that you could use some kind of integer math method like a 3D
Bresenham algorithm to get a list of cells a ray intersects".

As you point out, Erik, my SMART algorithm adopts this approach (though as I
recall, Agate, Grimsdale and Lister's HERO approach is a variant on SMART
rather than vice versa; I visited them at the University of Sussex in 1989).

Anyway, I thought you might be interested in some code I knocked up back at
Edinburgh University for the lazy evaluation of 3-D projections of Quaternion
Julia Sets, ray traced with SMART in integer arithmetic; see the attachment.
This allows for some fast voxel addressing schemes during traversal. (The
code only does orthographic views at the moment, but perspective would be
easy enough ... )

Thanks! Good to hear some more history. My answer to Ben was just a direct
reaction. I did not check on the historically correct order. HERO and SMART
just came to my mind as octree versions of the ARVO algorithm. Thanks for
your code.

BTW, I found the C&G article extremely difficult to read. It took me quite
some time to find out that indeed the SMART algorithm was a top-down
recursive algorithm. It was well hidden in the code. Looking backwards we all
can see how we failed to sell our ideas in a clear and effective way.

____

John Spackman replies:

Ah, that'll be because I found it extremely difficult to write .... ;^)

I just noticed your discussion in the latest Ray Tracing News (RTNv12n1). The
original topic was a Bresenham-like octree traversal algorithm. Do you know
about Spackman and Willis' "SMART" algorithm? (The Smart Navigation of a Ray
Through an Oct-Tree by Spackman and Willis, Computers and Graphics Vol. 15,
No. 2, pp. 185-194, 1991). It's quite elegant in C; there is a C++ version
which is perhaps less simple but more general included in my VFleet volume
renderer (http://www.psc.edu/Packages/VFleet_Home). It's probably not a huge
win for ray tracing, as it comes from the days before cache coherence was the
big issue, but it certainly fits the description of being an integer-based
octree traversal algorithm.

Nicolas Delalondre wrote:
I'm trying to find ways to speed up my ray tracing engine (it currently uses
an automatic box hierarchy). I've heard that the use of a modified Z-buffer
can do this for primary rays. Can someone explain to me how to use and
implement such a Z-buffer? Can we use it in addition to bounding boxes?

____

Almost any rendering system can be used to do the eye rays. There are two
major variants:

(a) Render, but instead of a color per pixel, use an object ID per pixel. The
object ID will be that of the first intersecting object. After "rendering"
the scene, you go back over the buffer and compute the scene where the eye
ray for each pixel is intersected with only one object, the one whose ID is
in the object ID buffer. This extends to light sources: you can compute the
scene from each light source's point of view (assuming you are talking about
a point light), and when you go to compute the ray tree, intersection rays to
the light can be looked up (approximated, really, due to the sampling grid)
instead of requiring a full traversal of the object database.

(b) Render, then use the z-buffer by discarding objects during the database
traversal that could not have an intersection at the depth in the z-buffer.
The same idea works for the light sources--here it is called a shadow depth
buffer--and the usage is much easier than for the object ID buffer: just
compare the distance to the light with the correct pixel in the shadow depth
buffer. If it is further away, you conclude the point is in shadow; if it is
approximately equal or closer (due to sampling grid errors), it is not
shadowed by some other object.

You can see that a good hybrid solution is to use an object ID buffer for the
eye rays and a shadow depth buffer for the light rays.
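
A small sketch of variant (a), the object ID (item) buffer for eye rays (my
own illustration; the eyeRay/intersectObject/traceFull names are hypothetical
stand-ins for a renderer's real routines):

    #include <vector>

    struct Ray { double ox, oy, oz, dx, dy, dz; };
    struct Hit { bool valid; double t; int objectID; };

    const int NO_OBJECT = -1;

    // Dummy bodies so the sketch is self-contained; the renderer provides these.
    static Ray eyeRay(int /*x*/, int /*y*/)                   { return Ray{}; }
    static Hit intersectObject(const Ray &, int /*objectID*/) { return Hit{false, 0.0, NO_OBJECT}; }
    static Hit traceFull(const Ray &)                         { return Hit{true, 1.0, 0}; }

    // Rasterize object IDs first (the item buffer), then intersect each eye
    // ray only against the object recorded for its pixel, falling back to a
    // full trace when that fails or nothing was rasterized.
    void renderWithItemBuffer(int width, int height, const std::vector<int> &itemBuffer)
    {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                Ray r = eyeRay(x, y);
                int id = itemBuffer[y * width + x];
                Hit h = (id != NO_OBJECT) ? intersectObject(r, id)
                                          : Hit{false, 0.0, NO_OBJECT};
                if (!h.valid)
                    h = traceFull(r);   // item buffer and ray tracer can disagree
                // ... shade using h and spawn secondary rays as usual ...
            }
        }
    }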

[I can comment further on method (a), having used Weghorst & Hooper's
original code for it and implemented the algorithm again a few times over the
years. This method is mostly a fine idea, but getting a perfect match
between the Z-buffer and ray tracer is impossible. If the Z-buffer says
there's a particular object at a pixel and the ray tracer doesn't hit it, no
big deal, you just shoot a full eye ray. But you can get the opposite case:
the Z-buffer says there's some object (or no object), but a full eye ray
traced through the scene actually hits a closer object, while also hitting
the Z-buffer's object further on. This is undetectable (short of shooting the
full ray) and results in an incorrect pixel. Usually not a serious problem,
but something to be aware of. One way to ameliorate the effect is to sample
an area of the Z-buffer, e.g. get the IDs of a 2x2 or 3x3 area of the
Z-buffer and test all these. -EAH]

What techniques are available for raytracing procedural/parametric surfaces?
Tessellate to hell and find the intersection between the ray in question and
the generated polygons?

____

It is usually preferable to stop just before the Stygian Gate, but yes, you
are headed in the right direction. :-)

I believe tessellation is in most cases the appropriate algorithm to choose.
There are some refinements of course: tessellate only to the appropriate level
based upon how large the patch is on the screen (to avoid linear artifacts in
the image) and how strongly it is curved (to avoid missing highlights). You
might also think about doing this tessellation in a lazy fashion (only
tessellate when the first ray penetrates the bounding box of the parametric
surface) and caching (saving and throwing away tessellations in a LRU
fashion). The Toro crew at Stanford had some good ideas on how to make this
work with their memory coherent raytracing research.

It is hard to argue that any numerical ray-patch intersector has any
advantages. If you'd like to try implementing one, the Nishita-Sederberg
Bezier clipping algorithm is one to try (described in Siggraph 90 proceedings
if memory serves) or Toth's 85 Siggraph paper on interval techniques. If you
are clever, I believe you can combine ideas from the two techniques to
develop one which might be superior to either.

The main problem with any technique like this is that the work done by each
algorithm amounts to essentially a patch split per iteration, usually with
several iterations per ray. If a patch is hit many times by rays, you keep
paying for patch splits. Any attempt to cache the split caches is very
similar in spirit to just tessellating anyway, so why not just admit it up
front, and make that work well....

____

Stephen Westin replies:

Or, to put it another way, tessellation on demand with LRU caching is very
similar in spirit to numerical iteration for direct intersection :).

Seriously, finding an optimal tessellation for a parametric surface is a
significant computational task, and for a trimmed surface, it hurts just to
think about.

____

Matt Pharr writes:

Hmm. I don't think that it's that terrible to find a good tessellation rate
for a parametric surface. Can you elaborate on the problems you're thinking
of?

Granted, it's probably the hardest part about handling them in general (vs.
bounding and evaluation, say), but the basic "project the control points onto
the screen and use them to bound parametric rate of change approach" is
pretty inexpensive computationally and works quite well.

As for trimming, and ray traced patches, I've always been a fan of figuring
out if the candidate hit point is actually trimmed out after an intersection
with the non-trimmed patch is found, which is a (comparatively) easy 2d
ray-tracing problem.

____

Stephen Westin replies:

If you try hard to find the optimal tessellation (i.e. the minimum set of
polygons that represent the surface to within a given tolerance), you wind up
with some pretty funky mesh topology.

> Granted, It's probably the hardest part about handling them in general
> (vs. bounding and evaluation, say), but the basic "project the control
> points onto the screen and use them to bound parametric rate of change
> approach" is pretty inexpensive computationally and works quite well.

Isn't that a conservative approach that finds the max tessellation rate
needed and tessellates the whole patch to that level? I can imagine that it
could over-tessellate pretty badly (which matters on large models) and
possibly lead to cracking between patches.

How does that work for secondary bounces, where there is no "screen" to
project onto? Or, to put it better, the mapping to the screen is more complex
and possibly ill-behaved (e.g. discontinuous).

[I agree with Steve here; it's harder to tessellate well for bounce rays, and
over-tessellation for the view is a serious problem, not a weird pathological
case. I remember walking through a scene with a few NURBS spheres on a
workstation which had NURBS support in microcode. At one point I thought the
workstation had hung, but in fact it just happened to be close to one of the
spheres and its image-based tessellation criterion caused it to generate
millions of polygons. -EAH]

> As for trimming, and ray traced patches, I've always been a fan of figuring
> out if the candidate hit point is actually trimmed out after an
> intersection with the non-trimmed patch is found, which is a
> (comparatively) easy 2d ray-tracing problem.

Assuming that relatively little is trimmed off, that's a reasonable approach.
How does that work out with real-world CAD models? It's certainly possible
that, say, 90% of the surface area is trimmed off, which would be a big
efficiency hit.

____

and a last tidbit from Stephen Westin:

As for procedurally-defined surfaces, the only reference that comes to mind
is Kajiya's paper, "New techniques for ray tracing procedurally defined
objects". It was presented at SIGGRAPH 83, and published in Computer Graphics
v.17 #3. Besides laying out a method to calculate robust bounding volumes for
fractal surfaces, he describes a really odd way to ray trace surfaces of
revolution, involving bending space and using curved rays.

____

Peter Eastman writes:

I'm writing a ray tracer, and I want to implement gloss and translucency
effects. In principle, this is simple - just use a random distribution to
perturb the reflected rays about the pure specular direction, and same for
the transmitted rays. As soon as I get into the details, though, it's not at
all clear how to do this. I've gone through every book and article I can get
my hands on, and none of them give any details, so I'm hoping someone on here
can help me out.

So to put it simply: what's the best way of finding the perturbed ray
directions? For consistency with the Phong shading model, I presumably need
something that at least approximates a (cos)^n distribution? Also, it needs
to guarantee that no ray ever gets perturbed too far, so that a "reflected"
ray actually comes out the other side of the surface. And finally, to make
sure the rays are uniformly distributed, it would be nice if the full
distribution can be broken up into bins, and I can specify which bin a given
ray should be in.

____

Matt Pharr replies:

A good place for starters is the book "Monte Carlo Methods: Volume I:
Basics", by Kalos and Whitlock. Every hour you spend reading that will more
than make up for itself in writing a good distribution ray tracer. Andrew
Glassner's "Principles of Digital Image Synthesis" may also have some good
material.

In general, one wants to have an importance sampling function that maps two
random numbers (u1, u2), both between 0 and 1, to a direction on the
hemisphere, given the outgoing direction. That importance sampling function
should return both a new direction and the value of the probability density
function (pdf) for choosing that direction (hello Kalos and Whitlock). The
pdf roughly gives the probability of sampling that direction rather than
some other direction.

Now, the Monte Carlo estimate of the integral (which is what you're trying to
compute) is:

f_est = (1/n) * \Sum_{i=1}^n f(x_i) / pdf(x_i).

That is, it's an average over some number $n$ of samples of the function f()
at points x_i, each sample weighted by 1/pdf(x_i). In this case your f() is
equal to the value of the reflection function (e.g. Phong) times the incident
light from direction x_i (trace a ray and recurse to compute that...)

A simple importance sampling function just picks random directions on the
hemisphere. This is a good importance function for a diffuse surface (as you
might imagine), but it's lousy for a very glossy surface (since most of the
directions that it chooses will be in directions where the reflection
function is low).

How to turn (u1,u2) into a direction on the hemisphere? One option is to
ignore (u1,u2) and pick random vectors in the cube [-1,1]^3 until you find
one that's inside the unit sphere, then normalize it (exercise: show that you
get a non-uniform distribution of directions if you skip the rejection step
and just normalize a random vector from the cube.)
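
A minimal sketch of that rejection method (the helper names are mine;
mirroring the result about the surface normal, or rejecting the lower half,
turns the sphere distribution into a hemisphere one):

#include <cmath>
#include <cstdlib>

static float Rand01() { return rand() / (RAND_MAX + 1.0f); }

// Pick points uniformly in the cube [-1,1]^3 and keep the first one that
// falls inside the unit sphere, then normalize it.  Normalizing cube points
// without the rejection step would pile samples up toward the cube's corners.
void UniformSphereDirection(float dir[3])
{
    float x, y, z, len2;
    do {
        x = 2.0f * Rand01() - 1.0f;
        y = 2.0f * Rand01() - 1.0f;
        z = 2.0f * Rand01() - 1.0f;
        len2 = x*x + y*y + z*z;
    } while (len2 > 1.0f || len2 < 1e-12f);
    float inv = 1.0f / sqrtf(len2);
    dir[0] = x * inv; dir[1] = y * inv; dir[2] = z * inv;
}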

A better option is to map (u1,u2) to a direction. See Pete Shirley's web page
for some techniques (I think these are in some Graphics Gems book):
http://www.cs.utah.edu/~shirley/. Glassner probably discusses these, too. I
just looked at Shirley's web page and was reminded of some good references:
see in particular "Distribution Ray Tracing: Theory and Practice".
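
One such mapping, of the kind Shirley describes (here for a direction uniform
over the hemisphere about +z; the pdf is the constant 1/(2*pi)):

#include <cmath>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Map (u1,u2) in [0,1)^2 directly to a uniformly distributed direction on
// the hemisphere about the +z axis: cos(theta) = u1 is uniform, phi = 2*pi*u2.
void UniformHemisphere(float u1, float u2, float dir[3])
{
    float z = u1;
    float r = sqrtf(fmaxf(0.0f, 1.0f - z * z));
    float phi = 2.0f * (float)M_PI * u2;
    dir[0] = r * cosf(phi);
    dir[1] = r * sinf(phi);
    dir[2] = z;
}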

If you have a more interesting reflection function (e.g. Phong), you're right
about approximating the cos^n stuff. For Phong, it turns out that you can
sample it exactly.

What you want to do is use u1 to compute an offset angle (dtheta) from the
reflected direction R, and then u2 to compute a rotation (phi) about R.
These and some vector geometry give you the new direction.

Computing dtheta (and the pdf) is the tricky part. The following function
importance samples the Phong BRDF. I really urge you to save this message,
read K&W and maybe some of Glassner and try to derive the importance sampling
function on your own. Really, you'll be glad you did.
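
The sketch below reconstructs such a sampler from the description above; the
names and structure are illustrative, not the code as originally posted. It
uses the exact inversion cos(dtheta) = u1^(1/(n+1)) and returns a pdf of zero
when the sampled direction lands below the surface.

#include <cmath>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

struct Vec3 {
    float x, y, z;
    Vec3(float a = 0, float b = 0, float c = 0) : x(a), y(b), z(c) {}
};
static Vec3  operator*(float s, const Vec3 &a)
    { return Vec3(s*a.x, s*a.y, s*a.z); }
static Vec3  operator+(const Vec3 &a, const Vec3 &b)
    { return Vec3(a.x+b.x, a.y+b.y, a.z+b.z); }
static float dot(const Vec3 &a, const Vec3 &b)
    { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  cross(const Vec3 &a, const Vec3 &b)
    { return Vec3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x); }
static Vec3  normalize(const Vec3 &a)
    { return (1.0f / sqrtf(dot(a, a))) * a; }

// Sample a direction *wi from the cos^n lobe about the mirror direction R
// (N is the surface normal; u1, u2 are uniform random numbers in [0,1)).
// Returns the pdf of the chosen direction, or 0 if it is below the surface.
float SamplePhongLobe(const Vec3 &N, const Vec3 &R, float n,
                      float u1, float u2, Vec3 *wi)
{
    // invert the cos^n distribution: cos(dtheta) = u1^(1/(n+1))
    float cosA = powf(u1, 1.0f / (n + 1.0f));
    float sinA = sqrtf(fmaxf(0.0f, 1.0f - cosA * cosA));
    float phi = 2.0f * (float)M_PI * u2;

    // orthonormal basis (T1, T2, R) around the mirror direction
    Vec3 up = fabsf(R.z) < 0.9f ? Vec3(0, 0, 1) : Vec3(1, 0, 0);
    Vec3 T1 = normalize(cross(up, R));
    Vec3 T2 = cross(R, T1);

    *wi = normalize(sinA * cosf(phi) * T1 + sinA * sinf(phi) * T2 + cosA * R);

    // below the horizon: report a zero pdf and let the caller record a zero
    // sample instead of retrying or flipping the direction
    if (dot(*wi, N) <= 0.0f)
        return 0.0f;

    // pdf of the cos^n lobe, normalized over the hemisphere around R
    return (n + 1.0f) / (2.0f * (float)M_PI) * powf(cosA, n);
}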

____

Stephen Westin replies:

> A better option is to map (u1,u2) to a direction. See Pete Shirley's web
> page for some techniques (I think these are in some Graphics Gems book):

Gems III, pp. 80-83.

> If you have a more interesting reflection function (e.g. Phong), you're
> right about approximating the cos^n stuff. For Phong, it turns out that
> you can sample it exactly.

And also for Eric Lafortune's multi-cosine-lobe representation, which has at
least a chance of physical accuracy. For most glossy surfaces, it's probably
your best bet. A RenderMan shader implementing the function (but not random
sampling of it) is available at
http://www.graphics.cornell.edu/~westin/lafortune/lafortune.html.

____

Peter Eastman replies:

Thanks a lot to both of you for your suggestions! I just glanced over
Shirley's paper on distribution ray tracing, and it definitely looks like
something I should spend some time with. Lafortune's model looks rather more
elaborate than anything I was figuring on trying to do, but who knows - maybe
I'll go crazy and give it a shot. I'll take a look at the other references
you suggested too.

I was thinking more about the problem last night, and came up with another
idea for how to handle it. My idea was:

1. Find a unit vector in the ideal specular direction.

2. Add a random vector to it, chosen from a uniform spherical distribution.
I'd choose it using basically the method you posted - pick random x, y, and z
values, and throw them out if the length is too large. The radius of the
distribution (less than 1) determines how much reflections are blurred.

3. Renormalize the resulting vector, and use it as the reflected ray
direction.
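
A rough sketch of that recipe (the names are illustrative; here the point
inside the unit ball is drawn directly, with a cube-root radius, rather than
by rejection, but either way works):

#include <cmath>
#include <cstdlib>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static float Rand01() { return rand() / (RAND_MAX + 1.0f); }

// Perturb the unit mirror direction R by a random vector of length < radius
// (radius in (0,1) controls how blurry the reflection looks), then
// renormalize the result.
void PerturbReflection(const float R[3], float radius, float out[3])
{
    // a point uniform inside the unit ball: uniform direction, r ~ cbrt(u)
    float z = 2.0f * Rand01() - 1.0f;
    float phi = 2.0f * (float)M_PI * Rand01();
    float rxy = sqrtf(fmaxf(0.0f, 1.0f - z * z));
    float r = radius * cbrtf(Rand01());

    out[0] = R[0] + r * rxy * cosf(phi);
    out[1] = R[1] + r * rxy * sinf(phi);
    out[2] = R[2] + r * z;

    float inv = 1.0f / sqrtf(out[0]*out[0] + out[1]*out[1] + out[2]*out[2]);
    out[0] *= inv; out[1] *= inv; out[2] *= inv;
}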

Does this sound reasonable? It's certainly not "accurate" in any sense of
the word, but it should be fast and look reasonably good.

One other question for Matt. In the code you posted, I noticed that when your
reflected ray direction ends up on the wrong side of the surface, you just
return 0. Doesn't this cause reflections to get dimmer at grazing angles? Or
do you repeatedly call the routine until it gives you a nonzero value? I was
figuring that when this happens, I would just flip the reflection vector to
the other side of the surface, so that the BRDF would get squashed, but keep
the same total intensity.

Thanks a lot!

____

Matt Pharr replies:

> Does this sound reasonable? It's certainly not "accurate" in any sense of
> the word, but it should be fast and look reasonably good.

That would definitely work, in the sense of giving you pictures, but I think
it would cause weird artifacts. As I understand the idea, you're effectively
sampling uniformly over a cone around the reflection direction. Because the
value of the effective reflection function doesn't fall off on the edges of
the cone, the reflections might look weird. One possibility is to
probabilistically reject rays, rejecting them more often the farther they get
from the R vector. This is known as a rejection sampling technique.
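
A small sketch of that idea (an illustrative routine; the direction comes
back in a local frame with the mirror direction along +z, to be rotated into
world space with an orthonormal basis around R as usual):

#include <cmath>
#include <cstdlib>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static float Rand01() { return rand() / (RAND_MAX + 1.0f); }

// Draw candidate directions uniformly (by solid angle) from a cone of
// half-angle maxAngle, then keep a candidate with probability cos(alpha)^n,
// where alpha is its angle from the cone axis.  Directions near the edge of
// the cone are rejected more often, so the accepted directions fall off
// roughly like a Phong lobe instead of being uniform over the cone.
void SampleConeWithFalloff(float maxAngle, float n, float dir[3])
{
    float cosMax = cosf(maxAngle);
    float cosA, phi;
    do {
        cosA = 1.0f - Rand01() * (1.0f - cosMax);   // uniform over the cap
        phi  = 2.0f * (float)M_PI * Rand01();
    } while (Rand01() > powf(cosA, n));             // keep with prob cos^n
    float sinA = sqrtf(fmaxf(0.0f, 1.0f - cosA * cosA));
    dir[0] = sinA * cosf(phi);
    dir[1] = sinA * sinf(phi);
    dir[2] = cosA;
}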

> One other question for Matt. In the code you posted, I noticed that when
> your reflected ray direction ends up on the wrong side of the surface, you
> just return 0. Doesn't this cause reflections to get dimmer at grazing
> angles?

You would call the function n times, n chosen ahead of time. For any times
that the reflected ray is below the horizon, you just assign zero to that
ray's result and still divide the sum of the samples by n when you're done.
That flip trick, though done in certain widely used renderers, is wrong. An
intuition is that you happen to have chosen to sample the function somewhere
that it's zero. You'd like to avoid doing that, but when you do, that's
relevant information and you should record it. As such, you take your lumps
and record a zero result for the direction that you sampled.

Another way of thinking about it: consider uniform sampling of a constant
function over the hemisphere. You take n samples, each with pdf 1/(2pi) (see
previous message). Say the function is 2 everywhere. Your estimate of the
integral is (1/n) * \Sum_n 2 / (1/(2pi)) = 4pi, which is exact. Now say that
you're foolish and you pick random directions over the whole sphere. The pdf
is 1/(4pi) (one over the surface area of the unit sphere). The function is 2
for half of the directions you choose (on average) and 0 for the other half,
so the estimate (1/n) * \Sum_n f(x_i) / (1/(4pi)) averages out to
0.5 * 2 * 4pi = 4pi -- still correct. If instead you reflected all of the
below-horizon samples to point up, every sample would contribute
2 / (1/(4pi)), and you'd incorrectly get 8pi as the answer.
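
A quick numerical check of that argument (a toy program, not renderer code):

#include <cmath>
#include <cstdio>
#include <cstdlib>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Integrate f = 2 over the hemisphere (exact answer 4*pi) three ways:
// (a) sampling the hemisphere, (b) sampling the whole sphere and counting
// below-horizon samples as zero, and (c) wrongly flipping those samples up.
int main()
{
    const int n = 1000000;
    double hemi = 0.0, sphere = 0.0, flipped = 0.0;
    for (int i = 0; i < n; i++) {
        // the z coordinate of a direction uniform over the sphere is uniform in [-1,1)
        double z = 2.0 * rand() / (RAND_MAX + 1.0) - 1.0;
        // (a) hemisphere sampling: pdf = 1/(2*pi), every sample sees f = 2
        hemi += 2.0 / (1.0 / (2.0 * M_PI));
        // (b) sphere sampling: pdf = 1/(4*pi), f = 2 above the horizon, 0 below
        sphere += (z > 0.0 ? 2.0 : 0.0) / (1.0 / (4.0 * M_PI));
        // (c) flip the below-horizon samples up: every sample contributes
        flipped += 2.0 / (1.0 / (4.0 * M_PI));
    }
    printf("hemisphere %f  sphere %f  flipped %f  (4*pi = %f)\n",
           hemi / n, sphere / n, flipped / n, 4.0 * M_PI);
    return 0;
}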

There are a couple of different ways to think about sampling in a
distribution ray tracer. One, which seems to be where you're starting from
(and which is how Cook et al described the process originally), is to find a
way to generate samples that mimic the distribution you're interested in. In
effect, it's necessary to integrate the function and invert the integral to
do this. Sometimes this isn't possible.

Another way of thinking is in terms of importance sampling: you've got some
function that is maybe equal to the function that you're trying to integrate
or maybe just similar to it in some way. You figure out how to generate
samples from that function's distribution. Then, when estimating the final
value of the integral, you divide by a normalization factor that accounts for
the distribution that you actually sampled from. This is a more general way
of going about all this stuff, though it is trickier and adds another layer
or two of things to get right. But I'd urge you to try to head in that
direction at some point in the future.

____

Peter Eastman responds:

> That would definitely work, in the sense of giving you pictures, but I
> think it would cause weird artifacts. As I understand the idea, you're
> effectively sampling uniformly over a cone around the reflection direction.
> Because the value of the effective reflection function doesn't fall off on
> the edges of the cone, the reflections might look weird.

Actually, it isn't uniform since the displacement vector can lie anywhere
inside the sphere, not just on the surface. So the probability is maximum at
the center of the cone (where the sphere is thickest), and goes to zero at
the edges.

> You would call the function n times, n chosen ahead of time. For any times
> that the reflected ray is below the horizon, you just assign zero to that
> ray's result and still divide the sum of the samples by n when you're done.

Ah, I understand. So at each surface, you fire off many reflection rays to
sample the distribution? I'm following Cook's method of only firing one ray,
and picking it according to the desired distribution. In this case, flipping
the ray isn't really wrong, it just means that your rays have a different
distribution at glancing angles.

> There are a couple of different ways to think about sampling in a
> distribution ray tracer. One, which seems to be where you're starting from
> (and which is how Cook et al described the process originally), is to find
> a way to generate samples that mimic the distribution you're interested
> in. In effect, it's necessary to integrate the function and invert the
> integral to do this. Sometimes this isn't possible.

Shirley's Distribution Ray Tracing paper tells how to do that for a Phong
distribution, so I can use his equations. Thanks again for telling me about
that paper!

____

Per Christensen writes:

It recently occurred to me that there is no paper or text-book or other source
(that I know of) that describes all the potential uses of importance to reduce
render time for ray tracing. Importance, the adjoint of radiance, has been
used since 1992 to speed up global illumination calculations. It is
therefore surprising that importance is not widely used for ray tracing, where
similar benefits can be gained: the calculations can often be sped up
significantly if it is known a priori that the result does not have to be very
accurate.

1. Uses of importance in ray tracing

[Hall83] describes how one can keep the fraction of contribution with each ray
and use that to prune the ray tree -- if the result of tracing a ray is known
a priori to contribute very little to the final image, there is no need to
trace it. But this fraction (which is the same as importance) can also be
used to speed up other parts of ray tracing:

- Reducing the effort to calculate normals (bump maps can be simplified or
  skipped, geometric normals can replace interpolated normals).

- Simplifying intersection tests if there is a level-of-detail representation
  of geometry.

There are probably several more uses than these. In addition, there are also
some uses that are not strictly "traditional" ray tracing:

- If the constant "ambient" term is replaced by a better approximation of
  soft indirect illumination, then the computational effort spent on
  estimating that indirect illumination can be reduced.

- Increasing the step length in ray marching for volumes with participating
  media.

2. Examples

If a part of a scene is seen through a semi-transparent object that lets 10%
of the light through, the importance of the objects behind it is not small
enough that we can avoid tracing transparency rays, but the illumination
calculations can be simplified drastically using some or all of the
simplifications listed above.

In a scene containing water, glass, ice or similar refractive materials, each
ray intersection will usually spawn both a reflection and a refraction ray.
For multiple layers of these materials, there is an exponential explosion in
the number of rays to be traced -- unless importance is used. The reflection
coefficients depend on the angle of incidence (Fresnel's law), so some rays
will have very little influence on the final image. If we keep an importance
associated with each ray, we can not only avoid tracing many rays, we can also
simplify calculations for many other rays that are somewhat important (not so
unimportant that we can skip them altogether, but unimportant enough that we
don't need to do all the calculations associated with them to full precision.)

3. Why each color band should have separate importance

In general, there should be a separate importance "band" for each color band.
For an RGB color representation, there should be red, green, and blue
importance. To see why, consider the following example: if an eye ray hits a
yellow specular object, the importance of the reflection ray will have 0 blue
component. If the reflection ray then hits a purely blue diffuse object
illuminated by an area light source, we can skip all the illumination rays to
sample that area light source since we know that none of the illumination of
the blue object is going to show up in the rendered image.

If we had only a scalar importance, we would assign the reflection ray
importance 1, and the illumination of the blue object would be important
enough to require tracing illumination rays to the area light source.

4. Importance transport in ray tracing

Importance is transported like light, but emitted from the eye. This means
that eye rays have importance (1, 1, 1) and that at reflections and
refractions, the importance should be multiplied by the reflection/refraction
coefficients to get the importance for the new ray.

Let's look at an example to see how this works in practice. If an eye ray hits
a yellow ideal mirror, the reflection ray will get importance (1, 1, 0). If
that reflection ray hits a 50% transmissive object, the refraction ray will
have importance (0.5, 0.5, 0). If that ray hits a diffuse red surface with
diffuse reflection coefficients (0.6, 0, 0), the illumination ray will have
importance (0.3, 0, 0) -- assuming the light is a point light. For an area
light sampled by n illumination rays, each illumination ray should have an
importance smaller than (0.3, 0, 0), although probably not 1/n smaller.
(Perhaps the square root of 1/n is good, I'm not sure.)
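
A small sketch of how this bookkeeping can look in a classical ray tracer
(the routine, names, and thresholds are illustrative, not from any particular
renderer):

#include <algorithm>

// Per-channel importance: eye rays start at (1,1,1); each reflection or
// refraction multiplies the parent ray's importance by the corresponding
// coefficient, and rays that end up negligible in every channel are pruned.
struct RGB {
    float r, g, b;
    RGB(float r_ = 0, float g_ = 0, float b_ = 0) : r(r_), g(g_), b(b_) {}
    RGB operator*(const RGB &o) const { return RGB(r*o.r, g*o.g, b*o.b); }
    float maxChannel() const { return std::max(r, std::max(g, b)); }
};

const float kCutoff   = 0.01f;  // below this in every channel: skip the ray
const float kSimplify = 0.1f;   // below this: shade with cheap approximations

// Decide whether a secondary ray is worth tracing and compute its importance.
bool SpawnSecondaryRay(const RGB &parentImportance, const RGB &coefficient,
                       RGB *childImportance, bool *useCheapShading)
{
    *childImportance = parentImportance * coefficient;
    if (childImportance->maxChannel() < kCutoff)
        return false;                                  // prune the ray tree
    *useCheapShading = childImportance->maxChannel() < kSimplify;
    return true;
}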

5. Importance in global illumination

Importance (also known as "potential" and "visual importance") is formally
defined as the adjoint of radiance. It has been used in neutron transport
theory since the late 1940s and was introduced to the global illumination
community by Smits et al. in 1992 [Smits92] (they used it to speed up the
calculation of a view-dependent radiosity solution). Since then, importance
has been used to speed up calculations for many variations of both Monte Carlo
and finite element methods. Please refer to the bibliography below for a list
of some of the many papers about importance in global illumination.

6. Why is importance not used more in ray tracing?

Given that importance for ray tracing is conceptually simple, is simple to
program, and gives significant speedups, the obvious question is: why don't
more commercial and shareware ray tracers use it? In my opinion, it's a
crying shame to burn CPU cycles computing results to an unnecessarily high
accuracy.

For ray tracers without a shader interface, all the importance book-keeping
can be done behind the back of the user. For ray tracers with a shader
interface, shader writers need to make their shaders exploit importance. This
will increase their programming effort since each shader should be able to
perform both simplified (approximate) calculation and full (accurate)
calculation. But the pay-off in reduced rendering time is really worth it.
And if some shader writers don't want to use importance (for whatever reason),
they can just ignore it.

I would be interested in a discussion on why importance is not more widely
used in ray tracers. I would also like to hear some "battle stories" from
people implementing importance in ray tracers. The way I see it, only the
Blue Moon ray tracer has a good excuse not to use importance since it has to
adhere to the RenderMan specification. But what about all the other commercial
and shareware ray tracers without importance?

[As a data point, the two ray tracers I developed for commercial products had
Hall's importance method built in to them, turned on by default. It saves a
number of nearly useless rays, and we never had complaints. That said, you
usually have to be conservative in doing importance testing for a classical
ray tracer. For example, you generally want to assign an importance to a
particular material overall, so that you either generate all reflection rays
or none of them - generating just a few of them could lead to noticeable
artifacts. -EAH]

References:

[Hall83] Roy Hall, Donald Greenberg. "A testbed for realistic image
synthesis". IEEE Computer Graphics and Applications, 3(8):10-20. November
1983. Describes the use of importance (although it wasn't called that) to
avoid tracing rays whose contribution to the image is insignificant.

[Shirley96] Peter Shirley, Changyaw Wang, Kurt Zimmerman. "Monte Carlo
techniques for direct lighting calculations". ACM Transactions on Graphics,
15(1):1-36. January 1996. Describes how to sample only a subset of a large
number of direct light sources without introducing bias.

[Smits92] Brian Smits, James Arvo, David Salesin. "An importance-driven
radiosity algorithm". Computer Graphics (SIGGRAPH 92 Proceedings), 26(2).
July 1992. Introduced importance to the global illumination community as a
way to concentrate work on the parts of a radiosity solution that matter for
a given view.

Jeffrey Lewins. Importance, the Adjoint Function: the Physical Basis of
Variational and Perturbation Theory in Transport and Diffusion Problems.
Pergamon Press, 1965. Book with a detailed description of the use of
importance in neutron transport theory. Mentions that the term "importance"
was coined by Harry Soodak in 1948.

____

[From comp.graphics.algorithms; I liked his list, so I thought it worth
reprinting. -EAH]

Zachary Turner wrote:
can anyone recommend some good books on computer graphics theory? Obviously
there's the Foley book, can anyone recommend some other good ones?

First, I would recommend anything by Alan Watt. I find his writing style very
accessible and understandable. His book "3D Computer Graphics" is, IMHO, more
accessible to a beginner than Foley et al. As a computer graphics programmer,
I find the Graphics Gems books indispensable, but they are not ideal tools
for learning the theory.