Blog

2.4.3 was supposed to be the last 2.4 minor release but, since then, I discovered there were still some weird bugs left (related to the OSL shader code optimizer). Fortunately, they have been fixed in the latest OSL release. Therefore, this release is mostly an upgrade to OIIO 1.5.16 and OSL 1.6.6 with a few shader bug fixes. The code optimizer is not yet bulletproof (look at the workaround implemented in granite.osl, for instance) but I can move forward.

Once again, this release is dedicated to the integration of OSL within XRT. I think (but can hardly believe) that I am done with it. Not only does the new OSL-based shading system match the capabilities of the previous RSL-based shading system, it also surpasses it in terms of speed and image quality. See the mandatory car picture below as proof.

Although I have mostly been fixing bugs for this release, I have also implemented something missing from stock OSL: shader layers can connect to and from individual array members or vector components.

There is also a new kind of camera: the "equiangular" camera. It computes a 360°x180° view of a scene (capturing the whole surrounding sphere) and maps it to a rectangle, hence the name "equiangular": each pixel spans the same angle in longitude and latitude. The resulting picture (also called a lat/long texture because it amounts to an equirectangular projection) can be used as an environment map. Here is a lat/long example rendered with XRT:
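The pixel-to-direction mapping of such a lat/long camera is simple enough to sketch. Here is a minimal Python version (the function name and the +Y-up convention are my own assumptions, not XRT's actual code):

```python
import math

def equiangular_ray_direction(px, py, width, height):
    """Map a pixel to a world-space ray direction for a lat/long camera."""
    # Longitude: 0..2*pi across the image width.
    phi = 2.0 * math.pi * (px + 0.5) / width
    # Latitude: 0 (up) .. pi (down) along the image height.
    theta = math.pi * (py + 0.5) / height
    # Spherical-to-Cartesian conversion, with +Y up.
    return (math.sin(theta) * math.cos(phi),
            math.cos(theta),
            math.sin(theta) * math.sin(phi))
```

Every pixel covers the same angular extent, so the whole sphere of directions is captured once.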

A comparison between environment mapping using this picture and true reflections computed with ray-tracing.

shaders accept unsized arrays as parameters. The actual size is determined at run-time when shaders are instantiated.

message passing between different shader groups (and not only within the layer) is supported. For instance, a surface shader can retrieve a value computed by a displacement group or can ask a light whether it supports specular or diffuse emission.

any bump-mapped surface seen through a refractive material will have its shading normal properly oriented.

differential quantities are properly propagated through reflection, refraction or shadowing.

This last feature matters a lot to me because it was one of the major compelling reasons to switch to OSL. Let's have an example.

Pretty lame compared to my previous post, isn't it? Worse, I am so proud of it that it is part of XRTexamples (in the differentials folder). Let's pretend it's for educational purposes only.

Both spheres have nearly the same shader: a metallic surface onto which a screen pattern is applied. The left sphere shader (screen.osl) computes sharp transitions between metal and void while the right sphere shader (screen_aa.osl) smooths them using differential quantities. Because the picture is computed with only one sample per pixel (to make it even uglier), the left sphere aliases badly while the right sphere is smoother. Its shadow is also smoother, as are its reflection and its shadow in the mirror. The depth of the ray tree increases but that does not prevent the screen_aa shader from performing antialiasing correctly.
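The trick used by screen_aa.osl can be illustrated outside OSL. Here is a Python sketch (the names and the exact pattern are made up; the point is how the filter width ds, the differential of the pattern coordinate, smooths each transition):

```python
def smoothstep(e0, e1, x):
    # Hermite interpolation between the two edges, clamped outside.
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def screen_hard(s, freq=10.0, width=0.3):
    # screen.osl-style pattern: metal (1) or void (0), sharp transition.
    return 1.0 if (s * freq) % 1.0 < width else 0.0

def screen_aa(s, ds, freq=10.0, width=0.3):
    # screen_aa.osl-style pattern: the same bars, but each edge is
    # smoothed over the filter width ds.
    f = (s * freq) % 1.0
    fw = ds * freq
    return smoothstep(0.0, fw, f) * (1.0 - smoothstep(width, width + fw, f))
```

With one sample per pixel, the hard version aliases while the smoothed version returns fractional coverage near each edge.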

If you're still here, there is a more interesting "lantern" effect in the differentials examples waiting for you to try out.

There are no new features in this release but a lot of bug fixes (see the ChangeLog for a complete list). I have been rechecking the examples from The RenderMan Companion, Advanced RenderMan, Essential RenderMan Fast, Rendering for Beginners and Texturing & Modeling. 676 different scenes is a lot of work but the remaining part is even bigger! There will certainly be a few other point releases before I can consider the job done.

The main feature in this release is the integration of the Open Shading Language. There has been a lot of pondering and a few false starts (hence the six-month hiatus since the previous post) but here it is. Well, sort of …

Some time ago, while trying to establish a road map for the next XRT features, I discussed the pros and cons of OSL here: great features, but really tailored to a path tracer (no light shaders, hard-coded lighting models (aka closures)). Losing the flexibility brought by the RenderMan Shading Language really made me uneasy.

Therefore, I first considered implementing RSL on top of the OSL machinery: RSL 1.0 and OSL are quite similar, and only a few global variables and shade-ops were missing from OSL to bring back the needed functionality. Confidently, I started an implementation: I wrote a parser that compiled RSL into the .oso intermediate representation and … I gave up, because I realized it was a dead end even though it was progressing nicely.

There are several technical reasons for this:

the latest RSL 2.0 is really different from OSL and would have required a much more extensive effort.

most RSL derivative shade-ops work in the surface parametric space whereas OSL shade-ops work on the surface tangent plane. Some shaders would have required a rewrite to perform properly.

But the main reason is that Pixar is going to drop RSL in favor of OSL (see this discussion on the OSL Google group or this announcement from Pixar). Of course, RSL is not going to disappear tomorrow but it will surely get deprecated in the future.

Therefore, I have settled for an easier path: tailoring OSL. XRT shaders strictly follow the OSL syntax but there are significant differences:

light shaders are back. A specific emit statement is used to define how light is emitted. Only light shaders write to the global variables Cl (light color) and L (light direction).

closures are gone. Surface and volume shaders write to the global variables C (surface color) and opacity (surface opacity) and gather light with a lights loop statement which has access to all lights in the scene. Only a lights loop can read the current light's Cl and L.

a number of ray-tracing shade-ops have been added to trace rays and compute shadows, occlusion and indirect illumination.

Although much more work is needed to finalize this porting effort, there are already a few benefits to notice:

because the LLVM infrastructure is bundled into the XRT distribution, there is no longer any need to install the Microsoft Visual C++ compiler (which was a rather heavy requirement for people eager to try out XRT).

XRT is faster. The speedup is not huge but noticeable.

All XRTexamples have been checked against this release and run OK (although some renders differ a bit more than expected from their RenderMan-based counterparts) and a number of shaders have been ported to OSL. Over the next months, I intend to re-render, recheck (and extend) the whole gallery.

Notes

1. Actually, this flavor of OSL is quite similar to the original Gelato shading language, with the added benefit that a number of Gelato shaders compile right out of the box.

This release is a hot fix for XRT 2.3.0. Sorry about that, but I left a few nasty bugs in the subdivision surface rendering code. As a bonus, XRT now computes tighter bounding boxes for cubic curves, which decreases rendering times by 10% on "hairy" scenes.

I can't get no satisfaction

I have never been satisfied with XRT's implementation of subdivision surfaces. It was missing important features and was very slow. This new version solves both issues.

Still the same

The general outline of the rendering algorithm (described in this post) has not changed much in this new implementation: a topological structure is built from the subdivision surface description and is refined to isolate extraordinary features and to obtain a control mesh made of quads only. Then, the mesh is split into individual faces and their 1-neighborhood (the minimal data needed to apply subdivision rules and compute the limit surface). During the intersection phase, the face is subdivided on the fly until the resulting patches look flat or small enough. They can then be safely approximated as a bilinear patch which is checked for intersection.

With a little help from my friends

Until Pixar came up with the OpenSubdiv project, a developer was really on his own with subdivision surfaces. Quoting the project overview, "OpenSubdiv is a set of open source libraries that implement high performance subdivision surface (subdiv) evaluation on massively parallel CPU and GPU architectures. This codepath is optimized for drawing deforming subdivs with static topology at interactive framerates. The resulting limit surface matches Pixar's Renderman to numerical precision." At first sight, this looks only suitable for people doing real-time graphics or interactive editors but, actually, the OpenSubdiv architecture makes it reusable for many purposes.

OpenSubdiv is built from three layers:

hbr (hierarchical boundary rep) is a topological structure designed to store edges, faces, and vertices of a subdivision surface. It also stores attributes of the surface such as corners and creases, facevarying data and various hints affecting the subdivision process such as hierarchical edits. Actually, it supports almost all features of SubdivisionMesh and HierarchicalSubdivisionMesh as defined by the RenderMan Interface specification.

far (feature-adaptive rep) uses hbr to create and cache fast run time data structures for table driven subdivision of vertices and cubic patches for limit surface evaluation. Feature-adaptive refinement logic is used to adaptively refine coarse topology near features like extraordinary vertices and creases in order to make the topology amenable to cubic patch evaluation. It supports these subdivision schemes:

Catmull-Clark

Loop

Bilinear

osd (OpenSubdiv) contains client-level code that uses far to create concrete instances of meshes. These meshes use precomputed tables from hbr to perform table-driven subdivision steps with a variety of massively parallel computational backend technologies. Osd supports both uniform subdivision and adaptive refinement with cubic patches. With uniform subdivision the computational backend code performs Catmull-Clark splitting and averaging on each face. With adaptive subdivision the Catmull-Clark steps are used to compute the control vertices of cubic patches, then the cubic patches are tessellated with GLSL or DirectX. This top-level layer is really dedicated to real-time performance.

As you probably already guessed, XRT now uses hbr to store all subdivision surface data and far to selectively refine the control mesh, solving the missing features problem.

The need for speed

As usual with algorithms, there is a trade-off between speed and memory. The rendering algorithm XRT uses was designed for GPUs, which are so fast at simple, parallel computations that they can afford not to cache anything in memory and still be very efficient. Of course, this turns out not to be that suitable for CPUs.

Therefore, I tried to use hbr to cache control mesh refinement. It failed, because hbr structures are not thread-safe and because manipulating lots of pointers in a topological structure is slow (even slower than refining again and again with a linearized data structure).

The key observation is that, except around extraordinary features, the limit surface of a Catmull-Clark subdivision surface is a B-spline surface. In other words, once a subdivision surface has been refined to the point where a patch is regular (i.e. it does not contain extraordinary features: points with valence other than 4, creases or corners), that patch can be replaced by a bicubic B-spline patch whose 16 control points are the control points of the patch. Furthermore, when subdividing a patch containing one extraordinary feature (the case where it contains more than one has been dealt with in the refinement phase), only one (sometimes two) of the four resulting patches is extraordinary, the other three being regular. Actually, once features have been isolated, the number of extraordinary patches stays almost the same at each refinement level. They just get smaller.

The consequence is that subdivision on the fly is almost never needed and that raytracing a Catmull-Clark subdivision surface can be nearly as fast as raytracing a mesh of bicubic patches. Implementing this optimization brought a tenfold speed increase to this algorithm.
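For the regular patches, evaluating the limit surface then boils down to evaluating a uniform bicubic B-spline patch from its 16 control points. A minimal Python sketch (illustrative only, not XRT's actual code):

```python
def bspline_basis(t):
    # The four uniform cubic B-spline basis functions at parameter t.
    return ((1 - t)**3 / 6.0,
            (3*t**3 - 6*t**2 + 4) / 6.0,
            (-3*t**3 + 3*t**2 + 3*t + 1) / 6.0,
            t**3 / 6.0)

def eval_bspline_patch(cp, u, v):
    # cp is a 4x4 grid of (x, y, z) control points; returns the
    # limit-surface point at (u, v) in [0, 1] x [0, 1].
    bu, bv = bspline_basis(u), bspline_basis(v)
    x = y = z = 0.0
    for i in range(4):
        for j in range(4):
            w = bu[i] * bv[j]
            x += w * cp[i][j][0]
            y += w * cp[i][j][1]
            z += w * cp[i][j][2]
    return (x, y, z)
```

The basis functions sum to 1 everywhere, so the patch stays inside the convex hull of its control points, which is also what makes bounding it cheap.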

Here is the result. First, the raw subdivision surface.

Sloppy Sam

raw subdivision surface

In the next picture, facevarying (s,t) coordinates have been correctly interpolated across the surface, bringing even more worries to poor Sloppy Sam.

Of course, the major feature in this release is blobby rendering. The subject has already been discussed at great length in the two previous posts and I will not say more, apart from the fact that blobbies now have a dedicated page in the gallery. But there are some other interesting features worth mentioning.

Primitive variables

In the RenderMan terminology, a "primitive variable" ("primvar" for short) is a mechanism that allows you to attach arbitrary data (variables) to objects (primitives). These values are interpolated or not (depending on their interpolation type) and passed from the object to its shader at render time. By overwriting shader parameters, they modify the way a geometric primitive is shaded. A single shader instance may then be reused for a myriad of objects. I finally implemented support for them in all XRT primitives (except for subdivision surfaces).

Some finishing touches to the RenderMan client

Apart from blobby support and a few bug fixes, binary output is now implemented.

More examples

Blobby rendering ribs and some Python scripts using the new Python binding for RenderMan API are now available in the examples archive.

Actually, there is much more to blobs than adding mere spheres. XRT's implementation supports various operations and field functions.

Operations

A max operation is roughly equivalent to a union operation in CSG whereas the min operation is like an intersection. Max is used most of the time to prevent blending between shapes. In the following pictures, a hand is modelled from several blobby ellipsoids, first with no blending at all, then with all shapes blended together, then with blending selectively disabled between fingers. For each of the four fingers, the blobs describing it are added together, along with the adjacent blobs at the edge of the palm. A separate added-together group is made of all the palm blobs. The whole field function is just the max of these five overlapping blending groups.

Another useful operation is sub, which subtracts one field from another. As illustrated in the following picture, depending on the strength of the field, we can either put a dent in it (top left shape) or dig a hole through it (top right shape). The bottom row shows the max of the two fields to help visualize them.

There are three other supported operations: mul (multiply fields), neg (negate a field) and div (divide two fields) but I have not yet found any meaningful use for these. Therefore, I'll skip them.
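In field-function terms, all these operations are just pointwise combinators. A hedged Python sketch (the operation names mirror the ones above, but the 1D bump profile is my own stand-in for a real primitive field):

```python
# Pointwise combinators over scalar fields (functions of a point p).
def op_add(f, g): return lambda p: f(p) + g(p)      # smooth blending
def op_max(f, g): return lambda p: max(f(p), g(p))  # union-like, no blending
def op_min(f, g): return lambda p: min(f(p), g(p))  # intersection-like
def op_sub(f, g): return lambda p: f(p) - g(p)      # dent or dig a hole

def bump(center, radius):
    # A 1D compact-support bump, standing in for a real primitive field.
    return lambda p: max(1.0 - ((p - center) / radius) ** 2, 0.0) ** 3

f, g = bump(0.0, 2.0), bump(1.0, 2.0)
blended = op_add(f, g)  # the two shapes merge smoothly
unioned = op_max(f, g)  # each shape keeps its identity
dented  = op_sub(f, g)  # g carves into f
```

The implicit surface is then the set of points where the combined field equals the threshold value.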

Field functions

XRT implements field functions as plugins. Let me present some of them.

Capsule

A capsule field function is the distance field built from a line segment (ideally suited for sausage- or spaghetti-like shapes). Note how the shapes smoothly blend when their respective line segments are abutting (whether they share the same orientation or not). Actually, if you subdivide a capsule into a sum of shorter capsules, the resulting field is exactly the same as that of the single capsule.

Here is more spaghetti (the sum of 480 capsules in a toroidal spiral).
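Such a capsule field can be sketched as a bump profile of the distance to a segment. This Python version is my own guess at the shape of such a plugin, not XRT's actual code:

```python
def capsule_field(a, b, radius):
    # Field built from the distance to segment ab, with compact
    # support of the given radius around the segment.
    ax, ay, az = a
    bx, by, bz = b
    abx, aby, abz = bx - ax, by - ay, bz - az
    ab2 = abx * abx + aby * aby + abz * abz
    def f(p):
        # Project p onto the segment, clamping to the endpoints.
        t = ((p[0] - ax) * abx + (p[1] - ay) * aby + (p[2] - az) * abz) / ab2
        t = min(max(t, 0.0), 1.0)
        dx = p[0] - (ax + t * abx)
        dy = p[1] - (ay + t * aby)
        dz = p[2] - (az + t * abz)
        d2 = (dx * dx + dy * dy + dz * dz) / (radius * radius)
        # Bump of the squared distance: 1 on the axis, 0 beyond the radius.
        return (1.0 - d2) ** 3 if d2 < 1.0 else 0.0
    return f
```

Because the field depends only on the distance to the segment, abutting capsules share the same iso-surface where they meet, which is why they blend seamlessly.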

Cube

A cube field function defines a cube centered at the origin. Pretty lame, isn't it? Things get more interesting when the cube is combined with other shapes.

Plane

A plane field function defines a unit square in the xy-plane, at z=0, centered at the origin.

Definition

Blobby objects belong to the family of implicit surfaces. That is, instead of being specified explicitly with an array of points like a polygon mesh or a subdivision surface, a blobby surface is defined by a field function $f(x,y,z)$ that is equal to a threshold value at every point on the surface.

A well-known implicit surface is the sphere with center at the origin and radius $r$, defined by $x^2+y^2+z^2=r^2$. You can find many more examples of implicit surfaces here. XRT's algorithm for rendering generic implicit surfaces has been detailed in a previous post. Most implicit surface functions have unbounded support (the region of space where $f(x,y,z)$ is non-zero) and combining them does not lead to anything interesting. For example, if we add the field functions of two spheres with different centers, the resulting surface is empty because, at every point in space, the combined value will likely exceed the threshold value.

We get much more interesting effects when using bump functions. A bump function is smooth (its derivatives are continuous) and has compact support (in broad terms, the part of (x,y,z) space where $f(x,y,z)$ is not 0 is bounded, a box for instance).
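A classic example of such a profile is Wyvill's field function. A Python sketch (the names are illustrative; the profile is only C² at the support boundary, but that is smooth enough in practice):

```python
def wyvill(d2):
    # Bump profile over the squared distance: 1 at the center,
    # falling smoothly to exactly 0 at d2 = 1.
    return (1.0 - d2) ** 3 if d2 < 1.0 else 0.0

def blob(center, R):
    # A spherical primitive field with compact support of radius R.
    cx, cy, cz = center
    return lambda x, y, z: wyvill(
        ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) / (R * R))
```

Adding two blobs whose supports are disjoint simply yields two separate surfaces, instead of the empty set obtained with unbounded fields.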

In the above example, two such fields are added and their centers are made closer and closer until they seem to swallow each other. What happens? Things get much clearer when we draw the iso-potentials (Image credits).

Disjoint supports

Overlapping supports

When supports overlap, the potentials add up, hence the resulting shape.

A blobby object is a combination of elementary bump functions (which define primitive fields like spheres, segments, boxes … ) through a hierarchy of operations on them (add, sub, div, mul, neg, min, max) resulting in all kinds of rounded objects.

Intersection algorithm

As usual, I have looked at what my glorious predecessors have done but I have not found much. Regarding open source implementations, the most comprehensive is Aqsis's: blobbies are tessellated into polygon meshes using J. Bloomenthal's implicit surface polygonizer [1]. Because I am a bit prejudiced against tessellation (mostly because there is always a trade-off between accuracy and memory space), I prefer direct algorithms. Unfortunately, the literature on that subject is quite scarce. Worse, every algorithm seems to have restrictions either on the shapes or on the supported operations (usually blending), like for instance [3] (which certainly does not imply I think it is bad work).

Therefore, a fresh look at the problem was needed. Here is my small contribution.

Because a blobby is an implicit surface, the interval-based bisection algorithm used to raytrace implicit surfaces [2] applies.

A fundamental theorem of interval arithmetic states that, for any function f defined by an arithmetical expression, the corresponding interval evaluation function F is an inclusion function of f. In other words, if you define X, Y, Z as three intervals containing respectively the bounding values for x, y, z and evaluate $f(X, Y, Z)$ using interval arithmetic, the resulting interval is guaranteed to contain all possible values of $f(x, y, z)$ for any combination of (x, y, z) within the bounding box defined by (X, Y, Z). Using bounding boxes defined by ray segments and a recursive bisection algorithm (a binary search), it is possible to compute an intersection:

if an interval evaluation does not contain 0, the input ray segment is discarded.

if it does, the input ray segment is subdivided in two halves which are in turn evaluated.

When an interval evaluation contains 0 and the desired precision is reached, an intersection is returned. If no interval evaluation contains 0, there is no intersection.
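The procedure above can be sketched in a few lines of Python, with F standing for any inclusion function (the names are mine, not XRT's):

```python
def isect_bisect(F, t0, t1, eps=1e-6):
    """First root of f on [t0, t1], given an inclusion function F.

    F(a, b) returns an interval (lo, hi) guaranteed to contain f(t)
    for every t in [a, b], as interval arithmetic provides.
    """
    lo, hi = F(t0, t1)
    if lo > 0.0 or hi < 0.0:
        return None                # the enclosure excludes 0: discard
    if t1 - t0 < eps:
        return 0.5 * (t0 + t1)     # desired precision reached
    mid = 0.5 * (t0 + t1)
    # Recurse on the near half first so the closest intersection wins.
    hit = isect_bisect(F, t0, mid, eps)
    return hit if hit is not None else isect_bisect(F, mid, t1, eps)
```

For example, with $f(t)=t^2-2$, which is increasing on [0, 2], the exact inclusion function is `F = lambda a, b: (a*a - 2, b*b - 2)` and the search converges to $\sqrt{2}$.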

However, because interval arithmetic is expensive and because a blobby is built from hundreds or thousands of field primitives combined in a tree of operations, the repeated evaluation of the whole field function over an interval is awfully expensive. Fortunately, two key optimizations come to the rescue.

First, for most primitive field functions, it is possible to analytically compute the resulting interval on a given segment in 3D space. This way, the result is much tighter than the direct use of interval arithmetic.

The major optimization stems from the fact that primitive field functions have a compact support. This means that any given ray is likely to hit only a very limited subset of all blobby primitives. Before starting bisection, the ray is tested against the blobby bounding box. This gives a confidence interval where the ray can potentially hit the blobby. All primitives (and then all nodes in the tree) are evaluated against this interval. All leaves and nodes which are always 0 on the interval can be safely discarded, which leads to a drastic simplification of the tree. Because of the inclusion property of interval arithmetic, we are guaranteed to compute accurate values on any subdivision of the confidence interval using this tree. This trimmed-down (and much faster to evaluate) tree is then used in the bisection. The expensive full tree needs to be evaluated only once per ray.
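The pruning step can be sketched on a toy expression tree (my own representation, not XRT's):

```python
def prune(tree, t0, t1):
    """Trim a blobby expression tree for one ray's confidence interval.

    A tree is either ('leaf', F), where F(t0, t1) returns an interval
    enclosing that primitive's field on the ray segment, or
    ('add', [children]). Subtrees identically 0 on [t0, t1] vanish.
    """
    if tree[0] == 'leaf':
        lo, hi = tree[1](t0, t1)
        return None if lo == 0.0 == hi else tree
    kept = [p for p in (prune(c, t0, t1) for c in tree[1]) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else ('add', kept)
```

A real implementation would handle the other operations (sub, max, min …) as well, but the principle is the same: a primitive whose support the ray misses contributes exactly 0 over the whole interval and drops out before bisection starts.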

That's all for now. I'll give more details on the available field functions in the next post.

In a previous post, I complained about Pixar not updating the RenderMan Interface specification1. Since then, my wishes have been more than fulfilled because the complete documentation for Pixar products (including past versions) is now online. Just check http://renderman.pixar.com/view/resources. Registration in the forums is required but free.

Skimming through it, I can measure how far behind XRT is in terms of features. This new release (available in the Downloads section) is an attempt to somewhat fill the gap.

What's new

At first, my intent was to simply add a Python binding for the RenderMan interface provided by XRT. Using the ribclients project from Nicholas Yue, I thought it could be a quick development. Alas, I rapidly stumbled on the limits and bugs of my current implementation. So, it turned out to be a complete rewrite of everything related to RenderMan in XRT.

RIB generator

I have switched from a bison/flex-based implementation to a hand-tuned lexer/parser. Not only is this more flexible and allows for better error checking, but loading files is now 30% faster.

RIB client

The previous implementation featured only an interface to XRT. The new one also supports saving output to ASCII files or strings. Binary output is in the works.

Python binding

The documentation for Pixar's Python binding for RenderMan ("PRMan for Python") is rather sketchy. Nevertheless, it was enough to understand that I needed to make extensive modifications to the original "ribclients" code. I have mostly based my tests on a handful of example files gathered on the Internet. My main sources have been Jon Macey's courseware and Yuichirou Yokomakura's blog2.

In most cases, writing a Python binding is just a matter of translating arguments from Python to the target API. There are even tools that automate this kind of task, SWIG being probably the most widely known. PRMan for Python follows this pattern except for Procedurals, which require procedural callbacks written in Python. My tests are OK but I have not tried enough examples to be completely confident in my implementation.

Misc bits and fixes

There is now a progress bar in the console so that you know if your render is doing well (or not …).

CSG operations and transformation stack management are much more robust.

There is an improved sampler for even better image quality (as explained here).

1. OK, with a bit of googling, you could find versions of RenderMan Pro Server documentation online but they were always outdated.

2. Look Ma, I'm famous!! I don't understand Japanese and automatic translators do a very poor job with this language, so I am unsure of what this guy is doing. Apparently, he tests various renderers including … XRT. He should really give this version a try!

The Pixar folks are not only good at making movies, they also have a team of top-notch engineers who write R&D papers, available from the Pixar online library. One of their latest publications is called "Correlated Multi-Jittered Sampling". The happy few who read this blog know that one of my pet subjects is sampling. Therefore, this paper could not pass unnoticed.

Let me remind you of a few concepts about samplers before digging into it1.

A stratified sampler divides the unit square (the area of a pixel, for instance) into equal-area cells and positions a single sample within each cell. If the sample is at the center of the cell, you get a uniform sampler, the simplest but the worst (in terms of aliasing). If the position is jittered within the cell, you get a stratified jittered sampler. Jittering allows you to trade aliasing for noise, but the noise increases when samples get too close to each other in neighbouring cells or adjacent pixels. This is called clumpiness. A well-designed sampler will try to preserve stratification and decrease clumpiness.
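For reference, a stratified jittered sampler is only a few lines of Python:

```python
import random

def stratified_jittered(n, rng=None):
    # n*n samples in the unit square: one random position per grid cell.
    rng = rng or random.Random(1)
    cell = 1.0 / n
    return [((i + rng.random()) * cell, (j + rng.random()) * cell)
            for j in range(n) for i in range(n)]
```

Each sample stays inside its own cell, so stratification is guaranteed; nothing, however, stops two samples in adjacent cells from landing right next to each other, which is exactly the clumpiness problem described above.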

There are other types of samplers. Amongst them, low-discrepancy samplers built from quasi-Monte Carlo sequences are the best at decreasing clumpiness, but they tend to suffer from correlation artifacts because they are highly deterministic. They also require a degree in mathematics from you. As explained in this post, XRT uses these sampling techniques for area lights, soft shadows, glossiness or translucency.

Pixar's paper advocates a variant of stratified sampling that reaches the quality of low-discrepancy samplers through smart jittering techniques. Although the article gives ample arguments to prove the validity of correlated multi-jittered sampling, perhaps the best one is the fact that Pixar is confident enough to use it in the latest RenderMan Pro Server. The author has also been kind enough to provide a sample implementation2. I could not refrain from giving it a try …
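To give the flavor of the technique, here is the underlying multi-jittered layout in Python. Kensler's correlated variant additionally shuffles the sub-cell offsets with hash-based permutations (the paper's real contribution), a step omitted here for clarity:

```python
import random

def multi_jittered(m, n, rng=None):
    """m*n samples stratified twice over: on the coarse m-by-n grid
    and, N-rooks style, on the fine (m*n)-by-(m*n) grid."""
    rng = rng or random.Random(7)
    pts = []
    for j in range(n):
        for i in range(m):
            # Sub-stratify each cell: cell (i, j) jitters inside
            # sub-column j in x and sub-row i in y.
            x = (i + (j + rng.random()) / n) / m
            y = (j + (i + rng.random()) / m) / n
            pts.append((x, y))
    return pts
```

Without the permutation step, the sub-cell indices follow a diagonal pattern that produces the correlation artifacts the paper sets out to remove.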

Here is a comparison between XRT current sampler (on the left) and the new one (on the right).

Depth of field (magnified 5 times)

Thin lines (magnified 2 times)

Motion blur (magnified 5 times)

This looks really convincing for depth of field effects and thin lines but is only marginally better for motion blur and large edges. To be honest, visually speaking, I see no difference but, when comparing the two pictures as JPEG files, the picture computed with the new sampler compresses slightly better, which means there is less noise.

Therefore, correlated multi-jittered sampling will be the new default sampling method in the next version of XRT.

This release integrates a custom version of the Embree 1.1 BVH building and traversal engine, heavily modified to support all kinds of primitives and not only triangles. Compared to version 1.0, Embree 1.1 requires much less memory to build a BVH tree and therefore is able to process larger scenes at the expense of increased building times. Of course, XRT benefits from this improvement.

Another benefit of the BVH accelerator overhaul is a small speed boost. In my first attempt to integrate Embree, I reported somewhat surprising results: the algorithms supposed to build the most efficient traversal trees had the poorest traversal speeds. I had no clue about this behaviour and I still haven't. Although I have copied into 1.1 the same kind of modifications that I made in 1.0, this time I seem to have got it right: you can expect a 10% speed increase from the SAH-based algorithms.

Just a quick note to let you know that a slew of modifications that I made to compile and render PVR on Windows have been accepted by Marcus Wrenninge and are now available to everyone in the PVR GitHub repository. The vast majority of the code changes were made to please the VC++ 2010 compiler and to export DLL symbols. I have also fixed a number of Python scripts that were not in sync with the PVR Python binding.

I have also contributed a VC++ 2010 solution for PVR. It probably would have been better to add WIN32 support to the existing SCons project or to provide files for CMake, but I am not an expert with these tools. If someone wishes to step in, I'll be happy to help with testing. In the meantime, this will save mucho typing for Windows users. However, there is still quite a lot of work to do to get a working platform. If this is your first time, expect to spend a whole day downloading, compiling and organizing the whole shebang.

First, you need to compile the Field3D and OpenImageIO dependencies (which themselves have a large number of dependencies). You also need to install boost-1.44 (other versions will probably do; the solution is just set up for this very version).

Next, the solution expects to find libs and includes in a well-behaved environment, i.e. your dependencies' file structure should look like this

which is probably the ugliest graphic ever published on this blog but you get the *cough* picture *cough*. The "Third party tools" root directory can be any name of your liking but, before starting the solution itself, you need to set a THIRD_PARTY_TOOLS_HOME environment variable to this name for the solution to find the PVR dependencies. A .bat file is supplied to perform these duties. Other dependencies are hdf5 and ilmbase-1.0.1 (from the OpenEXR distribution), which are subdependencies of Field3D and OpenImageIO.

Finally, you will need to copy libpvr/export to libpvr/pvr and libpvr/external/GPD-pvr/export to libpvr/external/GPD-pvr/GPD-pvr.

Open Shading Language integration

Given that XRT already supports a shading language, why move to something else? The answer is that the current implementation has a number of limitations:

to compute derivatives, a shader is executed at three different locations: P+dPdx, then P+dPdy and finally P (this is how BMRT implemented derivatives). Through some clever tricks implemented within the SL library, the computed values are retained and the derivatives are computed at the P location. There are two drawbacks: even if only one derivative is needed, the entire shader is run thrice (except for light loops, which are evaluated once), and if a derivative depends on another derivative, the computed value is 0 because second-order derivatives would require more shader executions.

the compiler is not able to take advantage of shader parameters that are constant over a whole primitive and prune execution accordingly. For instance, in the following construct, which occurs more often than not, the condition keeps being evaluated.

if (texturename != "") {
    ....
}

ray differentials are not propagated, which makes it impossible to filter textures or to refine subdivision surfaces.
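The three-execution scheme from the first limitation can be mimicked in a few lines of Python (a toy model, not XRT's actual code): the shader is just a function of a 2D point, and any quantity it computes gets one-sided finite-difference derivatives.

```python
def run_with_derivatives(shader, P, dPdx, dPdy):
    # Execute the shader three times, as the BMRT scheme does, and
    # estimate derivatives of its output by finite differences.
    vx = shader(P[0] + dPdx, P[1])
    vy = shader(P[0], P[1] + dPdy)
    v = shader(P[0], P[1])
    dvdx = (vx - v) / dPdx
    dvdy = (vy - v) / dPdy
    # A derivative of dvdx itself would need extra executions, which is
    # exactly why second-order derivatives come out as 0 in this scheme.
    return v, dvdx, dvdy
```

The whole shader runs three times even if a single derivative of a single variable is needed, which is the cost OSL's automatic differentiation avoids.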

OSL brings answers to all these issues through automatic differentiation, derivative tracking and just-in-time compilation (based on the LLVM tools). These are quite powerful incentives for integrating this terrific piece of software. However, there are also some design decisions and unimplemented features that make me uneasy:

lights are modeled as emissive surfaces which means only area lights are available.

closures (a rough equivalent of RSL light loops) are built into the renderer, although a future version of OSL may support their implementation in OSL itself1.

As a result, renderers that integrate OSL must provide implementations for point lights, directional lights and closures. This clearly loses flexibility compared to a full-fledged shading language.

I guess the main reason for this is that there are deadlines in the real world and that you sometimes have to deliver intermediate releases before the final product to allow people to work. Nevertheless, it does not compromise a major design goal of OSL, which is to relieve shader writers as much as possible from the intricacies of rendering2.

I have not been able to find much information on the latest evolutions of RSL but, from what I gathered here, the Pixar folks have chosen to preserve flexibility but require more from the shader writers. Hence the need for a more complex interface with the renderer. For instance, PRMan 17.0 is now also a path tracer. I have quite limited knowledge of path tracing but I understand that "importance sampling" (in plain words, putting your samples where your light is) is key to performance and image quality. Therefore, an "efficient" shader has to provide information on how it must be "importance sampled".

I do not feel like going backwards in terms of features. So, I could either try to extend OSL for my needs - but can it be called OSL anymore? - or keep on supporting RSL while reusing the OSL infrastructure - how about a public spec, Pixar? In any case, I'll first try to match the existing XRT capabilities (while learning more about LLVM) before moving on to advanced stuff such as global illumination.

True displacements with shaders

Actually, this is completely uncharted territory for me. There are known solutions for scanline renderers but, when it comes to ray tracing, the computer graphics literature does not help much. Despite displacement being an important feature of a renderer (guess why commercial renderers are so tight-lipped on the matter), only a handful of papers deal with the subject.

The lack of any decent implementation in open-source renderers is probably a good measure of its difficulty:

LuxRender implements a very limited subset of displacement shaders: textures are used to displace subdivision surfaces along the normal. Although a very respectable effort, it's far from what I want to achieve.

I believe there is a similar feature in Blender Cycles but, according to its lead developer, Brecht Van Lommel, it's far from satisfactory.

I too feel very attracted by the "Rayes" algorithm, maybe because its paper seems the easiest to understand. The OpenSubdiv project from Pixar will likely be an asset for XRT.

Path tracing

Although I sometimes wonder how people are willing to accept render times of many hours for a single (noisy) picture, I want to give it a try just for the sake of curiosity. The concepts are quite difficult to master but the good news is that it is a very hot topic in computer graphics. There are so many papers that the jury is still out on what an effective solution is. Look for instance at the myriad of algorithms implemented in the Mitsuba renderer.

To help newbies, there are lots of tiny projects that implement global illumination algorithms:

More RenderMan compliance

Two and a half years after the 1.0 release, 2.0 is finally out. Although I have not implemented each and every feature detailed in the 2.0 roadmap, the 2.0 milestone is 95% complete. This project is above all a learning experience for me and, during these 30 months (almost an era in the world of computer graphics), I have learned that some of XRT's original design ideas are now obsolete and must be reviewed. Because I will be more efficient implementing the missing 5% on a stronger code base, it is high time to move to greener pastures. I'll detail the 3.0 roadmap in a future post.

In the meantime, 2.0 is here for you in the Downloads section. Apart from a few bug fixes detailed in the ChangeLog, this release has the same feature set as version 1.5.

Eye-candy

The image of the day is a 2-million-particle system generated with a Python RenderMan procedural from a "strange attractor" equation. The particle hues are defined according to the motion speed along the attractor curve (the faster, the warmer). This example (now bundled with the XRT examples archive) is derived from work done by a student of Prof. Malcolm Kesson at the Savannah College of Art and Design.

Geek section

For the mathematically inclined, and just for the pleasure of writing a few LaTeX formulas, the attractor is a "Polynomial A" whose equation is:

This post is a sort of follow-up to the previous one. Although "Production Volume Rendering" only deals with voxel buffers, reading it has been inspirational enough for me to improve XRT's volume rendering. For now, XRT's capabilities are based on RenderMan 3.2 atmosphere shaders1.

How atmosphere shaders work

Atmosphere shaders are bound to surfaces, just like surface or displacement shaders. So you must have an object in the background for the atmosphere shader to run. In other words, pixels with no geometry at all "behind" them will not run any atmosphere shaders.

The general idea behind smoke effects is to ray march along the incident ray I, sampling illumination and accounting for atmospheric extinction. Typically, this is done with the following algorithm:
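The loop can be sketched as follows (a Python illustration, not XRT's actual shader code; density_at, light_at and sigma_t are placeholder names): march along I with a fixed step, accumulate in-scattered light at each sample, and dim it by the transmittance built up so far (Beer-Lambert extinction).

```python
import math

def march(length, step, density_at, light_at, sigma_t=1.0):
    """Ray march from t = 0 to t = length along the incident ray.

    Returns (in-scattered light, remaining transmittance).
    """
    color = 0.0          # accumulated in-scattered light
    transmittance = 1.0  # fraction of background light still visible
    t = 0.5 * step       # sample the middle of each interval
    while t < length:
        d = density_at(t)
        if d > 0.0:
            # single-scattering contribution of this interval,
            # attenuated by everything already traversed
            color += transmittance * light_at(t) * d * step
            # Beer-Lambert extinction over the interval
            transmittance *= math.exp(-sigma_t * d * step)
        t += step
    return color, transmittance
```

The returned transmittance is then used to composite the marched color over whatever surface the ray hit.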

Volume shaders of this type can be very expensive. The computational expense is proportional to the number of iterations of the while loop, which is determined by the step size and the length of I. Therefore, it is important to choose your step size carefully — too large a step size will result in banding and quantization artifacts, while too small a step size results in very long render times.

A smarter shader

Fog light

This example, borrowed from the Gelato example set, features spinning gears in fog lit by a spot light. Because there are holes and dents in the gears, parts of the fog are either shadowed or lit. This is the famous "god rays" effect.

On this kind of scene, the smoke shader example that comes with Application Note #20 performs badly. The scene is quite large and there are many small details that require a small step size to be properly caught. However, you will get a huge speed boost if you realize that the space outside the spot shape is not lit and does not need to be raymarched. If the shader is passed information about the spot shape (a cone here) and orientation, it can compute much tighter bounds for the raymarching algorithm and avoid useless steps in the dark void. First, the volume ray origin and direction are transformed into the spot's canonical space, then the new volume ray is intersected against the canonical cone shape (a mere second-degree equation to solve).
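That intersection can be sketched like this (a Python illustration under assumed conventions — apex at the origin, axis along +z — not XRT's actual code). A point lies on the cone when x^2 + y^2 = (z tan θ)^2, so substituting the ray o + t*d yields a quadratic in t:

```python
import math

def cone_interval(o, d, half_angle):
    """Intersect ray o + t*d with the canonical cone (apex at the
    origin, axis +z, given half-angle); return (t_near, t_far) or None."""
    k = math.tan(half_angle) ** 2
    a = d[0]*d[0] + d[1]*d[1] - k * d[2]*d[2]
    b = 2.0 * (o[0]*d[0] + o[1]*d[1] - k * o[2]*d[2])
    c = o[0]*o[0] + o[1]*o[1] - k * o[2]*o[2]
    if abs(a) < 1e-12:              # ray grazing along the cone surface
        return None
    disc = b*b - 4.0*a*c
    if disc < 0.0:                  # the ray misses the cone entirely
        return None
    s = math.sqrt(disc)
    t0, t1 = (-b - s) / (2.0*a), (-b + s) / (2.0*a)
    if t0 > t1:
        t0, t1 = t1, t0
    # keep only hits on the lit nappe (z >= 0); this sketch ignores rays
    # that enter but never exit, and a real implementation would also
    # clip against the spot's far attenuation distance
    ts = [t for t in (t0, t1) if o[2] + t*d[2] >= 0.0]
    if len(ts) < 2:
        return None
    return ts[0], ts[1]
```

Raymarching is then restricted to [t_near, t_far] instead of the full ray extent, which is where the speed boost comes from.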

You will get a better grasp of the "god rays" effect in the following animation:

You will also find this example and the companion video in the Gelato gallery.

Note

1. Pixar has recently introduced drastic changes in PRMan volume rendering capabilities which I intend to support in future XRT versions.

Finding your way through the computer graphics literature jungle is hard: for most subjects, you will find a plethora of papers, all claiming to bring a decisive breakthrough. Most of the time, you will find that the brand new technique does not fit into existing rendering architectures, gobbles up gigabytes of memory, or addresses only a subset of your problem. Building a consistent rendering system is a difficult task.

"Production Volume Rendering: Design and Implementation", written by Magnus Wrenninge, a technical director at Sony Pictures Imageworks, is an attempt to clear the mess for a specific domain: volumetric rendering techniques. It does not try to describe all the rendering techniques available in the literature; instead, it focuses only on techniques used in the visual effects industry by people who need to deliver on budget and on time. Following the path of "Physically Based Rendering", it provides a complete rendering system (PVR) and centers the book around its source code.

The architecture Wrenninge advocates is deceptively simple: modeling tools fill voxel buffers that renderers then raymarch. Of course, the book gives a lot more detail but, in the end, it does not go much further than that.

The book is divided into three main parts: basics, volume modeling and volume rendering. It may seem strange to discuss modeling in a rendering book but, while creating a polygon soup is quite obvious, building convincing volumes is not for the casual user. After all, one must fill these damn voxel buffers!

I have mixed feelings about this book. In all three parts, the technical content is really excellent and interesting (I particularly liked the chapters on the theoretical foundations of raymarching and on phase functions). The design choices are well explained and the example images (all computed with PVR) are good proof of their validity.

However, the comments on the companion source code frequently run too long when they address topics that have more to do with programming than rendering. Even worse, they are sometimes redundant (for instance, the various discussions on attributes)1. The modeling part, even to my naïve eyes, is oversimplified: there is no mention of fluid dynamics, for example.

Finally, this book lacks a concluding chapter: clues for the reader to extend and improve the system, hints to create animations, something to give the reader a compelling urge to go beyond.

To summarize, the book could have been better but you will still get really valuable information from it.

This example from the Advanced RenderMan book has long been a problematic render for XRT. I am happy to say that I have finally addressed the remaining issues. This scene is now included in the XRT example scenes archive. There are only a few primitives but the shaders are quite complex and challenging. Aside from the flashy lens flare effect, look at the subtle blue atmosphere surrounding the planet.

As it was the last remaining item from the book's examples, it was time to put a fresh coat of paint on the Advanced RenderMan gallery. The layout has been improved and the number of pictures greatly expanded. For good measure, I have even tried to recreate some of the book's pictures for which no RIB files were provided. Hope you will enjoy them!