Tuesday, September 16, 2008

I've been slow to write the last couple of parts on my untangling series as I've been pretty busy lately. Among other things I've finally got around to submitting a couple of graphics related papers (that I actually wrote ages ago) for publication. Not sure they'll get accepted but there's no harm trying. I'll probably write blog posts about these in the near future now that I have permission from my employer to talk about them.

Note: Neither of these are anything I actually use at work. Well, I've used the adjoint one a tiny bit. The latter is in the spirit of computational geometry papers: it demonstrates an algorithm that runs (asymptotically) in the fastest possible time, but in practice you'll probably want to use an approximation that looks just as good. I was amazed I was able to find a practical application for a result that came out of toric varieties, but it's going to be tough on the reviewers as there seem to be so few people in the graphics world who know about multidimensional DSP techniques.

Of course, straight after submitting I noticed a bunch of pieces missing from some expressions. In places designed to help disambiguate things for the reviewers as well. Ho hum.

I also recently presented a very short presentation at the "Stupid Rat Tricks" part of the Renderman User Group: quine.sl: a self-shading shader. (For more on RSL see Wikipedia.) The source is available here. I don't think the company lawyers fully appreciated that this shader now allows their handiwork to be used like a texture map.

My presentation was completely blown away by the guy who implemented a large part of the Renderman interface completely in bourne shell. I can't think of a stupider thing than that! :-)

I actually wrote that paper a few years ago, before I'd started playing with languages with immutable values (like Haskell), so it reflects my C++ background. My procedural programming colleagues didn't blink at it because it corresponds to the notation in almost all of the languages they use (from Matlab and Mathematica to Fortran and C++). I suspect the reviewers won't blink at it either.

But fundamentally, I do agree with your criticism and if I get a chance I may change my notation.

Wow, the quine is very cool. I once did one that did the cheating approach of sticking the source in an image (http://bobcopeland.com/tile.html) but that was in perl where quines are really easy to write. Table of all the ascii values is a nice approach.

In the case treated in the above manuscript f is linear, so we could write

f x = F x

where F is some matrix, so

J f x = F

(J f x)^T = F^T

(J f x)^T v = F^T v = f^* v

where f^* is the adjoint of f.

In fact the transformation described in the above manuscript is precisely reverse-mode AD, in this case. (Note that J f x is constant in x, so some simplifications occur. These are precisely the optimizations a good compiler would make in eliminating dead code or in simplifying a+0*b to a.)

You're only computing derivatives in the sense that any linear map is the derivative of some function. But actually, I have considered this. At one point I toyed with the idea of representing linear functions as quadratic functions in this way: you get a single representation (as sparse as your implementation of the quadratic) from which I think a compiler can extract both the linear function and its adjoint. I think that's right.

> You're only computing derivatives in> the sense that any linear map is the> derivative of some function.

No, that's really not what I'm doing.

AD, at least in the form appropriate for consideration here, does not compute the derivative as a number. Rather it computes a linear function, namely a product with the Jacobian, that product being (J f x)*v for forward AD, and (J f x)^T*v for reverse AD, where J f x is the Jacobian matrix of f at x.

Here is another way to look at it. The Jacobian J f x is a matrix of numbers. But we can abstract away the representation of this matrix, instead considering it merely as a linear map. (In differential geometry this is called the push-forward.) That is what is done in AD. If we denote the forward-mode AD transform operator as Jf, then we have a numerical equality, Jf f x v = (J f x)*v, but calculated efficiently. And with Jr being the reverse mode operator, we have a numerical equality Jr f x v = (J f x)^T*v, but calculated efficiently.

Note that I'm not constructing any quadratic functions, or any other function whose derivative is f, or anything like that.

> ... representing linear functions> as quadratic ...

From the linear function f:R^n -> R^m we can construct a bilinear form g x y = y^T (f x). Appropriate cascaded application of AD operators to g can yield f or f^*. This is made cleaner if we note that, in f x, the vector x is living in the input space of f; but in g x y, it is more natural to consider x to be living instead in the tangent space of some point in the input space of f; and to consider y to be living not in the output space of f, or even the tangent space of some point in the output space of f, but rather in the co-tangent space of some point in the output space of f. This renders the dot product in the definition of g meaningful.

Anyway, I think we understand each other, we're just using different names. Everything in my paper could be carried out by passing linear filters into a reverse mode automatic differentiator. (So on one level, of course it is differentiation.) But I see myself as extracting out the parts of the algorithm that aren't about explicitly derivatives because in the case that your function is linear you don't need to think about calculus, just multiplying matrices.

I've decided I like your point of view after all and I think I'll mention it if (1) my paper is accepted and (2) I get a chance to revise it. If this happens, I'd like to acknowledge you. What name should I use. (I think I know your full name but I could be wrong.)