Five-Dollar Words for Programmers, Part Two: Orthogonal

In geometry, "orthogonal" basically means the same thing as "perpendicular", or "at right angles". The walls are orthogonal to the floor. But algebraists extend the meaning of "orthogonal" beyond mere perpendicularity; to an algebraist, two aspects of a system are orthogonal if one can be varied without changing the value of the other.

Imagine for instance that you were trying to describe how to get from one point in an empty room to another. A perfectly valid way to do so would be to say how many steps to go north or south, and then how many steps to go northeast or southwest. This hockey-stick navigation system is totally workable, but it feels weird because north and northeast are not orthogonal -- you can't change your position by moving northeast without also at the same time changing how far north you are. With an orthogonal system -- say, the traditional north-south/east-west system -- you can specify how far north to go without worrying about taking the east-west movement into account at all.

Nonorthogonal systems are hard to manipulate because it's hard to tweak isolated parts. Consider my fish tank for example. The pH, hardness, oxidation potential, dissolved oxygen content, salinity and conductivity of the water are very nonorthogonal; changing one tends to have an effect on the others, making it sometimes tricky to get the right balance. Even things like changing the light levels can change the bacteria and algae growth cycles causing chemical changes in the water.

Computer people further extend this algebraic notion of orthogonality to their systems. Consider a case I just came across while studying the C# expression binding code, for example. C# warns about unreachable code:

"Reachable" and "targetted" are orthogonal in one sense. You can have a reachable targetted label, a reachable untargetted label, an unreachable targetted label and an unreachable untargetted label. But they are nonorthogonal in another sense, as I discovered this week when I almost seriously broke the code generator. The way our code generator works, we must generate IL for an unreachable label if it is targetted by a reachable goto. The notions of reachable and targetted get conflated in this one corner case, meaning that I cannot have a code generator that skips generating all unreachable code. This unexpected nonorthogonality makes it difficult to write a compiler that is both correct and understandable, so I'm trying to figure out ways to either eliminate the nonorthogonality, or capture the problem in a very specific piece of highly localized code that I can comment the heck out of.

Now hold on a minute, I hear you saying. Surely if the goto is reachable then the label will be too! My challenge to you guys is this: how can you have an unreachable label targetted by a reachable goto?

Compile an unreachable, labeled basic block to a single nop. Chuck out unreachable, unlabeled basic blocks. Then, targetting is no longer a factor. That's really easy to understand and the JIT will take care of the nop for you.

Johan

28 Oct 2005 12:56 PM

Eric,

Thanks for this explanation of orthogonal. Very insightful and clear, and just so happens to be extremely helpful to what I am doing.

However, it seems to me that your solution just trades "reachable and targetted aren't orthogonal" for "reachable and labeled aren't orthogonal".

It's probably still an improvement (because it's presumably much easier to determine whether a block is labeled than whether a label is targetted) but it isn't a perfect solution to non-orthogonality...

To answer my own point: the "perfect solution to non-orthogonality" is obviously just to compile *every* unreachable basic block to a single nop; then "labeled", "targetted" and "reachable" are all treated orthogonally.

Since that would clearly lead to a ridiculous number of nops, I agree that Nicholas's solution is the way to go :)

Indeed, the problem is a goto across a finally block with an unreachable end.

finally{ while(true){ foo(); } }

would also qualify.

Right now the code generator does what Nicholas describes -- it generates a "leave" instruction for the goto and then lets the IL semantics take care of the leave being hijacked by the finally. That means that the leave has to leave _to_ somewhere, so we generate the label's basic block even if it is unreachable. Of course we discard all of the other unreachable code after the label, so its just a no-op.

I am considering changing the code generator as you guys describe, so that it detects this situation and changes the goto into a branch to the beginning of the finally or some other appropriate location. However I'm not sure whether the amount of changes I'd have to make to the code are worth this particular optimization -- I risk making the code more complex just to eliminate what is admittedly a wart in the unreachable-code-trimming algorithm.

I don't quite follow what you're proposing to do. It shouldn't be possible to branch out of a try block. You can leave out of a try block, but it shouldn't be possible to leave into a finally block. So where would you leave to? The only guaranteed safe point I can think of is the first instruction of the original try block itself. But, and this seems extraordinarily shady implementation-wise, if you leave back to the start of your own try block, the finally blocks aren't executed.

Right -- no matter what happens, that try block has got to be left somehow, and its got to go somewhere, so we'll probably be generating a no-op somewhere for it to go, and we're back in the situation that we're already in. My musings are more along the lines of whether there's a cleaner way to represent this situation in the implementation. I'm working on code that removes the unreachable portions of the bound tree earlier and its a conceptual pain in the rear that I can't remove unreachable targetted labels. I think though that I'm going to live with it since as you say, there's not really a good alternative.

That's essentially what we do. Code that is unreachable is stripped totally from the tree before code generation, except for targetted labels which are turned into nops. We then do an additional reachability check on the generated basic blocks and remove unreachable untargetted blocks.