The 10 solutions I benchmarked had similar performance. The actual numbers and winner varied depending on compiler (icc/gcc), compiler options (e.g., -O3, -march=nocona, -fast, -xHost), and machine. Canon's solution performed well in many benchmark runs, but again the performance advantage was slight. I was surprised that in some cases some solutions were slower than the naive solution with branches.

Have you actually tried the most straightforward way to compare your Point structs: bool operator<(const Point& a, const Point& b){ return a.x < b.x || (a.x == b.x && a.y < b.y); }? I can't see how the look-up table is going to beat that because calculating the array indices appears to be at least as expensive as just calculating the answer. - With a quick test this sorts 1 million random pairs some 30% faster than the look-up table and Tom's Compare function.
– UncleBens, Oct 23 '09 at 13:07

I agree. I wouldn't use a table for Point. However, it might make sense when comparing two Lines, which requires comparing 4 integers.
– Marc Eaddy, Oct 23 '09 at 15:46

Branchless (at the language level) code that maps negative to -1, zero to 0 and positive to +1 looks as follows:

int c = (n > 0) - (n < 0);

If you need a different mapping, you can simply use an explicit map to remap it:

const int MAP[] = { 1, 0, 2 };
int c = MAP[(n > 0) - (n < 0) + 1];

Or, for the requested mapping, use a numerical trick such as:

int c = 2 * (n > 0) + (n < 0);

(It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0. And the code is quite readable. If 0 is mapped to something else, it becomes more tricky and less readable.)
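The three variants above can be collected into functions to check that they agree. This is only a sketch: the function names `sign`, `sign_via_table` and `sign_via_arith` are made up for illustration, and "branchless" here still means at the language level only, since the compiler is free to emit branches.

```cpp
// Maps negative -> -1, zero -> 0, positive -> +1.
int sign(int n) {
    return (n > 0) - (n < 0);
}

// Table remap: negative -> 1, zero -> 0, positive -> 2.
int sign_via_table(int n) {
    static const int MAP[] = { 1, 0, 2 };
    return MAP[sign(n) + 1];  // index is always 0, 1 or 2
}

// Arithmetic remap producing the same 1 / 0 / 2 result without a table.
int sign_via_arith(int n) {
    return 2 * (n > 0) + (n < 0);
}
```

The table version accepts any target mapping, while the arithmetic version relies on zero mapping to 0.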

As an additional note: comparing two integers by subtracting one from the other at the C language level is a flawed technique, since it is generally prone to overflow. The beauty of the above methods is that they can immediately be used for "subtractionless" comparisons: apply the same expressions to x > y and x < y directly, in place of testing the sign of x - y.
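A minimal sketch of what such a subtractionless comparison might look like, assuming plain `int` operands. The names `compare` and `compare_mapped` are hypothetical.

```cpp
// Three-way compare without subtraction:
// -1 if x < y, 0 if x == y, +1 if x > y.
int compare(int x, int y) {
    return (x > y) - (x < y);
}

// Same idea with the 1 / 0 / 2 mapping: 1 if x < y, 0 if equal, 2 if x > y.
int compare_mapped(int x, int y) {
    return 2 * (x > y) + (x < y);
}
```

With x = INT_MIN and y = 1, the expression x - y overflows, which is undefined behavior for signed integers, whereas `compare(INT_MIN, 1)` is well defined and returns -1.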

Assuming a sane compiler, this will not invoke the comparison hardware of your system, nor is it using a comparison in the language. To verify: if x == y then d and p will clearly be 0, so the final result will be zero. If (x - y) > 0 then ((x - y) + INT_MAX) will set the high bit of the integer; otherwise it will be unset. So p will have its lowest bit set if and only if (x - y) > 0. If (x - y) < 0 then its high bit will be set and d will set its second to lowest bit.

A SuperOptimizer exhaustively searches the instruction space for the best possible combination of instructions that will implement a given function. It is suggested that compilers automagically replace the functions above by their superoptimized versions (although not all compilers do this). For example, in the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf), the cmpu function is shown as this in Appendix D pg 204:

That's pretty good isn't it... just four subtracts (and with carry and/or extended versions). Not to mention it is genuinely branchfree at the machine opcode level. There is probably a PC / Intel X86 equivalent sequence that is similarly short since the GNU Superoptimizer runs for X86 as well as PowerPC.

Note that Unsigned Comparison (cmpu) can be turned into Signed Comparison (cmps) on a 32-bit compare by adding 0x80000000 to both Signed inputs before passing it to cmpu.
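The bias trick can be sketched in C++ as follows. Note that `cmpu` here is a plain stand-in written with ordinary comparisons, not the superoptimized PowerPC sequence from the guide; the return convention (-1, 0, +1) is also an assumption.

```cpp
#include <cstdint>

// Stand-in unsigned three-way compare: -1, 0 or +1.
int cmpu(std::uint32_t a, std::uint32_t b) {
    return (a > b) - (a < b);
}

// Signed compare built on cmpu. Adding 0x80000000 (mod 2^32) flips the
// sign bit, which maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX while
// preserving order.
int cmps(std::int32_t a, std::int32_t b) {
    const std::uint32_t bias = 0x80000000u;
    return cmpu(static_cast<std::uint32_t>(a) + bias,
                static_cast<std::uint32_t>(b) + bias);
}
```

The casts to `std::uint32_t` make the addition wrap modulo 2^32, so the bias never triggers signed overflow.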

Why wouldn't you just add al and dl to get the value of 1 or 2 instead of using SALL (shift arithmetic left long)? Add is simpler and faster than variable shift (which may be microcoded on some CPUs).
– Adisak, Oct 26 '09 at 7:17

It's a fair option; I hadn't thought of it. Only benchmarks will tell.
– Tordek, Oct 26 '09 at 17:04

This will not work correctly in cases where the computation of diff = x-y overflows. Also, some compilers will convert the != you use to compute absdiff_not_zero to a branch.
– Adisak, Oct 26 '09 at 7:22