This should be slower since I'm no longer inlining, but I have modified the scheduling to chain fast operations “freely”, with no added latency.
The only operations treated as fast now are bit shifts by a constant and casts (zext, sext). This is using LLVM 2.7: