Ultimately our plan is to expand the capability of the new pipeline so that it does native code generation too, and we can ultimately discard the existing code generators. The design of this stage is here: Commentary/Compiler/IntegratedCodeGen

Overflow parameters are passed on the stack using explicit memory stores, to locations described abstractly using the ''Stack Area'' abstraction.

Making the calling convention explicit includes an explicit store of the return address, which is written to the stack in the same way as overflow parameters. This is done (obscurely) in MkGraph.mkCall.
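As an illustration, here is a Python sketch of what making the convention explicit looks like. This is not the GHC implementation: the register count, the instruction syntax, and the `CallArea` name are all invented for the example; the point is only that overflow parameters and the return address become ordinary stores to abstract stack slots.

```python
N_ARG_REGS = 4  # assumed number of argument-passing registers

def lower_call(fun, args, ret_label):
    """Lower a call into explicit register moves and stack stores."""
    code = []
    # The first few arguments travel in registers.
    for reg, arg in enumerate(args[:N_ARG_REGS]):
        code.append(f"R{reg + 1} = {arg}")
    # Overflow parameters: explicit memory stores to abstract slots.
    for slot, arg in enumerate(args[N_ARG_REGS:]):
        code.append(f"Stk[CallArea+{slot}] = {arg}")
    # The return address is stored just like an overflow parameter.
    code.append(f"Stk[CallArea+ret] = {ret_label}")
    code.append(f"call {fun}")
    return code
```
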

Simple control-flow optimisation, implemented in CmmContFlowOpt and called from HscMain.tryNewCodeGen (weirdly). It is run both at the beginning and at the end of the pipeline.

Branch chain elimination.

Remove unreachable blocks.

Block concatenation: if a block ends in ''branch to K'', and that branch is the only use of K, concatenate K's code onto the end of the block.
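The three optimisations above can be sketched in Python on a toy CFG. This is an illustrative model, not GHC's representation: each block is a pair of a body (list of instruction strings) and an optional unconditional successor label.

```python
def opt_cfg(blocks, entry):
    """blocks: {label: (body, succ-label-or-None)}; returns (blocks, entry)."""
    # 1. Branch chain elimination: a branch to an empty block that just
    #    branches on to K is retargeted straight at K.
    def chase(k):
        seen = set()
        while (k in blocks and not blocks[k][0]
               and blocks[k][1] is not None and k not in seen):
            seen.add(k)
            k = blocks[k][1]
        return k
    blocks = {k: (b, chase(s) if s is not None else None)
              for k, (b, s) in blocks.items()}
    entry = chase(entry)

    # 2. Remove unreachable blocks: keep only what the entry can reach.
    reachable, todo = set(), [entry]
    while todo:
        k = todo.pop()
        if k in reachable or k not in blocks:
            continue
        reachable.add(k)
        if blocks[k][1] is not None:
            todo.append(blocks[k][1])
    blocks = {k: v for k, v in blocks.items() if k in reachable}

    # 3. Block concatenation: if A branches to K and that is the only
    #    use of K, splice K's code onto the end of A.
    uses = {}
    for _, s in blocks.values():
        if s is not None:
            uses[s] = uses.get(s, 0) + 1
    changed = True
    while changed:
        changed = False
        for a, (body, s) in list(blocks.items()):
            if (s is not None and s != entry and s != a
                    and uses.get(s, 0) == 1 and s in blocks):
                kbody, ksucc = blocks.pop(s)
                blocks[a] = (body + kbody, ksucc)
                changed = True
                break
    return blocks, entry
```
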

AT THIS POINT CONTROL MOVES TO CmmCps.cpsTop for the rest of the pipeline

More control flow optimisations in CmmCps.cpsTop.

Common Block Elimination (like CSE). This essentially implements the Adams optimisation, we believe.
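A sketch of the idea in Python, on the same toy CFG shape as above (again, not GHC code): blocks with identical code are merged, and branches to the duplicates are retargeted at one representative. The real pass iterates, since retargeting can expose further duplicates; this sketch shows a single round.

```python
def common_block_elim(blocks):
    """blocks: {label: (body, succ-label-or-None)}; one round of CBE."""
    rep = {}    # (body, succ) -> representative label
    alias = {}  # duplicate label -> representative label
    for k in sorted(blocks):
        body, s = blocks[k]
        key = (tuple(body), s)
        if key in rep:
            alias[k] = rep[key]   # k duplicates an earlier block
        else:
            rep[key] = k
    # Drop duplicates and retarget branches at the representatives.
    return {k: (b, alias.get(s, s)) for k, (b, s) in blocks.items()
            if k not in alias}
```
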

Consider (sometime): block duplication. If a block ends in ''branch to K'' and K is a short block, duplicate K's code in place of the branch. Branch chain elimination is just a special case of this.

Proc-point analysis and transformation, implemented in CmmProcPointZ. (Adams version is CmmProcPoint.) The transformation part adds a function prologue to the front of each proc-point, following a standard entry convention.

The analysis produces a set of BlockId that should become proc-points

The transformation inserts a function prologue at the start of each proc-point, and a function epilogue just before each branch to a proc-point.
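The analysis can be viewed as a fixed point, sketched below in Python (an illustrative model, not CmmProcPointZ): start from the entry block, plus any blocks that must be proc-points anyway (in real Cmm, the call continuations), and promote any block that is reachable from two different proc-points without passing through another proc-point, since such a block must be callable from more than one place.

```python
def proc_points(succs, entry, initial=()):
    """succs: {label: [successor labels]}; returns the set of proc-points."""
    pps = {entry, *initial}
    changed = True
    while changed:
        changed = False
        owners = {}  # block -> set of proc-points reaching it directly
        for p in pps:
            todo, seen = [p], set()
            while todo:
                k = todo.pop()
                if k in seen:
                    continue
                seen.add(k)
                owners.setdefault(k, set()).add(p)
                if k in pps and k != p:
                    continue  # do not look past another proc-point
                todo.extend(succs.get(k, []))
        for k, os in owners.items():
            if len(os) > 1 and k not in pps:
                pps.add(k)  # reachable from two proc-points: promote
                changed = True
    return pps
```
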

Add spill/reload, implemented in CmmSpillReload, to spill live C-- variables before a call and reload them afterwards. The spill and reload instructions are simply memory stores and loads respectively, using symbolic stack offsets (see stack layout). For example, a spill of variable 'x' would look like Ptr32[SS(x)] = x.

dualLivenessWithInsertion does two things:

Spills at the definition of any variable that is subsequently live across a call (uses a backward analysis)

Reloads, just after each call, the variables that are live after it (matching the spill/reload scheme described above)
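The backward part can be sketched in Python on straight-line code (not GHC code; the real analysis is a dataflow pass over the whole graph). Each instruction is modelled as (defs, uses, is_call); walking backwards with a live set identifies the variables live across some call, and those are exactly the ones that need a spill at their definition.

```python
def vars_to_spill(instrs):
    """instrs: [(defs, uses, is_call)]; returns vars live across a call."""
    live, spill = set(), set()
    for defs, uses, is_call in reversed(instrs):
        if is_call:
            # Live after the call, and not merely defined by it,
            # means live across it.
            spill |= live - set(defs)
        live = (live - set(defs)) | set(uses)
    return spill
```
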

A StackOffset is the byte offset of a stack slot from the old end (high address) of the frame. It doesn't vary as the physical stack pointer moves.

Manifest the stack pointer, implemented in CmmStackLayout. Once the stack layout mapping has been determined, a second pass walks over the graph, making the stack pointer, Sp, explicit. Before this pass there is no Sp at all; after it, Sp is completely manifest.

replacing references to Areas with offsets from Sp.

adding adjustments to Sp.
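A small Python sketch of the arithmetic involved (illustrative only; the sign convention and names are assumptions, not GHC's actual code): a slot's layout offset is fixed relative to the old end of the frame, and the Sp-relative address is recovered from the frame depth at each program point.

```python
def sp_relative(slot_offset, sp_depth):
    """slot_offset: bytes below the old (high-address) end of the frame;
    sp_depth: bytes Sp currently sits below that same old end.
    Neither changes when the other does, which is the point of the
    abstraction."""
    return sp_depth - slot_offset

def manifest(ref, layout, sp_depth):
    """Rewrite an abstract slot reference like ("slot", "x") into Sp+k;
    anything else passes through unchanged."""
    kind, v = ref
    if kind == "slot":
        return f"Sp+{sp_relative(layout[v], sp_depth)}"
    return v
```
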

Split into multiple CmmProcs, implemented in CmmProcPointZ. At this point we build an info-table for each of the CmmProcs, including SRTs. Done on the basis of the live local variables (by now mapped to stack slots) and live CAF statics.

LastCall and LastReturn nodes are replaced by Jumps.

Build info tables, implemented in CmmBuildInfoTables.

Find each safe MidForeignCall node, "lower" it into the suspend/call/resume sequence (see Note [Foreign calls] in CmmNode.hs), and build an info table for it.

Convert the CmmInfo for each CmmProc into a [CmmStatic], using the live variable information computed just before "Figure out stack layout".
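The shape of the safe-foreign-call lowering above can be sketched as follows. This is a Python model of the generated code, not GHC source; suspendThread and resumeThread are the real RTS entry points, but their argument lists are simplified here.

```python
def lower_safe_foreign_call(target, args, result):
    """Produce the suspend/call/resume sequence (as strings) that a
    safe MidForeignCall is rewritten into."""
    return [
        "token = suspendThread(BaseReg)",          # leave Haskell land
        f"{result} = {target}({', '.join(args)})", # the actual C call
        "BaseReg = resumeThread(token)",           # re-enter Haskell land
    ]
```

Bracketing the call like this is what lets the RTS garbage-collect, or run other Haskell threads, while the foreign call is in progress.
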

AT THIS POINT CONTROL MOVES BACK TO HscMain.tryNewCodeGen where a final control-flow optimisation pass takes place.

Branches to continuations and the "Adams optimisation"

A GC block for a heap check after a call should only take one or two instructions.
However the natural code: