Any variations from the OCaml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.

In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back-end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.

As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.

However, the semantics of the intermediate languages allow us to concretely demonstrate this improvement in clarity, by noting that the semantics of the LTL and the semantics of the LIN languages are identical.

\texttt{pair\_reg}& Various different `move' instructions have been merged into a single move instruction in the joint language. A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others. This type encodes how we should move data around the registers and accumulators. \\

Naturally, as some intermediate languages are graph-based and others linearised, passing control on to the `successor' instruction can mean either following an edge in a control flow graph or incrementing an index into a list.
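The following Python fragment is a minimal sketch of this idea, not the Matita development itself. It assumes a hypothetical record \texttt{Params} with two fields, \texttt{fetch} and \texttt{successor}, and a hypothetical driver \texttt{run}; none of these names come from the formalisation. The point is only that one driver can serve both a graph-based and a linearised program once the notion of `successor' is a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

P = TypeVar("P")  # program-point type: a graph label or a list index


@dataclass
class Params(Generic[P]):
    """Hypothetical record abstracting how control reaches the next instruction."""
    fetch: Callable[[P], str]    # instruction stored at a program point
    successor: Callable[[P], P]  # program point reached after a sequential step


def run(params: Params[P], start: P, steps: int) -> List[str]:
    """Follow `steps` sequential successors, recording the instructions visited."""
    trace = []
    point = start
    for _ in range(steps):
        trace.append(params.fetch(point))
        point = params.successor(point)
    return trace


# Graph-based language: each label maps to (instruction, successor label),
# so taking a step means following an edge.
graph = {"l0": ("load", "l2"), "l2": ("add", "l1"), "l1": ("store", "l0")}
graph_params = Params(fetch=lambda l: graph[l][0], successor=lambda l: graph[l][1])

# Linearised language: taking a step means incrementing an index.
code = ["load", "add", "store"]
linear_params = Params(fetch=lambda i: code[i], successor=lambda i: i + 1)

graph_trace = run(graph_params, "l0", 3)
linear_trace = run(linear_params, 0, 3)
```

Both traces visit the same three instructions, even though the two programs store and order them differently.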

We distinguish instructions that pass control on to their immediate successors from those that jump elsewhere in the program through the use of \texttt{succ}, which denotes the immediate successor of the current instruction, in the \texttt{params\_} record described below.

Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program. The conditional jump instruction COND is `linear', since it

Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which pairs a label (the entry and exit points of the control flow graph or list) with a proof that the label is in the graph structure:

Therefore, to understand whether the exiting \texttt{main} function is really exiting, or just recursively calling itself, we need to remember the address to which \texttt{main} will return control once the initial call to \texttt{main} has finished executing.
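As a rough illustration of this check, the Python sketch below models only the stack of return addresses. The sentinel \texttt{EXIT} and the function \texttt{run\_main} are hypothetical names introduced for the example; the actual formalisation records the address differently. The sketch assumes that \texttt{main} calls itself a fixed number of times and then unwinds.

```python
EXIT = "exit"  # hypothetical sentinel: return address of the initial call to main


def run_main(recursive_calls: int) -> list:
    """Simulate main calling itself `recursive_calls` times, then unwinding.

    Returns the sequence of addresses that control returns to.
    """
    stack = [EXIT]  # return address pushed by the initial call
    for n in range(recursive_calls):
        stack.append(f"after_call_{n}")  # a return into main's own body
    returns = []
    while True:
        target = stack.pop()
        returns.append(target)
        if target == EXIT:  # only this return ends the program
            return returns


unwinding = run_main(2)
```

Every return out of \texttt{main} looks the same locally; only comparing the target against the remembered address tells the final return apart from the recursive ones.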

Further, \texttt{set\_result} saves the result of the function computation, and \texttt{exec\_extended} is a function that executes the extended statements, peculiar to each individual intermediate language.

This has the advantage that we can give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), which CompCert cannot. However, the front-end of CerCo cannot do so either, because its memory model and values are similar to those of CompCert 1.6.

More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks. However, there remains an important difference in the handling of pointer values in the rest of the formalisation. In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
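To make the role of individual pointer bytes concrete, the following Python sketch models a memory whose cells hold either a plain byte or one byte of a pointer. All names (\texttt{Byte}, \texttt{PtrPart}, \texttt{store\_pointer}, \texttt{load\_pointer}) are hypothetical, and the two-byte pointer size is an assumption made for the example rather than a statement about the formalisation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Byte:
    value: int  # an ordinary data byte, 0..255


@dataclass(frozen=True)
class PtrPart:
    block: str   # hypothetical identifier of the pointed-to block
    offset: int  # offset within that block
    index: int   # which byte of the pointer this cell holds


Cell = Union[Byte, PtrPart]
Memory = Dict[int, Cell]
POINTER_SIZE = 2  # assumption for this sketch


def store_pointer(mem: Memory, addr: int, block: str, offset: int) -> None:
    """Write a pointer as POINTER_SIZE separate cells."""
    for i in range(POINTER_SIZE):
        mem[addr + i] = PtrPart(block, offset, i)


def memcpy(mem: Memory, dst: int, src: int, n: int) -> None:
    """Read a cell, copy a cell, repeated in a loop."""
    for i in range(n):
        mem[dst + i] = mem[src + i]


def load_pointer(mem: Memory, addr: int) -> Optional[Tuple[str, int]]:
    """Reassemble a pointer; fail unless all its parts are present and in order."""
    parts = [mem.get(addr + i) for i in range(POINTER_SIZE)]
    first = parts[0]
    if not isinstance(first, PtrPart):
        return None
    expected = [PtrPart(first.block, first.offset, i) for i in range(POINTER_SIZE)]
    return (first.block, first.offset) if parts == expected else None


mem: Memory = {}
mem[0] = Byte(42)
store_pointer(mem, 1, "blk", 4)
memcpy(mem, 10, 0, 3)           # copies one data byte and both pointer bytes
copied = load_pointer(mem, 11)  # the pointer survives the byte-wise copy
```

In a model where only complete pointer values can be loaded and stored, the loop body of \texttt{memcpy} has no meaning when it reaches a cell that holds part of a pointer; here each part is a first-class value, so the copy goes through.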

It is an open question whether we will unify the two memory models, using only the back-end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.

Should this decision be taken, this will likely be straightforward but potentially time consuming\footnote{After the original version of this deliverable was written we ported the front-end languages' semantics to the back-end memory model. This turned out not to be time consuming, and moreover used definitions linking front-end and back-end values that are required for the correctness proofs anyway. However, the front-end still cannot give a semantics to \texttt{memcpy}, for the same reason as CompCert 1.10; the language currently has no representation for a single byte of a pointer.}.