To what extent can you trust your compiler in preserving those properties?

102

103

\end{itemize*}

103

These questions, and others like them, motivate a current `hot topic' in computer science research: \emph{compiler verification} (for instance~\cite{leroy:formal:2009,chlipala:verified:2010}, and many others).

104

These questions, and others like them, motivate a current `hot topic' in computer science research: \emph{compiler verification} (for instance~\cite{leroy:formal:2009,chlipala:verified:2010}, and many others).

104

105

So far, the field has been focused on the first and last questions only.

105

106

In particular, much attention has been placed on verifying compiler correctness with respect to extensional properties of programs, which are easily preserved during compilation; it is sufficient to completely preserve the denotational semantics of the input program.

These are still widely used in embedded systems, and have the advantage of an easily predictable cost model due to the relative sparcity of features that they possess.

137

138

138

In particular, we have fully formalised an executable formal semantics of a family of 8 bit Freescale Microprocessors~\cite{oliboni}, and provided a similar executable formal semantics for the MCS-51 microprocessor.

139

In particular, we have fully formalised an executable formal semantics of a family of 8 bit Freescale Microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor.

139

140

The latter work is what we describe in this paper.

140

141

The main focus of the formalisation has been on capturing the intensional behaviour of the processor.

The formalization of MCS-51 must deal with bytes (8 bits), words (16 bits) but also with more exoteric quantities (7 bits, 3 bits, 9 bits). To avoid size mismatch bugs difficult to spot, we represent all of these quantities using bitvectors, i.e. fixed length vectors of booleans.

282

The formalization of MCS-51 must deal with bytes (8 bits), words (16 bits) but also with more exoteric quantities (7 bits, 3 bits, 9 bits).

283

To avoid size mismatch bugs difficult to spot, we represent all of these quantities using bitvectors, i.e. fixed length vectors of booleans.

282

284

In our O'Caml emulator, we `faked' bitvectors using phantom types implemented with polymorphic variants\cite{phantom_types_ocaml}, as in Figure~\ref{fig.ocaml.implementation.bitvectors}.

283

From within the bitvector module (left column) bitvectors are just lists of bits and no guarantee is provided on sizes. However, the module's interface (right column) enforces the size invariants in the rest of the code.

285

From within the bitvector module (left column) bitvectors are just lists of bits and no guarantee is provided on sizes.

286

However, the module's interface (right column) enforces the size invariants in the rest of the code.

284

287

285

288

In Matita, we are able to use the full power of dependent types to always work with vectors of a known size:

Executability is another key difference between our work and~\cite{fox:trustworthy:2010}.

685

Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions.

686

This is because Matita is based on a logic that internalizes conversion.

682

687

In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions.

683

688

We do not need single step theorems of this form.

684

This is because Matita is based on a logic that internalizes conversion.

685

As a result, our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions.

686

689

687

690

Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models.