Typed Tagless Interpretations and Typed Compilation

We describe a less-known way of implementing typed embedded domain-specific languages in common metalanguages, stressing type preservation, typed compilation, and multiple interpretations. Type preservation statically and patently assures that interpreters never get stuck and hence run more efficiently.

[The Abstract of the paper]
We have built the first family of tagless interpretations for a
higher-order typed object language in a typed metalanguage
(Haskell or ML) that require no dependent types, generalized algebraic
data types, or postprocessing to eliminate tags. The statically
type-preserving interpretations include an evaluator, a compiler
(or staged evaluator), a partial evaluator, and call-by-name and
call-by-value CPS transformers.

Our principal technique is to encode De Bruijn or higher-order
abstract syntax using combinator functions rather than data constructors.
In other words, we represent object terms not in an initial algebra
but using the coalgebraic structure of the lambda-calculus.
Our representation also simulates inductive maps from types to
types, which are required for typed partial evaluation and CPS
transformations. Our encoding of an object term abstracts
uniformly over the family of ways to interpret it, yet statically
assures that the interpreters never get stuck.
This family of interpreters thus demonstrates again that it is
useful to abstract over higher-kinded types.

README.txt [3K]
Commented code accompanying the JFP paper, with the complete implementations of all interpreters. The code files are in the same directory as the README.txt file.

Chung-chieh Shan: Translations. Blog post, August 17, 2007.
< http://conway.rutgers.edu/~ccshan/wiki/blog/posts/Translations/ >
The article elucidates typed tagless interpretations as a way to relate form with meaning in natural languages. An abstract syntax expression may be interpreted either to yield an utterance or text, or to yield a semantic denotation. Types (in linguistics, categories) ensure the well-formedness of forms, expressions, and denotations. More precisely, typed tagless interpretation turns out to closely related to abstract categorial grammars (ACGs).

Typed Compilation

By `typed compilation' we mean a transformation from an untyped to typed tagless representations. The untyped form is an AST (represented as a regular data type), which is usually the result of parsing a file or similar plain data. The typed tagless representation takes the form of either generalized algebraic data types GADT (the initial approach), or alternatively, type-constructor--polymorphic terms (the final approach). Either type representation can be interpreted in various ways (e.g., evaluated, CPS-transformed, partially evaluated, etc). All these interpretations are assuredly type-preserving and patently free of any `type errors' such as failure to pattern-match type tags or reference to an unbound variable. The freedom from pattern-match failures is syntactically apparent: e.g., the final representation just has no type tags at all, corresponding to closed code by construction. Therefore, the typed representations are handled internally as if no types existed: indeed, after typechecking all the types can be erased.

Typed compilation is a generalization of typechecking (the latter merely verifies if an expression is well-typed). Typed compilation is relied upon when dynamically loading (typed) code or implementing typed DSL. Typed compilation is inherently a partial operation (a source term may be ill-typed) and it has to deal with (run-time) type representations. The result of compilation, however, has no type-related partiality and needs no type information. `Typecheck once, assuredly interpret many times' is our main goal. It is achievable, albeit not simply. We follow the seminal work of Baars and Swierstra on Typing Dynamic Typing.

Conceptually, the typed compiler is a function of the signature typecheck :: Expr -> Term t , whose three concrete examples are given below. One must distinguish this function from a similar, also partial function my_read :: Expr -> Term t . Although the latter has the same signature, the intent is different. When the user writes my_read exp , the user is supposed to specify the type of the desired result. Just as when we write read 1 in Haskell, we are supposed to specify the type of the expected value: Int, Integer, etc. The function typecheck has a different intent: it computes the result type as well as derives the term of that type, the compilation result. The function indeed acts as a compiler of a typed language. The computed result type Term t is a function of the structure of the untyped source expression -- that is, on the face of it, a dependent type. Our solutions are expressed in the language (Haskell) that is not normally thought of as a dependently typed language.

Parameterizing expressions by the evaluation order

We demonstrate a tagless final interpreter for call-by-name, call-by-value and call-by-need simply-typed lambda-calculus with integers and constants. We write a lambda-term once, which GHC(i) immediately type checks. Once the term is accepted, it can be evaluated several times using different interpreters with different evaluation orders. All the interpreters are written in the tagless final framework and so are efficient and assuredly type-preserving. We obtain a higher-order embedded domain-specific language with the selectable evaluation order.

The interpreters implementing different evaluation orders are very much alike. In fact, they are written so to share most of the code save for the interpretation of lam . The semantics of abstraction is indeed what sets the three evaluation orders apart.

We define the DSL as a type class EDSL . The type class declaration defines the syntax and its instances define the semantics of the language. Our DSL is typed; the DSL types are built using the constant IntT and the binary infix type constructor :-> .

data IntT
data a :-> b
infixr 5 :->

We could have used Haskell's Int and arrow types as DSL types. For clarity, we chose to distinguish the object language types from the meta-language (Haskell) types.

The interpretation is parameterized by the type m , which must be a Monad. The use of type families is not essential, merely convenient. In fact, we can easily re-write the whole code in Haskell98, see below. We interpret EDSL expressions of the type a as Haskell values of the type S l m a:

newtype S l m a = S { unS :: m (Sem m a) }

where l is the label for the evaluation order, one of Name , Value , or Lazy .

Here is the call-by-name interpreter (with sub elided). One of the reasons to parameterize the interpreter over MonadIO is to print out the evaluation trace, so that we can see the difference among the three evaluation strategies in the number of performed additions and subtractions.

obtaining the result 40 and observing from the trace that subtraction was not performed (because the value of int 100 `sub` int 10 was not needed to compute the result of t2). On the other hand, the sub-expression int 5 `add` int 5 was evaluated four times.

The call-by-value evaluator differs from the call-by-name one only in the interpretation of the abstraction:

lam f = S . return $ (\x -> x >>= unS . f . S . return)

The evaluation of the lambda abstraction body always starts by evaluating the argument, whether the result will be needed or not. That is literally the definition of call-by-value. The very same sample term can be interpreted differently:

giving in the end the same result 40. Although the result of the subtraction was not needed, the trace shows it performed. On the other hand, the argument sub-expression int 5 `add` int 5 was evaluated only once. In call-by-value, arguments of evaluated applications are evaluated exactly once.

The call-by-need evaluator differs from the others again in one line, the interpretation of abstractions:

lam f = S . return $ (\x -> share x >>= unS . f . S)

The evaluation of the body of the abstraction always starts by lazy sharing the argument expression. Again, this is the definition of call-by-need. We run the very same term t2 with the new evaluator, obtaining the same result 40 and observing from the execution trace that subtraction was not evaluated (because it was not needed) but the needed argument expression int 5 `add` int 5 was evaluated once. In call by need, arguments of evaluated applications are evaluated at most once.

Version

The current version is 1.1, Oct 8, 2009.

References

CB.hs [7K]
The complete code and the evaluation tests and traces
The code was originally posted as CBN, CBV, Lazy in the same final tagless framework on the Haskell-Cafe mailing list on Thu, 8 Oct 2009 00:54:14 -0700 (PDT).

CB98.hs [7K]
The same code in Haskell98, proving that the tagless final approach indeed requires fewer fancy type system features. The fancy features like the type families, used in CB.hs, add convenience and gloss. One can do without them however. The code can easily be re-written in OCaml, which has no typeclasses, type families and other bleeding-edge features.

Typed compilation to HOAS as emulation of staging

We present the typed compilation from an untyped higher-order language to the typed tagless final representation. The latter can be evaluated by all the interpreters in the Tagless Final paper. The typechecking happens only once, regardless of the number of interpretations of the term. The typed compiler has the signature

Given the untyped term and the type environment, typecheck returns either a type error message, or the compiled term and the representation of its computed type. The compilation result can be interpreted by any Symantics interpreter assuredly without any pattern-match errors.

fails in our goal of compiling only once: here, typecheck is invoked every time the compiled function is applied. Since the typechecking is partial, we can no longer assure that the result of a successful compilation can be interpreted without any type-related errors.

Nevertheless, we have accomplished our task, without relying on any built-in staging, GADTs or type classes with functional dependencies. Of all the extensions to Haskell98, we only use existentials and Typeable. The code shows uncanny similarities with staging. Furthermore, it seems we need staging or its emulation for the typed compilation into higher-order abstract syntax. Both type-checking and staging have to manipulate open code, whose free variables are hypotheses, or type abstractions.

Another remarkable part of the tagless final compilation is its close relationship with proofs in logic. Typechecking is truly building a proof, using hypothetical reasoning. Moreover, our typed compiler must be explicit with weakening (to extend the typing environment) and the application of structural rules (which take the form of cut-rules). The result of compiling the body of the function happens do be the same as that of compiling the whole function, which is the statement of the Deduction Theorem.

Finally, our DynTerms are essentially Data.Dynamic , only with more operations and implemented in pure Haskell.

IncopeTypecheck.hs [18K]
Extensively commented Haskell code. The title comments explain the problem of HOAS typechecking, the solution, and its connection to staging. The code requires Incope.hs, see below.

Incope.hs [21K]
Tagless final interpreters. The archive includes Incope.hs, which defines our source language and several its interpreters.

Closing the Stage: from staged code to typed closures
The translation of the staged code to an unstaged language, too, requires manipulation of `open' code and the explicit weakening of the binding environment. The weakening is called coercion in the paper `Closing the Stage'.

Tagless (staged) interpreter typeclass for typed languages

We demonstrate a tagless (definitional) interpreter for a typed language implemented in a typed meta-language: Haskell with multi-parameter typeclasses and functional dependencies. The interpreter uses no universal type, no type tags, no pattern-matching. It is, in fact, total -- syntactically . The interpreter supports heterogeneous binding environment and the (functional) dependence of the type of the result on the structure of the source term. The interpreter is in fact a type class:

Our code has been greatly inspired by the ICFP 2002 paper by Pasalic, Taha, and Sheard on staged tagless interpreters. The paper gives the most lucid explanation of the tagging problem in typed interpretation. Although the paper develops a dependently typed language Meta-D for writing typed tagless interpreters, the paper itself gives hints that dependent types are not really necessary. The key phrase was about indexing types by singleton types rather than by terms. The former is easily implementable in Haskell as it is. The introduction section gave the other hint: the apparent problem with the eval function is that it should yield an Int when evaluating the literal constant expression B 1 and yield a function when evaluating the term L "x" (Var "x") . Indeed no ordinary function can return values of different types. But an overloaded function can, e.g., Haskell's read .

With the help of Template Haskell, we stage our tagless code to remove its interpretative overhead. Because expressions in Template Haskell are untyped, we add a newtype wrapper to maintain their types. Our staged interpreter deals exclusively with these typed code expressions, to be faithful to the Pasalic et al. paper. Template Haskell can print code values, so we can see the staged result: the `compiled' code. In particular, here is the running example of the paper and the result of its evaluation with our staged interpreter:

If we change TInt above to (TArr TInt TInt) , we get a type error before running stest3: The typing is done at the meta-level.

The present code was the first attempt to define tagless interpreters in a language without (overt) dependent types. This work has continued in cooperation with Jacques Carette and Chung-chieh Shan. We showed that writing typed interpreters becomes significantly simpler if we change the building blocks of object language terms, from data constructors to constructor functions.

Interp.hs [3K]
The tagless typed interpreter for the the typed language of the above paper, viz. simply-typed lambda-calculus with De Bruijn indices. The interpreter is deliberately patterned after the one in the paper, including the type-level function TypEval. The code almost literally implements the Meta-D interpreter from Fig. 3 of the paper and the typing rules from Fig. 1 -- without any dependent types.

We demonstrate a typed compiler, the function whose type is literally Expr -> Term t . Here Expr is an ordinary algebraic data type of untyped source terms (the first-order language described in Peyton-Jones et al, ICFP 2006), and Term t is a GADT, the typed representation.

The result typet is a function of the value of Expr . Thus we demonstrate the Haskell solution of the truly dependent-type problem.

Although the result of the typed compilation is a GADT, the compiler itself uses neither GADTs nor representation types. The compiler is made of two parts. The first is the conversion from Expr to `lifted' terms:

The only inconvenience of using the Template Haskell is the necessity of splitting the whole code into two modules.

The main part is the typechecker class, which computes both the type of the result t and the corresponding value of the type Term t , the compilation result. The typechecking rules are stated as instances of the type class:

Obviously, the term e1 is ill-typed: the test expression of a conditional should be a boolean rather than an integer. Therefore, the expression typecheck t1 gives a type error: ``Couldn't match expected type Bool against inferred type Int''. In contrast, the typechecking (of the result of the parsing) of e2 succeeds. The computed type of the compilation result is Term Int .

We stress that the typechecking of the embedded DSL is carried out by the Haskell typechecker! It is the latter that applies the typing judgments of our DSL expressed as the instances of the class TypeCheck . We thus succeeded in embedding into Haskell not only DSL terms but also the DSL type system.

Version

The current version is 1.1, Nov 2006.

References

TypeCheck.hs [3K]
TermLift.hs [2K]
The complete implementation. The code was originally posted as An example of dependent types [was: Simple GADT parser for the eval example] on the Haskell-Cafe mailing list on Wed, 1 Nov 2006 00:30:56 -0800 (PST).

Typed compilation via GADTs

We describe the typed compiler into a GADT-based typed representation. The compiler itself is implemented via GADT. It has the signature

The compiler takes the type environment and the untyped term, the value of the ordinary algebraic data type Exp, and returns either the type error message, or the compiled term and the representation of its computed type. Both Typ t and Term t are GADTs. Although to Haskell TypedTerm is just as `untyped' as Exp , the latter is `deeply' untyped whereas the former is only superficially. The former value is built out of typed components, and the type is erased only at the end. That fact guarantees that an evaluator of Term t , e.g., eval :: Term t -> t never gets stuck. Combining the evaluator with the typed compiler and the show function gives the complete DSL interpreter teval :: Exp -> String

Upon success, the compiler returns a typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. We cannot write

case (typecheck te3) of Right (TypedTerm t e) -> sin (eval e)

although we know that e has the type Term Double and so applying the function sin to the result of eval e is well-typed. To the Haskell typechecker however, the type of eval e is some abstract type t rather than Double . Fortunately, Template Haskell lets us convert from an existential to a concrete type. This is equivalent to implementing an embedded compiler for our DSL. Since TypedTerm has already been typechecked, we are guaranteed the absence of errors during Template-Haskell-based `code generation'. The compiler, of the signature tevall :: Exp -> ExpQ , can be used as follows

The Template Haskell compiler also relies of the function lift'self :: Term a -> ExpQ satisfying the equation $(lift'self term) == term . It takes only four lines of code to define this function.

Version

The current version is 1.2, October 2007.

References

TypecheckedDSL.hs [5K]
The complete implementation of a typed DSL with the typed evaluator and the typed compiler from untyped terms to GADT. The code was originally posted as Typechecker to GADT: the full implementation of a typed DSL on the Haskell-Cafe mailing list on Thu Oct 4 02:02:32 EDT 2007.

TypecheckedDSLTH.hs [7K]
TypedTermLiftTest.hs [<1K]
The complete code for the compiler of the typed DSL, using Template Haskell to `compile' GADT to `machine code'. The code was originally posted as Typed DSL compiler, or converting from an existential to a concrete type on the Haskell-Cafe mailing list on Sat Oct 6 03:55:36 EDT 2007.

Relating Final and Initial typed tagless representations

We have seen two approaches to typed tagless representation of an embedded DSL. Either representation can be interpreted with no errors due to type-tag mismatch, or due to a reference to an unbound variable. The absence of both sorts of errors is statically assured and patent to the metalanguage compiler.

In the initial approach, typed terms are represented by GADTs. The absence of type-tag mismatch errors is the central property of GADT. The absence of unbound variable reference errors is assured either by higher-order abstract syntax (Xi et al., POPL 2003) or De Bruijn indices and dependent types (Pasalic et al., ICFP 2002). This page has described the final tagless approach. Type-tag mismatch errors are patently absent because there are simply no type tags and hence no possibility of type errors during interpretation. The absence of the second sort of errors can likewise be assured by higher-order abstract syntax (used here) or De Bruijn indices.

We demonstrate that the final and initial typed tagless representations are related by bijection. We use the higher-order language of the Tagless Final paper (APLAS 2007), which is the superset of the language introduced in Xi et al (POPL 2003). In the latter paper, the tagless interpretation of the language was the motivation for GADT. In a bit abbreviated form, the final and the initial representations of our DSL are defined as follows:

The data constructor Var of the initial representation corresponds to HOASLift of Xi et al. The initial representation is parameterized by the type of the hypothetical environment h: h t is the type of an environment `cell' holding a value of the type t .

The relation between the two representations is established as follows:

We note the properties of the mappings from the final to the initial and vice versa: both mappings are total and a composition of one mapping with the other preserves interpretations. The code below gives concrete examples of that preservation. The totality is especially easy to see for the mapping from the final to the initial, since the mapping looks like identity. The mapping is one of many possible interpretations of a term in the final tagless form.

Version

The current version is 1.1, Jan 1, 2008.

References

InFin.hs [8K]
Haskell code with the complete definitions of both representations, several sample interpreters, complete bijections and their compositions. The code includes several concrete examples.